Hi,
I have a question. I am working with an mRNA-Seq dataset (18 samples) spanning 2 batches and 3 cell types. `Batch_1` has two cell types (A and B), whereas `Batch_2` has the third cell type (C). I imported the dataset into R with `DESeq2`, but because of the error quoted below ("the model matrix is not full rank"), I cannot perform differential expression to measure the effect of `Cell_Type` while controlling for batch differences.

I would like to export either the normalized counts or the `vst`/`rlog` values and import them into `limma` for differential expression using limma-trend. Can we use `rlog` or `vst` values in limma for differential expression analysis?
```r
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ Batch + Cell_Type)
dds
```
This fails with the error "the model matrix is not full rank, so the model cannot be fit as specified." The DESeq2 vignette explains: "There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples. We address these two problems below and discuss possible solutions."
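The rank problem can be checked directly, before DESeq2 is involved, by building the design matrix with base R's `model.matrix()` and comparing its rank (from `qr()`) with its number of columns. This is a minimal sketch on a hypothetical `coldata`: it assumes six subjects, each contributing one A and one B sample in `Batch_1` and one C sample in `Batch_2`, which may not match the real sample sheet.

```r
# Hypothetical sample layout (an assumption, not the poster's actual coldata):
# six subjects, each with A and B in Batch_1 and C in Batch_2
coldata <- data.frame(
  Subject   = factor(rep(paste0("S", 1:6), times = 3)),
  Batch     = factor(rep(c("Batch_1", "Batch_1", "Batch_2"), each = 6)),
  Cell_Type = factor(rep(c("A", "B", "C"), each = 6))
)

# Design matrix for ~ Batch + Cell_Type, the design that DESeq2 rejected
mm <- model.matrix(~ Batch + Cell_Type, data = coldata)

# A full-rank design has rank equal to its number of columns
design_rank <- qr(mm)$rank
is_full_rank <- design_rank == ncol(mm)
cat("columns:", ncol(mm), " rank:", design_rank,
    " full rank:", is_full_rank, "\n")
```

With this layout the `BatchBatch_2` column is identical to `Cell_TypeC`, so the 4-column matrix has rank 3. If limma is installed, `limma::nonEstimable(mm)` should report the coefficient that cannot be estimated.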
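```r
library("DESeq2")

# Design without Batch, which DESeqDataSetFromMatrix accepts
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ Cell_Type)
dds
dds <- DESeq(dds)

# By default, results() reports the last Cell_Type level vs the reference level
res <- results(dds)
res

# Size-factor-normalized counts and two variance-stabilized versions
normalized_counts <- counts(dds, normalized = TRUE)
vsd <- vst(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)
head(assay(vsd), 3)
```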
Thank you,
Sabiha
Hi Gordon Smyth, thank you for the prompt response.
I received this data from collaborators. Would it make sense to re-run a couple of samples of cell types A and B together with C? Have you seen this scenario in other experimental setups, and how does one handle this type of data?
From the PCA, there does indeed appear to be a batch effect: the 2 batches separate clearly. Additionally, does running a batch-correction method such as ComBat before importing into `limma` fix the issue?

This is a very well-known type of problem. If you showed your data.frame of `Batch` and `Cell_Type` values (i.e., your `coldata`), then someone would be able to give you advice. Running ComBat does not address the issue and doesn't help.
Gordon Smyth, thank you, noted.
Here is my `sample_metadata` data.frame:
Created on 2022-12-02 with [reprex v2.0.2](https://reprex.tidyverse.org)
Batch 2 is completely confounded with Cell_Type C, so it is impossible to correct for batch. Indeed, it is impossible even to judge whether there is a batch effect, because the effect of Batch 2 can't be separated from the effect of Cell_Type C.
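A toy calculation shows why the effect of Batch 2 cannot be separated from the effect of Cell_Type C. The sketch assumes a hypothetical layout (A and B in `Batch_1`, C in `Batch_2`, six samples each) and uses made-up, noise-free expression values for a single gene: a pure batch shift and a pure cell-type-C effect of the same size produce exactly the same data.

```r
# Hypothetical layout (an assumption): A and B in Batch_1, C in Batch_2
coldata <- data.frame(
  Batch     = factor(rep(c("Batch_1", "Batch_1", "Batch_2"), each = 6)),
  Cell_Type = factor(rep(c("A", "B", "C"), each = 6))
)
is_batch2 <- as.numeric(coldata$Batch == "Batch_2")
is_B      <- as.numeric(coldata$Cell_Type == "B")
is_C      <- as.numeric(coldata$Cell_Type == "C")

# Scenario 1: C behaves like A, but Batch_2 shifts expression up by 2 units
expr_batch_effect <- 5 + 1 * is_B + 2 * is_batch2
# Scenario 2: no batch effect, but cell type C is truly 2 units above A
expr_celltype_effect <- 5 + 1 * is_B + 2 * is_C

# The two scenarios give identical data
print(identical(expr_batch_effect, expr_celltype_effect))  # TRUE

# Fitting ~ Cell_Type attributes the whole shift to C, whichever scenario
# is true: intercept 5, Cell_TypeB 1, Cell_TypeC 2 (up to rounding)
fit <- lm(expr_batch_effect ~ Cell_Type, data = coldata)
print(coef(fit))
```

No model, batch-correction method or software can tell these two scenarios apart, because they generate the same numbers.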
Another issue is that your experiment is paired by subject, but your analysis so far seems to be ignoring the pairing.
Given the data you have, it is only possible to compare A vs B, which would be done using a paired comparison. You cannot do any analysis with C.
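A paired A vs B comparison puts the subject into the design as a blocking factor. The sketch below follows the limma-trend workflow from the limma User's Guide (log-CPM values from edgeR, then `eBayes(trend = TRUE)`) with a `~ Subject + Cell_Type` design. It is an illustration under stated assumptions: the counts are simulated, and the `Subject` labels (S1 to S6, one A and one B sample each) are hypothetical.

```r
# Assumptions (hypothetical): 6 subjects (S1-S6), each with one A and one B
# sample from Batch_1; counts are simulated, not the poster's data.
library(edgeR)
library(limma)

set.seed(1)
n_genes <- 1000
coldata <- data.frame(
  Subject   = factor(rep(paste0("S", 1:6), each = 2)),
  Cell_Type = factor(rep(c("A", "B"), times = 6))
)
cts <- matrix(
  rnbinom(n_genes * nrow(coldata), mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste(coldata$Subject, coldata$Cell_Type, sep = "_"))
)

# Subject is a blocking factor, so B is compared with A within each subject;
# the Cell_TypeB coefficient is the paired B-vs-A log-fold-change
design <- model.matrix(~ Subject + Cell_Type, data = coldata)

# limma-trend: filter low counts, TMM-normalize, analyse log-CPM values
y <- DGEList(counts = cts)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
logCPM <- cpm(y, log = TRUE, prior.count = 3)

fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend = TRUE)
tt <- topTable(fit, coef = "Cell_TypeB", number = Inf)
head(tt)
```

With the real data, `cts` and `coldata` would first be subset to the A and B samples, with `droplevels()` applied to the factors so that level C disappears from the design.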
Gordon Smyth, thank you. Let me connect with the genomics core lab.
In the meantime, I found something related to this: "Group-specific condition effects, individuals nested within groups". I am not sure if this will fix the existing issue.
The linked page says: "We have two groups of samples X and Y, each with three distinct individuals (labeled here 1-6). For each individual, we have conditions A and B (for example, this could be control and treated)." Mapping this onto my dataset, `groups` would be `batch`, `individual` would be `subjects`, and `conditions` would be `cell types`. The only confusion I have is that it says three distinct individuals, whereas I have six individuals. I tried doing something like:
Gordon Smyth, should I create a new post for this question?
No, the link you give does not discuss your experimental design and does not address your confounding issue.
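The nested design from the linked example (written `~ grp + grp:ind.n + grp:cnd` in the DESeq2 vignette) can be translated with the mapping proposed above (group = `Batch`, individual = `Subject`, condition = `Cell_Type`) and its rank checked with base R. This sketch assumes a hypothetical layout in which each of six subjects has A and B in `Batch_1` and C in `Batch_2`, so subjects are crossed with batch rather than nested in it; the real sample sheet may differ.

```r
# Hypothetical layout (an assumption): each of six subjects has A and B in
# Batch_1 and C in Batch_2, so subjects are crossed with batch, not nested
coldata <- data.frame(
  Subject   = factor(rep(paste0("S", 1:6), times = 3)),
  Batch     = factor(rep(c("Batch_1", "Batch_1", "Batch_2"), each = 6)),
  Cell_Type = factor(rep(c("A", "B", "C"), each = 6))
)

# Nested-style design with group = Batch, individual = Subject,
# condition = Cell_Type
mm <- model.matrix(~ Batch + Batch:Subject + Batch:Cell_Type, data = coldata)

# All-zero columns: batch/cell-type combinations with no samples
empty_cols <- colnames(mm)[colSums(abs(mm)) == 0]
print(empty_cols)

# Even after dropping the empty columns the design is still not full rank,
# because the Batch_2 column duplicates the Batch_2:C interaction column
mm_reduced <- mm[, colSums(abs(mm)) > 0, drop = FALSE]
cat("columns:", ncol(mm_reduced), " rank:", qr(mm_reduced)$rank, "\n")
```

Dropping all-zero columns is the workaround the DESeq2 vignette describes for level combinations without samples, but here the reduced matrix still has one more column than its rank, consistent with the point that this design does not remove the confounding.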
I have already given you a complete answer. The limma User's Guide tells you how to analyse paired comparisons. I have already told you that there is no software or statistical solution to the confounding problem in your data.