Hi,
I am trying to detect differentially expressed genes in a RNA-seq dataset affect by batch effect due to different isolation protocols used in the wet lab during dataset generation.
Below, a snippet of the code I used to model my data:
rna_dds <- DESeqDataSet(rna_se, design = ~ RNA.Isolation.method + Type)
rna_dds <- DESeq(rna_dds, test = "LRT",
fitType = "local",
full = design(rna_dds)
reduced = ~ RNA.Isolation.method,
parallel = F,
quiet = T)
rna_dds <- rna_dds[!rowRanges(rna_dds)$allZero,]
I can get results
as follow:
rna_LTR <- results(rna_dds, alpha = 1e-4)
rna_batch <- results(rna_dds,
contrast = c("RNA.Isolation.method", "P1", "P2"),
test = "Wald",
lfcThreshold = .5,
altHypothesis = "lessAbs",
alpha = 0.05)
rna_trt1 <- results(rna_dds,
contrast = c("Type", "trt1", "control"),
alpha = 1e-4,
test = "Wald",
lfcThreshold = 2)
rna_trt2 <- results(rna_dds,
contrast = c("Type", "trt2", "control"),
alpha = 1e-4,
test = "Wald",
lfcThreshold = 2)
If I understood correctly, now rna_LTR
will contain the pvalues for differences in Type
between the two models ~ RNA.Isolation.method + Type vs ~ RNA.Isolation.method
, those should take into account the batch effect due to the extraction protocol. rna_batch
on the other hand should contain pvalues for genes that are weakly affected by the extraction protocol (because I used an altHypothesis = lessAbs
). rna_trt1
should contain pvalues (and log2FoldChanges
) for the DEG between trt1 and control. rna_trt2
should contain pvalues (and log2FoldChanges
) for the DEG between trt2 and control.
My question is, is it safe to use pvalues from rna_LTR
and log2FoldChanges
from rna_trt1
and rna_trt2
to detect differentially expressed genes? Should I take into account pvalues also from rna_batch
? If yes, how? e.g.
res <- data.frame(
trt1vctrl = rna_trt1$log2FoldChange,
trt2vctrl = rna_trt2$log2FoldChange,
LTR_padj = rna_LTR$padj,
row.names = rownames(rna_dds))
res <- res[!apply(res, 1, function (r) any( is.na( r))), ]
rna_res[ rna_res$LTR_padj < 1e-4 &
abs(rna_res$trt1vctrl) > 2 &
abs(rna_res$trt2vctrl) > 2, ]
Thanks for your answer! :)
Hi Michael, Thanks for your answer! I generated that object (
rna_batch
) because I was interested in visualizing the genes affected by the batch effect and being able to report a p-value for the difference. I was looking for genes with consistent expression across batches.I am aware that, when using LRT, p-values refer to the test between more than two variable levels, and that's why I used it instead of the standard Wald test. I am looking for genes that behave differently across sample types albeit the RNA isolation protocol used. So if I interpreted your answer correctly, filtering for LRT p-value and pairwise LFC will return what I am looking for. Right?
Also, from your previous answer a question arise, what's the difference between using "contrast" and "name"? I have been always thinking that they would extract the same things.
rna_LTR_trt1 <- results(rna_dds, alpha = 1e-4, contrast = c("Type", "trt1", "ctrl")) # this will return LFC for trt1 vs ctrl and pvalues for LRT across the three sample type
rna_LTR_trt1 <- results(rna_dds, alpha = 1e-4, name = "Type_trt1_vs_ctrl")) # wouldn't this return the same?
Yes, you are right those two will return the same. I wasn't sure why you were generating the additional Wald test p-values but I suppose you want to do something with those. I typically would just consider the set of genes which are significant by the LRT, and then work with the LFCs of the two comparisons directly.