Hi everyone,
While I am trying to understand or ask myself if I should use lfcShrink
with my data as well as trying to find answers to my questions through this forum, I still cannot figure it out what we should do if we have a variable considered as "Batch" and we want to use lfcshrink
.
In the vignette of DESeq2 appears that the shrinkage estimator apeglm
cannot handle contrasts but the design can be rearranged and nbinomWaldTest can be used.
(taken from the vignette) Although apeglm cannot be used with contrast, we note that many designs can be easily rearranged such that what was a contrast becomes its own coefficient. In this case, the dispersion does not have to be estimated again, as the designs are equivalent, up to the meaning of the coefficients. Instead, one need only run nbinomWaldTest to re-estimate MLE coefficients - these are necessary for apeglm - and then run lfcShrink specifying the coefficient of interest in resultsNames(dds). We give some examples below of producing equivalent designs for use with coef. We show how the coefficients change with model.matrix, but the user would, for example, either change the levels of dds$condition or replace the design using design(dds)<-, then run nbinomWaldTest followed by lfcShrink.
However, what if apart from the condition, you also have a batch effect in your samples that you want to include in the design?
You cannot just relevel your design. I want to model the batch effect but the condition of interest is still the same (and I still want to have it at the end of the formula).
In order to give you an example, I have added an extra column to the data from the pasilla
package.
library(pasilla)
pasCts <- system.file("extdata","pasilla_gene_counts.tsv",
package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
coldata$batch <- factor(c("a", "b", "c", "a", "b", "b", "c"))
rownames(coldata) <- sub("fb", "", rownames(coldata))
cts <- cts[, rownames(coldata)]
coldata
condition type batch
treated1 treated single-read a
treated2 treated paired-end b
treated3 treated paired-end c
untreated1 untreated single-read a
untreated2 untreated single-read b
untreated3 untreated paired-end b
untreated4 untreated paired-end c
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,colData = coldata,
design = ~ batch + condition)
dds
keep <- rowSums(counts(dds)) >= 10
dds_cleaned <- dds[keep,]
dds_cleaned <- DESeq(dds_cleaned)
resultsNames(dds_cleaned)
[1] "Intercept" "batch_b_vs_a"
[3] "batch_c_vs_a" "condition_untreated_vs_treated
results_dds <- results(dds_cleaned)
results_withShrinkage <- lfcShrink(dds_cleaned, coef ="condition_untreated_vs_treated",
type="apeglm", res=results_dds)
As you can see above, resultsNames
has 4 results, 2 from the batch effect and the last one from the condition of interest.
If I run lfcShrink
as I wrote above (taking the condition as my coeficient), will the batch effect be taken into account?
If not, how can I take care of it? Or maybe using lfcShrink
when you have batch effects is not possible?
The only post that I saw that takes into account batch effects + condition is this one, but the type of shrinkage is "normal" and after reading the literature between the different methods, I think that apeglm
is better for my data.
Any help will be really appreciated.
Thanks very much in advance!
Hi, thanks for your reply! Yes, I understand that the batch correction has been "done" before lfcShrink. My question was more related to the coeficient that I have to select for lfcshrink, as it is splitted, I am worried that if I select "condition_untreated_vs_treated", the log2FC obtained will not take into account the batch that I added in the model because it is using just the differences between the condition.
As I've said above, batch has been taken into account and what you have to do now is to select the coefficients you want to test.
Okay, got it. Thanks very much for your help!