Hi
I'm currently working on RNA-seq analysis, and my goal is to get differentially expressed genes using DESeq2.
I have 2 groups, A and B, and each group contains 6 samples. After using DESeq2, I got 1405 DEGs in total. Code as following:
dds <- DESeq(dds)
res = results(dds, cooksCutoff=TRUE,
contrast=c("condition", "group A", "group B"),
independentFiltering=TRUE)
However, when I tried to use one single sample from group A against group B (6 samples), the number of DEGs is very small:
sample 1 from group A vs group B: 11 DEGs
sample 2 from group A vs group B: 245 DEGs
sample 3 from group A vs group B: 35 DEGs
sample 4 from group A vs group B: 21 DEGs
sample 5 from group A vs group B: 7 DEGs
sample 6 from group A vs group B: 20 DEGs
Although using one single sample from group A against group B (6 samples) will lead to different number of DEGs comparing to group A(6 samples) vs group B (6 samples), but the numbers I got is too few and I believe something went wrong.
The possible reason I guess is that using one single sample, the estimated size factor and dispersion are changed, this will lead to another different fitted regression model. But this still cannot explain why the number of DEGs is small.
Any help would be appreciated, thanks in advance!
Hi Michael,
Thank you for your reply. When you say the degrees of freedom reduces from 10 to 5, is this part of the reason that cause the hypothesis testing more conservative and underestimate the statistics results so that only few DEGs can be detected? Also when you say lose all information within group variability, do you mean that by only one sample, the dispersion cannot be estimated?
Thank you
This is why you lose power, yes. Power related to sample size.
You are also losing information about within-group variance from one group. Dispersion can still be estimated from the other group, but it’s not a good idea to drop samples.
Thank you for the help. Sorry for keep asking silly questions: I wonder given 2 groups (A and B), and each group has 6 samples, how does the Log2FC calculated in DESeq2? Does it take the average of raw count within each group then take the logrithm of the fraction?
See the 2014 paper, the MLE LFC is found via IRLS:
"We use the standard iteratively reweighted least-squares algorithm [12] for each gene’s model, Equations (1) and (2), to get MLEs for the coefficients beta_ir^MLE..."
It's essentially the same that you would get with
glm.nb
in R with dispersion pre-estimated.