Hi there, I am performing DE between different age groups, working on lncRNAs as well as mRNAs. As an initial step, I separated out my count data for lncRNAs and performed DE using DESeq2. My understanding is that the alpha argument should be set to the same FDR cutoff that is going to be used, so I set it to 0.05:
contrast1 <- c("AGE", "30_39yrs", "20_29yrs")
res_l_2to1 <- results(dsl, contrast = contrast1, alpha = 0.05)
summary(res_l_2to1)
out of 16624 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 7, 0.042%
LFC < 0 (down) : 7, 0.042%
outliers [1] : 0, 0%
low counts [2] : 6441, 39%
(mean count < 1)
lfc <- 1
res_l_2to1_f <- results(dsl, contrast = contrast1, alpha = 0.05, lfcThreshold = lfc)
summary(res_l_2to1_f)
out of 16624 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up) : 0, 0%
LFC < -1.00 (down) : 0, 0%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)
Using the results function I retrieved the DE genes, then further filtered them by |log2FC| > 1. However, reading other questions and answers by the developers of this package, it appears that the lfcThreshold parameter should be incorporated within the results function instead. With the first method (post-hoc filtering), I had 879 genes with p-values < 0.05 but only 14 genes with adj-p < 0.05. However, when I did incorporate lfcThreshold, I had 14 genes with p-values < 0.05, but all adj-p values were equal to 1. I feel this doesn't make sense?

Did I make a mistake by performing DE on only the subset of lncRNAs? I wanted to analyse each gene type separately (mRNAs and lncRNAs), as the latter usually have low expression compared to mRNAs and it would be hard to combine them and perform the analyses. Thanks in advance
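To make the difference between the two approaches concrete, here is a minimal base R sketch. It is an illustration of the idea only, not DESeq2's actual code: `threshold_wald_p` is a hypothetical helper, and the LFC and standard error values are made up. With a threshold, the null hypothesis becomes |LFC| <= threshold, so every p-value is recomputed against that wider null, whereas post-hoc filtering keeps the p-values from the LFC = 0 test and only drops rows.

```r
# Two-sided Wald-type p-value for H0: |LFC| <= threshold (illustrative sketch).
threshold_wald_p <- function(lfc, se, threshold = 0) {
  z <- (abs(lfc) - threshold) / se
  pmin(1, 2 * pnorm(z, lower.tail = FALSE))
}

# Hypothetical estimates: the same LFCs and standard errors feed both tests.
lfc <- c(0.4, 1.2, 2.5, -1.1, -3.0)
se  <- c(0.1, 0.3, 0.4, 0.2, 0.5)

p_zero <- threshold_wald_p(lfc, se, threshold = 0)  # H0: LFC = 0
p_one  <- threshold_wald_p(lfc, se, threshold = 1)  # H0: |LFC| <= 1

# Post-hoc filtering keeps p_zero and only blanks out genes with |LFC| <= 1.
post_hoc <- ifelse(abs(lfc) > 1, p_zero, NA)

print(data.frame(lfc, se, p_zero, post_hoc, p_one))
```

In this toy example a gene with an estimated LFC of 1.2 passes the post-hoc filter with its small LFC = 0 p-value intact, but its threshold-test p-value is much larger because the estimate is only slightly beyond 1 relative to its standard error. Any gene whose estimate lies inside the threshold gets a p-value of 1.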
Hi Michael, thank you for your reply.
I read the paper in depth to ensure I am using this package correctly, but I am getting really stringent results when adding those parameters. I did another comparison between the extreme age groups, "60-69 yrs" vs "20-29 yrs": without specifying a threshold I had approx. 800+ lncRNAs, but using lfcThreshold = 1 I now have only 6!
May I clarify a few things with you, if that's OK:
- With alpha, I understand that it needs to be the same value as the FDR cutoff if that cutoff is going to differ from 0.1 (in my case, 0.05). What I want to know is: if I don't change alpha, but filter the resulting lists of DE genes by adj-p < 0.05, is that still statistically correct?
- With lfcThreshold, the DESeq2 paper says that for small-scale experiments we don't need to specify a threshold (so we can use zero LFC). How do you define a small experiment? Fewer than 50 samples?
- Finally, I am performing DE on protein-coding genes separately from non-coding genes, as the latter are usually lowly expressed, so I have 2 count data files. From my understanding, shrinkage of LFC is useful for genes with low read counts; however, since I have already split my data, I don't have to use shrinkage, right?
Thank you again!
Setting alpha in results() just optimizes the independent filtering. So just to be clear, if you set independentFiltering=FALSE, it has no effect. As it is an optimization, using the alpha you will use for FDR thresholding helps sensitivity. But it's not a large effect usually.

In small-scale experiments, there is only power to discover large effects, so most investigators ignore the question of what is a minimum effect size of interest. E.g. imagine an experiment with 2 or 3 replicates per condition.
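The point about alpha and independent filtering can be illustrated with a simplified base R sketch. This is not DESeq2's implementation (its own procedure for choosing the filter threshold is more refined); the simulated mean counts and p-values and the helper `count_rejections` are assumptions for illustration. The idea is that low-count genes are dropped before the BH adjustment, and the cutoff is chosen to give the most rejections at the target alpha.

```r
set.seed(1)
n <- 2000
base_mean <- rexp(n, rate = 1 / 50)            # hypothetical mean counts
is_de     <- base_mean > 20 & runif(n) < 0.1   # only well-expressed genes are DE here
pval      <- ifelse(is_de, rbeta(n, 1, 200), runif(n))

# Number of BH-adjusted p-values below alpha after dropping genes under the cutoff.
count_rejections <- function(cutoff, alpha) {
  keep <- base_mean >= cutoff
  sum(p.adjust(pval[keep], method = "BH") < alpha)
}

alpha   <- 0.05
cutoffs <- quantile(base_mean, probs = seq(0, 0.9, by = 0.1))  # 0% means no filtering
rejections <- sapply(cutoffs, count_rejections, alpha = alpha)

best <- cutoffs[which.max(rejections)]
print(rejections)
print(best)
```

Because the candidate cutoffs include "no filtering", the chosen cutoff can never give fewer rejections than skipping the filter. The choice is tuned to one alpha, which is why matching alpha to the FDR cutoff you will actually apply helps sensitivity, while thresholding the adjusted p-values at a different level afterwards is still a valid use of them.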
You don't have to use shrinkage at all, and yes, it is more important for visualizing LFC across a range of counts. For large count genes (unless highly dispersed within condition), lfcShrink has little effect.
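As a rough illustration of why shrinkage barely changes precisely estimated LFCs, here is a toy normal-prior shrinkage in base R. This is not what lfcShrink computes; `shrink_lfc`, the prior standard deviation, and the example values are all assumptions. It only shows the general behaviour: the estimate is pulled toward zero in proportion to how uncertain it is.

```r
# Toy shrinkage: posterior mean of the LFC under a N(0, prior_sd^2) prior
# and a normal likelihood with standard error `se`.
shrink_lfc <- function(lfc, se, prior_sd = 1) {
  lfc * prior_sd^2 / (prior_sd^2 + se^2)
}

# Hypothetical genes with the same raw LFC but different precision.
lfc <- c(2, 2, 2)
se  <- c(0.05, 0.5, 2)  # small SE ~ high counts; large SE ~ low counts or high dispersion

shrunk <- shrink_lfc(lfc, se)
print(round(shrunk, 3))  # 1.995 1.600 0.400
```

The gene with the small standard error keeps essentially its raw LFC, while the noisy gene is pulled strongly toward zero.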
Thank you Michael.
I appreciate your response!
I now feel more confident knowing that I have chosen the right parameters for my analysis :)