Hi,
I'm using edgeR for differential expression analysis between tumor and normal tissue cases.
I would like to select differentially expressed genes based on log fold change >= 2 and FDR < 0.05.
Based on the edgeR F1000Research paper I used
tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(2))
which gives differentially expressed genes with FC above 2 at FDR < 0.05.
is.de <- decideTestsDGE(tr)
summary(is.de)
       -1*Normal 1*TNBC
Down               1018
NotSig             7182
Up                 1535
Among those 1535 upregulated genes I also see genes with fold changes of 1 and 1.5 that have FDR < 0.05. I don't know why the results also include genes with FC 1 and 1.5.
I need differentially expressed genes with an FC >= 2 and FDR < 0.05 cutoff. May I know how to get this?
Thank you
Yes, I know that. I found the way to get differentially expressed genes with |logFC| > 2 and FDR < 0.05.
If you want the absolute log fold change to be > 2, then don't take logs for the lfc argument! Just use lfc = 2.
I don't think you fully understand James' point. If you're only interested in genes with a log-fold change (significantly) greater than 2, why are you using glmTreat with lfc=1? You should use lfc=2 instead. Also see the Note at the bottom of the documentation in ?topTable regarding post-hoc filtering on the log-fold change.
Oh yes, I understand now. log2(2) is nothing but 1. So instead I will be doing it like this.
Am I right here?
No. You should be using lfc=2 in glmTreat. DE genes should only be selected on the basis of their (adjusted) p-values. Again, I suggest you read the Note that I referred to above.
Sorry, maybe I was wrong. I just saw one of your comments in "Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off", in which you answered the way I did. Possibly that was because of the treatDGE() function. So I misunderstood from that.
So then, to get differentially expressed genes with |logFC| > 2 and FDR < 0.05, I will use the following steps.
Is this right now?
Yes, that is correct.
Thanks a lot Aaron !!
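For readers following along, here is a minimal sketch of the approach agreed on above: pass the log-fold-change threshold to glmTreat and then select genes on FDR alone, with no post-hoc logFC filter. The simulated counts, the group labels, the dispersion of 0.05 and the names lfc_threshold, is_de and de_table are assumptions for illustration only and do not come from the original analysis.

```r
library(edgeR)  # also attaches limma, which provides makeContrasts()

set.seed(1)
n_genes <- 500
group <- factor(rep(c("Normal", "Tumor"), each = 4),
                levels = c("Normal", "Tumor"))

# Hypothetical data: negative binomial counts with dispersion 0.05.
# The first 25 genes have a true log2 fold change of 4 in tumor samples.
mu <- matrix(100, nrow = n_genes, ncol = length(group))
mu[1:25, group == "Tumor"] <- 100 * 2^4
counts <- matrix(rnbinom(length(mu), mu = mu, size = 1 / 0.05),
                 nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))

# Standard edgeR quasi-likelihood pipeline
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test against a log2 fold change threshold (a log2 value, not a fold change)
contr <- makeContrasts(Tumor - Normal, levels = design)
lfc_threshold <- 2  # assumed value, matching the thread at this point
tr <- glmTreat(fit, contrast = contr, lfc = lfc_threshold)

# Select DE genes on the adjusted p-value only
is_de <- decideTestsDGE(tr, p.value = 0.05)
print(summary(is_de))
de_table <- topTags(tr, n = Inf, p.value = 0.05)$table
```

Because the threshold is part of the hypothesis being tested, a gene is reported only when its log-fold change is significantly beyond lfc_threshold, so no further filtering on the logFC column is applied.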
Hi Aaron,
A small doubt. In many research papers I have seen people mention fold change >= 2 and fold change >= 5 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927085/ [check the Results section]. My doubt is: what should be the minimum fold change cutoff for finding differentially expressed genes? In the edgeR paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4934518/pdf/f1000research-5-9996.pdf [page 14] they have used lfc = log2(1.5), which will be fold change 0.58. Is this cutoff the right one for finding differentially expressed genes? Can I select DEGs with lfc = 1.5? And how can I give an lfc greater than or equal to 1.5? Is lfc >= 1.5 right?
Stop.
Take a deep breath.
Now, remember the difference between a fold-change and a log-fold change. When we set lfc=log2(1.5), we are testing against a fold change of 1.5 and a log-fold change of ~0.58.
If you were to set lfc=1.5, you would be testing against a fold change of ~2.8. However, this is likely to be too stringent to be useful; you would fail to detect a lot of relevant DE genes with log-fold changes around 1 (i.e., fold-changes of 2).
Indeed, even lfc=1 is likely to be too stringent. Remember, we are testing against lfc, so a gene with a log-fold change of 1 will not be significantly different from lfc=1. This is why the paper uses lfc=log2(1.5), which allows genes with log-fold changes around unity to be detected.
In your code above, you use lfc=2, which on hindsight is likely to be far too high. If you're wondering what threshold to use, read Yunshun's comments in "Obtaining Differentially expressed gene lists at a Fold change and FDR cut-off". In short, pick the log-fold change below which you are definitely not interested in the changes.
I don't know what your last few questions were referring to, but (i) there is no way to get a one-sided p-value from glmTreat, and (ii) putting lfc >= 1.5 in your glmTreat call doesn't make any syntactic sense.
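The conversions discussed above can be checked with a few lines of base R. This is only an illustrative sketch; the helper names fc_to_lfc and lfc_to_fc and the thresholds table are made up for this example and are not part of edgeR.

```r
# Convert between fold changes and log2 fold changes
fc_to_lfc <- function(fc) log2(fc)
lfc_to_fc <- function(lfc) 2^lfc

# Values that have been passed (or proposed) as the lfc argument in this thread
thresholds <- data.frame(
  lfc_argument = c("log2(1.5)", "1", "1.5", "2"),
  lfc = c(log2(1.5), 1, 1.5, 2)
)

# The fold change that each lfc value actually tests against
thresholds$fold_change <- lfc_to_fc(thresholds$lfc)
print(thresholds, digits = 3)
```

The table makes the mismatch visible: lfc=log2(1.5) corresponds to a fold change of 1.5, while lfc=1.5 corresponds to a fold change of about 2.8 and lfc=2 to a fold change of 4.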