I did my analysis using EdgeR and with DESeq2 with and without lfcShrink. At FDR<10% we have the following:
EdgeR - 17 genes
DESeq2 without LFC shrink - 37
DESeq2 with LFC shrink - 54
Only 10 genes overlap between all the 3 methods at FDR<10%. The fold change for EdgeR and DESeq2 without LFC shrink are almost the same for these 10 genes are high. But when the analysis is repeated with DESeq2withLFCShrink the fold changes for these 10 genes drop to < 2.
Our results show that EdgeR and DESeq2 without lfcShrink show very high fold changes (>10) which is highly unlikely in our study. On the other hand DESeq2 with lfcShrink is giving reasonable fold changes but the number of differentially expressed genes at FDR<5% or 10% is high compared to EdgeR.
Is there an explanation for why the fold changes are inflated for EdgeR and DESeq2 without lfcShrink?
Question 1: Because you have small or zero counts in one or the other of your groups. In the most extreme case, if you have no expression in one group, that's going to be an infinite fold change in the other group.
Question 2: Increase the prior.count in glmFit, which shrinks the log-fold changes towards zero.
I'm bemused about why you think a fold change of 10 is "very high", it's only one order of magnitude. Not small, but not huge either, and definitely achievable in many contexts. Indeed, a 10-fold upregulation isn't hard to achieve in any setting if the gene wasn't highly expressed in the first place. Perhaps one could do more shrinkage to be more conservative, but the true log-fold change could still be anywhere. If you want to make inferences from the log-fold change, test against it explicitly, e.g., with TREAT.
Thanks Steve for your response. The fact that the log fold changes are so different between different programs makes it harder to believe which one is correct.
I tried different prior.count and sometimes there is change in the LFC and sometimes not. What is the best way to chose the prior.count for my particular experiment. Is there a formula or function?
Your concern about the differences in the log-fold changes is misdirected. It is not the fault of the software, it is a consequence of your (lack of) data. In the absence of information to estimate the log-fold change (i.e., because the counts are very small), the reported estimates are necessarily determined by the assumptions made by the authors of each function. Different assumptions will yield different log-fold change estimates, which is to be expected. If you want estimates that are consistent across packages, get more data, i.e., sequence deeper.
From edgeR's perspective, the instability of the log-fold change is not a big deal as the p-value should be the primary statistic of interest for inference and will naturally capture the effects of small counts. The log-fold change is simply a helpful descriptive statistic, of which the most interesting aspect is the sign. You response might be to say that you care about the magnitude; but if you are interested in log-fold changes above a certain threshold, you should be explicitly testing against that threshold using methods like glmTreat. This is similarly insensitive to choices of prior.count or shrinkage as it operates directly on the counts.
Anyway, to answer your actual question, the choice of prior.count reflects how strongly log-fold changes should be stabilized. Some genes will not be affected as their counts are large, but some genes will be strongly affected as their counts are small. I typically use a prior.count=3 for other applications, which ensures that genes with large log-fold changes need to have large counts to back it up. I suppose we could empirically define a "better" choice by using information from other genes (a la DESeq2), but edgeR doesn't use the log-fold change to compute p-values, so it wouldn't make much difference to interpretation.
Thanks Steve for your response. The fact that the log fold changes are so different between different programs makes it harder to believe which one is correct.
I tried different prior.count and sometimes there is change in the LFC and sometimes not. What is the best way to chose the prior.count for my particular experiment. Is there a formula or function?
Your concern about the differences in the log-fold changes is misdirected. It is not the fault of the software, it is a consequence of your (lack of) data. In the absence of information to estimate the log-fold change (i.e., because the counts are very small), the reported estimates are necessarily determined by the assumptions made by the authors of each function. Different assumptions will yield different log-fold change estimates, which is to be expected. If you want estimates that are consistent across packages, get more data, i.e., sequence deeper.
From edgeR's perspective, the instability of the log-fold change is not a big deal as the p-value should be the primary statistic of interest for inference and will naturally capture the effects of small counts. The log-fold change is simply a helpful descriptive statistic, of which the most interesting aspect is the sign. You response might be to say that you care about the magnitude; but if you are interested in log-fold changes above a certain threshold, you should be explicitly testing against that threshold using methods like
glmTreat
. This is similarly insensitive to choices ofprior.count
or shrinkage as it operates directly on the counts.Anyway, to answer your actual question, the choice of
prior.count
reflects how strongly log-fold changes should be stabilized. Some genes will not be affected as their counts are large, but some genes will be strongly affected as their counts are small. I typically use aprior.count=3
for other applications, which ensures that genes with large log-fold changes need to have large counts to back it up. I suppose we could empirically define a "better" choice by using information from other genes (a la DESeq2), but edgeR doesn't use the log-fold change to compute p-values, so it wouldn't make much difference to interpretation.Also, I'm not Steve.
Did someone call?
Sorry Aaron for calling you Steve. Sincerely appreciate your response.
Thanks,
Nirmala