We have performed differential expression analyses using DESeq2 to directly compare the same tissue in two species that diverged from a common ancestor about 50 million years ago. We added a line of code that takes into account gene length differences between species. Since DESeq2 was not initially intended for such analyses, we wonder about some general considerations:
Is there an assumption in DESeq2 that most genes are not differentially expressed between the two samples? Can this be overcome if the mean and variance of rlog- or vst-transformed gene counts are similar?
Since these gene counts are continuous values, how 'similar' is similar?
Thanks in advance!!
Thanks for that reply @Michael Love! Here is the MA plot for my interspecies comparisons. This plot is for all the log2 fold changes in my comparisons.
Oh, can you change ylim to see the extent of the LFC better? Also use MLE=TRUE.
Hi @Michael Love,
Thanks for the suggestion. I ran:
dds <- DESeq(ddsHTSeq, betaPrior = TRUE)
res <- results(dds, addMLE = TRUE)
plotMA(res, ylim = c(-21, 21), main = 'DESeq2', MLE = TRUE)
This is the resulting MA plot with a wider LFC scale and MLE=TRUE.
(For comparison here is the MLE=FALSE MA plot)
Thanks in advance!
To me this looks like the median ratio is effective.
Note that betaPrior=TRUE is provided for backwards compatibility but is no longer recommended (since v1.16); see the vignette code examples for the recommended LFC shrinkage.
Thank you so much, Michael! I want to understand this a bit better and have a few naive follow-ups:
Is the median ratio you refer to the ratio of LFC (M) to mean of normalised counts (A), and not the median of the LFCs?
You think the median ratio in my interspecies analysis is effective for identifying differentially expressed genes because most genes cluster around the central M=0 line? What would MA plots look like if the median ratios were not effective?
Thanks in advance!
The median ratio is the method used for normalization; consult the original DESeq publication (2010).
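For intuition, here is a minimal base-R sketch of the median-of-ratios idea, not DESeq2's actual code (in practice you would just call estimateSizeFactors). The function name median_ratio_size_factors and the toy count matrix are made up for illustration: each sample's size factor is the median, over genes, of the ratio of its count to that gene's geometric mean across samples.

```r
# Illustrative re-implementation of median-of-ratios size factors (hypothetical helper).
median_ratio_size_factors <- function(counts) {
  # Log geometric mean of each gene across samples
  log_geo_means <- rowMeans(log(counts))
  # A gene with a zero count in any sample gets -Inf here and is skipped
  usable <- is.finite(log_geo_means)
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
  })
}

# Toy data (assumed): s2 is sequenced twice as deep as s1,
# g4 is strongly changed, g5 has a zero count.
counts <- matrix(
  c( 10,  20,
    100, 200,
     50, 100,
     30, 600,
      0,   5),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("g", 1:5), c("s1", "s2"))
)

sf <- median_ratio_size_factors(counts)
print(sf)

# Counts on a common scale: divide each column by its size factor
normalized <- sweep(counts, 2, sf, "/")
print(round(normalized, 1))
```

In this toy case the size factors differ by a factor of two, matching the depth difference, and the single changed gene (g4) does not move them because the median ignores it.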
One case when the median ratio is clearly not appropriate is when there are different modes of the LFC distribution.
But so long as you understand that the median ratio method is correcting for the predominant mode of the LFC distribution, then it is effective for that purpose.
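To make the point about modes concrete, here is a small noise-free simulation in base R, a sketch under made-up assumptions rather than anything from the actual data. The helpers median_ratio_size_factors and normalized_lfc, the 4-fold shift, and the 1.5x depth difference are all hypothetical. It builds two "samples" whose true LFCs have two modes (0 and 2) and checks which mode ends up at M = 0 after median-ratio normalization.

```r
# Illustrative median-of-ratios size factors (hypothetical helper, not DESeq2 code)
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)
  apply(counts, 2, function(x) exp(median(log(x[usable]) - log_geo_means[usable])))
}

# Simulate two samples where a fraction of genes is shifted by `shift` (log2 scale)
# and sample b is sequenced `depth` times deeper; return normalized LFCs.
normalized_lfc <- function(frac_shifted, n_genes = 1000, shift = 2, depth = 1.5) {
  n_shifted <- round(frac_shifted * n_genes)
  true_lfc <- c(rep(0, n_genes - n_shifted), rep(shift, n_shifted))
  baseline <- seq(20, 2000, length.out = n_genes)
  counts <- cbind(a = baseline, b = baseline * depth * 2^true_lfc)
  sf <- median_ratio_size_factors(counts)
  norm <- sweep(counts, 2, sf, "/")
  data.frame(true_lfc = true_lfc, lfc = log2(norm[, "b"] / norm[, "a"]))
}

# Shifted genes are the minority (30%): unchanged genes should sit near M = 0
minority <- normalized_lfc(0.3)
print(tapply(minority$lfc, minority$true_lfc, median))

# Shifted genes are the majority (60%): now the shifted genes sit near M = 0
majority <- normalized_lfc(0.6)
print(tapply(majority$lfc, majority$true_lfc, median))
```

When the shifted genes are the minority, the unchanged genes are centered at zero and the shifted ones appear at about +2. When the shifted genes are the majority, they are the ones centered at zero and the truly unchanged genes appear at about -2. That is what correcting for the predominant mode means in practice, and it is why an MA plot with two separate clouds is a warning sign.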