Hello
I used DESeq2 to perform differential expression between three groups (condition1, condition2 and condition3). Samples also come from three different countries.
PCA (using rlog transformed data) showed a similar pattern of clustering using country and condition (please see link) So I controlled for country as well (~ country + condition).
The MA plot showed two significant genes extremely affected by the condition, but upon inspection of the counts with plotCounts(), I see that there are only a few samples driving the difference. For one gene is even just one sample (please see link).
I checked other DEGs not as extreme and the plot of counts seemed much more reasonable.
Somehow outliers are being allowed to affect the coefficient estimation, but I am not sure why is that. This is how I ran the computations:
des <- model.matrix(~ 0+CountryCode + condition, data = colData(dds)) dds2 <- DESeq(dds, full = des, betaPrior = FALSE) # dds is a deseq data set with design ~1 dds2_cond3_vs_cond1 <- results(dds2, name = "conditionCond3", alpha = 0.05) dds2_cond2_vs_cond1 <- results(dds2, name = "conditionCond2", alpha = 0.05) dds2_cond2_vs_cond3 <-results(dds2, alpha = 0.05, contrast = list("conditionCond2", "conditionCond3"))
I would like ask for guidance on how to deal with those extremely regulated DEGs, and also if the data has been modeled properly, considering the highly overlapping effects of country and and condition. I want to make sure I am not in a "garbage in garbage out" type situation.
Thank you in advance!
Trying to run
lfcShrink
for the coefficient cond2 vs cond1, usinglfcShrink(dds2, coef=4)
(first 3 coefficients are for the countries) gives"Error: subscript contains invalid names"
. I tried changing thecoef
variable to the name of the coefficient, and also using thecontrast
argument with both, the number or the name of the coeficient, and the same error occurs.I'm sorry, I gave bad advice. You will have to use ~countrycode+condition in your DESeq() call, because we cannot perform LFC shrinkage without an intercept term. It's strange because when I run this on a small example using the current release, I get an error that mentions the need for an intercept. What version of DESeq2 are you using?
Hi Michael,
the version is DESeq2_1.18.1
I did the changes you mentioned and LFC shrinkage worked.
Thanks again!
I plotted the shrinked fold changes against the mean and the extreme gene is now shown significant but almost not-regulated (please see figure in link).
Sorry but I did not get the part of subsetting after lfc shrinking by
abs(log2FoldChange) > 0.5.
Did you mean to remove the subset having larger log2fc than 0.5 after shrinking, and keeping the remaining results?LFC shrinkage doesn’t by default change the p-values (although it is optional to switch to a posterior probability based analysis using lfcShrink). I’m suggesting you to remove small LFC using subset (for v1.18). You can also use a padj filter with subset to make the final gene set, by adding “& padj < 0.05”