Hi,
Currently playing with shrinkage parameters for an interaction of sex on my RNA-seq data. I've moved forward initially with apeglm
, based on Zhu et al., 2018 but am curious as to what criteria others use to justify apeglm
vs ashr
for this sort of analysis.
My current dataset is 160 samples, but I'm narrowing focus to 4M + 4F per experimental condition in it's smallest unit. As an example, I have an FDR < 0.05 for ~ 200 genes, but apeglm
shrinks all my fold changes towards zero, and ashr
gives LFCs between 1 and 28 for all 200 genes. Base means are fairly low, but the surface difference reads as 'no sex interaction' with apeglm
vs 'some interaction' through ashr
, with the list including a number of sex-linked genes.
dds <- DESeqDataSetFromMatrix(countData = toc, colData = colData, design = ~Time_Cond*Sex)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds$Sex <- relevel(dds$Sex,"F")
dds$Time_Cond <- relevel(dds$Time_Cond,"4W_Cont")
dds <- nbinomWaldTest(dds, betaPrior=FALSE)
resultsNames(dds)
[1] "Intercept" "Time_Cond_3D_Cont_vs_4W_Cont" "Time_Cond_3D_Ipsi_vs_4W_Cont" "Time_Cond_4W_Ipsi_vs_4W_Cont"
[5] "Sex_M_vs_F" "Time_Cond3D_Cont.SexM" "Time_Cond3D_Ipsi.SexM" "Time_Cond4W_Ipsi.SexM"
res <- results(dds, name="Time_Cond4W_Ipsi.SexM", cooksCutoff=TRUE, filterFun=ihw)
results_apeglm <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="apeglm")
results_ashr <- lfcShrink(dds, coef="Time_Cond4W_Ipsi.SexM", res=res, type="ashr")
### annotated DEGs (FDR < 0.05)
> head(results_apeglm_all)
$sig_res
log2 fold change (MAP): Time Cond4W Ipsi.SexM
Wald test p-value: Time Cond4W Ipsi.SexM
DataFrame with 212 rows and 6 columns
baseMean log2FoldChange lfcSE pvalue padj symbol
<numeric> <numeric> <numeric> <numeric> <numeric> <character>
ENSMUSG00000005237 19.8251 -2.55126e-06 0.0014427 2.60194e-04 0.02807446 Dnah2
ENSMUSG00000092384 14.3061 2.53761e-06 0.0014427 3.09149e-05 0.00175162 NA
ENSMUSG00000024056 11.9862 -2.18011e-06 0.0014427 2.46566e-06 0.00030016 Ndc80
ENSMUSG00000053749 115.3782 2.14070e-06 0.0014427 2.82711e-05 0.00993435 Gm9920
ENSMUSG00000043913 90.3787 1.94794e-06 0.0014427 3.46138e-05 0.00881880 Ccdc60
head(results_ashr_all)
$sig_res
log2 fold change (MMSE): Time Cond4W Ipsi.SexM
Wald test p-value: Time Cond4W Ipsi.SexM
DataFrame with 212 rows and 6 columns
baseMean log2FoldChange lfcSE pvalue padj symbol
<numeric> <numeric> <numeric> <numeric> <numeric> <character>
ENSMUSG00000115151 5.92225 29.0170 2.70542 1.08325e-27 4.01186e-24 NA
ENSMUSG00000040985 6.34427 28.8085 2.97286 4.68893e-23 1.39236e-19 Sun3
ENSMUSG00000104266 2.78937 -28.7404 3.06162 8.74850e-22 5.02753e-18 NA
ENSMUSG00000117896 6.09515 -28.7338 2.95905 3.89762e-23 1.00295e-19 AA387883
ENSMUSG00000110747 12.41885 -28.7184 2.35732 5.72117e-35 6.81046e-31 NA
Hi Michael,
Plots attached. Many of the large fold changes with
ashr
are driven by one sample per group, but a few are maybe a bit more convincing. I've included a spread of examples.MA plots
apeglm
(left),ashr
(right)I see why apeglm is conservative— the dispersion is very high within group.
An idea would be to use ashr but do some additional filtering so all genes must have 5 or more positive counts. Otherwise it’s hard to say what’s due to noise and what’s DE
Thanks, I'll trying playing a bit there. I have done some minor filtering already (requiring at least 5 counts in 10% of my samples, across all 160 samples), because I was running into convergence issues with my glm for a small number of genes (~10 genes). There's definitely space to filter within subgroups a bit more though.