I have puzzling results from an RNA-Seq experiment. The experimental setup is 4 conditions: two tissues (cortex and striatum), and two treatments (drug and DMSO), each is in triplicate (12 samples total). I have set up my design model at the "group" level with no intercept, so that I could directly specify the desired contrast without releveling. Looking at the effect of the drug in the cortex I see several DEGs, but none in the striatum. This would be fine, but the strange thing happens when I remove one of the cortex replicates. PCA of all the samples shows one of the cortex samples which clusters with the striatum replicates. This could have been a mistake by the researchers dissecting or mis-labeling a tube or something, so I excluded it. I expected that this would affect the DEGs coming from the cortex, but what I saw was that it affected the results from the striatum contrast! I now have DEGs in the striatum, even though I did not alter any of the striatum samples. Is there a hidden interaction occurring between the samples? Have I set up my design matrix incorrectly? Any insight into this strange phenomenon would be greatly appreciated!
> cond <- as.factor(paste0(gse$Tissue,"_",gse$Treatment))
> cond<- relevel(cond, "Cortex_DMSO")
> gse$group <- cond
> gse$group
[1] Cortex_DMSO Cortex_DMSO Cortex_DMSO Cortex_Rumitin Cortex_Rumitin Cortex_Rumitin Striatum_DMSO Striatum_DMSO
[9] Striatum_DMSO Striatum_Rumitin Striatum_Rumitin Striatum_Rumitin
Levels: Cortex_DMSO Cortex_Rumitin Striatum_DMSO Striatum_Rumitin
> dds <- DESeqDataSet(gse, design = ~ 0 + group)
> dds <- DESeq(dds)
> res.cort<- lfcShrink(dds, contrast=c("group","Cortex_Rumitin","Cortex_DMSO"), type="ashr")
> res.cort
log2 fold change (MMSE): group Cortex_Rumitin vs Cortex_DMSO
Wald test p-value: group Cortex_Rumitin vs Cortex_DMSO
DataFrame with 19004 rows and 5 columns
baseMean log2FoldChange lfcSE pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric>
Gnai3 1038.88144 -0.000482174 0.00969773 0.1743862 0.971595
Cdc45 65.64955 -0.003120309 0.04345455 0.0793545 0.893919
H19 2.73165 -0.000422471 0.05318124 0.8731500 1.000000
Scml2 44.48048 -0.000260775 0.02449895 0.8204180 1.000000
Apoh 4.78234 -0.007466221 0.10939134 0.1343830 0.950330
... ... ... ... ... ...
Hmga2-ps1 31.16452 0.000383322 0.0206632 0.67033426 1.000000
Gm35162 1.82351 0.000515772 0.0653898 0.86600591 1.000000
Gm54112 14.84016 -0.003030118 0.0720300 0.36330174 1.000000
Klhl17 1483.64391 0.000523043 0.0112509 0.18522629 0.972951
Rn7sk 74.60347 -0.058693122 0.2358592 0.00488479 0.357740
> summary(res.cort)
out of 19004 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 22, 0.12%
LFC < 0 (down) : 41, 0.22%
outliers [1] : 36, 0.19%
low counts [2] : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
----------
> res.stri<- lfcShrink(dds, contrast=c("group","Striatum_Rumitin","Striatum_DMSO"),type="ashr")
> res.stri
log2 fold change (MMSE): group Striatum_Rumitin vs Striatum_DMSO
Wald test p-value: group Striatum_Rumitin vs Striatum_DMSO
DataFrame with 19004 rows and 5 columns
baseMean log2FoldChange lfcSE pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric>
Gnai3 1038.88144 1.38462e-04 0.0114670 0.893608 0.996519
Cdc45 65.64955 7.10672e-05 0.0223572 0.967872 0.998015
H19 2.73165 1.82210e-04 0.0336336 0.845834 0.996519
Scml2 44.48048 9.96938e-04 0.0305389 0.567218 0.996519
Apoh 4.78234 -9.52567e-04 0.0351848 0.274633 0.976907
... ... ... ... ... ...
Hmga2-ps1 31.16452 -0.000393574 0.0272692 0.826926 0.996519
Gm35162 1.82351 0.000249077 0.0338485 0.777089 0.996519
Gm54112 14.84016 0.000579603 0.0336583 0.584960 0.996519
Klhl17 1483.64391 -0.000304130 0.0134503 0.798946 0.996519
Rn7sk 74.60347 0.003086375 0.0352105 0.204464 0.957125
> summary(res.stri)
out of 19004 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 0, 0%
LFC < 0 (down) : 0, 0%
outliers [1] : 36, 0.19%
low counts [2] : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
----------
![]PCA of all samples with outlier Cortex sample highlighted[1]
![PCA without Cortex outlier. Note Striatum samples are unchanged.][1]
----------
- -RE-RUN IDENTICALLY AS BEFORE, JUST WITHOUT THE CORTEX OUTLIER - -
res.stri<- lfcShrink(dds, contrast=c("Group","Striatum_Rumitin","Striatum_DMSO"),type="ashr")
> res.stri
log2 fold change (MMSE): Group Striatum_Rumitin vs Striatum_DMSO
Wald test p-value: Group Striatum_Rumitin vs Striatum_DMSO
DataFrame with 18995 rows and 5 columns
baseMean log2FoldChange lfcSE pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric>
Gnai3 1037.96827 0.000650442 0.0248222 0.894532 0.990115
Cdc45 66.30511 0.000279675 0.0501738 0.974007 0.995281
H19 2.62274 0.001234327 0.0884534 0.840425 NA
Scml2 43.06311 0.005715537 0.0687505 0.535760 0.943023
Apoh 4.88169 -0.006701980 0.0988151 0.280998 NA
... ... ... ... ... ...
Hmga2-ps1 31.17408 -0.00189327 0.0617784 0.830524 0.984211
Gm35162 1.89299 0.00158597 0.0896914 0.788511 NA
Gm54112 14.46826 0.00372433 0.0876773 0.590337 0.956754
Klhl17 1476.69182 -0.00149574 0.0307298 0.804329 0.983584
Rn7sk 78.46717 0.01496945 0.0778847 0.203525 0.833373
> summary(res.stri)
out of 18995 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 51, 0.27%
LFC < 0 (down) : 37, 0.19%**
outliers [1] : 32, 0.17%
low counts [2] : 2210, 12%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
----------
sessionInfo( )
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.54.4 PhosR_1.13.1 vsn_3.70.0 gridExtra_2.3 clusterProfiler_4.10.1
[6] eulerr_7.0.2 ggvenn_0.1.10 scales_1.3.0 stringr_1.5.1 pathview_1.42.0
[11] biomaRt_2.58.2 enrichplot_1.22.0 tibble_3.2.1 tidyr_1.3.1 ggrepel_0.9.5
[16] ReportingTools_2.42.3 RUVSeq_1.36.0 edgeR_4.0.16 limma_3.58.1 EDASeq_2.36.0
[21] ShortRead_1.60.0 GenomicAlignments_1.38.2 Rsamtools_2.18.0 Biostrings_2.70.3 XVector_0.42.0
[26] sva_3.50.0 BiocParallel_1.36.0 mgcv_1.9-1 nlme_3.1-164 Gviz_1.46.1
[31] org.Mm.eg.db_3.18.0 AnnotationDbi_1.64.1 genefilter_1.84.0 ashr_2.2-63 apeglm_1.24.0
[36] ggbeeswarm_0.7.2 PoiClaClu_1.0.2.1 glmpca_0.2.0 RColorBrewer_1.1-3 hexbin_1.28.3
[41] pheatmap_1.0.12 ggplot2_3.5.1 dplyr_1.1.4 DESeq2_1.42.1 SummarizedExperiment_1.32.0
[46] Biobase_2.62.0 MatrixGenerics_1.14.0 matrixStats_1.3.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[51] IRanges_2.36.0 S4Vectors_0.40.2 magrittr_2.0.3 tximeta_1.20.3 BiocFileCache_2.10.2
[56] dbplyr_2.5.0 rmarkdown_2.26 knitr_1.46 readr_2.1.5 BiocGenerics_0.48.1
[61] BiocStyle_2.30.0 BiocManager_1.30.23
The images I tried to attach didn't attach. Here is the PCA (MDS) of all the samples, with the Cortex outlier highlighed.
And this is the Volcano plot of the Striatum samples when nothing is removed:
After the outlier cortex sample is removed, the 6 striatum samples are unchanged in the PCA/MDS plot:
But now the striatum is showing DEGs even though it was a cortex sample which was removed.