Hi guys,
This is my first time posting here. I just found something weird in my analysis with DESEQ2 and I need to understand what is going on. In my dataset, I have samples from 8 early developmental stages, 5-6 biological replicates each and they all come from the same 3 mothers. I chose to use ~Mother + Stage + Mother:Stage as design
> dds = DESeq(DataSet)
> O1 = results(dds, name = "Stage_Zygote_vs_Oocyte", alpha=0.05, lfcThreshold = 1)
When I subset the significant genes, I find something weird:
> subset(O1, padj < 0.05)
log2 fold change (MLE): Stage Zygote vs Oocyte
Wald test p-value: Stage Zygote vs Oocyte
DataFrame with 70 rows and 6 columns
baseMean log2FoldChange lfcSE stat
<numeric> <numeric> <numeric> <numeric>
ABCA12 59.4956699489524 -14.9133792319661 3.71445376343021 -3.74574032094491
ADGB 51.0680755933681 17.5751109155461 3.97882126140021 4.16583450891509
ADGRE1 33.6984046101437 17.2937693617856 3.56971529878028 4.56444506018532
ASB4 31.2331914086339 -19.1622710101915 4.55455002644081 -3.9877201709835
CACNA2D3 40.4185283435237 17.7988173432936 3.86616299527236 4.34508771715927
... ... ... ... ...
TEX48 24.89320830301 17.1771442220675 3.60314025728128 4.48973480545935
TLR3 26.9055004142896 20.0886568896418 4.87113135074353 3.91873171039169
TTC6 50.6948729394596 -25.014599018666 3.57747682275414 -6.71271966485538
TULP1 53.3587235245435 -18.3209993664313 3.24473232475249 -5.33819052939986
ZBBX 89.9977078303872 11.9611695011492 2.7692506967838 3.95817161439355
pvalue padj
<numeric> <numeric>
ABCA12 0.000179862570107177 0.0473325296812482
ADGB 3.1021568526327e-05 0.0122454269848053
ADGRE1 5.00817328386006e-06 0.00284182532776034
ASB4 6.67112839074731e-05 0.0208852326412396
CACNA2D3 1.39220019753226e-05 0.00648194132994635
... ... ...
TEX48 7.13119000472494e-06 0.00379525854995077
TLR3 8.9016130133389e-05 0.0257838109177583
TTC6 1.91029908672885e-11 8.6718027042056e-08
TULP1 9.38787778213941e-08 0.000124883311959608
ZBBX 7.55256866464884e-05 0.0232439901377447
The first significant gene, ABCA12, has a log2FC of -15, but when I look at the counts, all samples from the oocyte stage and the zygote stage have 0. How is this possible?
> counts = counts(dds)
> counts[rownames(counts)=="ABCA12",]
16cell-D1 16cell-D2 16cell-J1 16cell-S1 16cell-S2 Zygote-D1
0 0 61 0 96 0
Zygote-D2 Zygote-J1 Zygote-J2 Zygote-S1 Zygote-S2 2cell-D1
0 0 0 0 0 0
2cell-D2 2cell-J1 2cell-J2 2cell-S1 2cell-S2 4cell-D1
0 0 2 0 0 0
4cell-J1 4cell-J2 4cell-S1 4cell-S2 8cell-D2 8cell-J1
0 27 28 42 27 11
8cell-J2 8cell-S1 8cell-S2 Blastocyst-D1 Blastocyst-D2 Blastocyst-J1
88 3 7 0 110 33
Blastocyst-J2 Blastocyst-S1 Blastocyst-S2 Morula-D2 Morula-J1 Morula-J2
113 107 58 22 28 18
Morula-S1 Morula-S2 Oocyte-D1 Oocyte-D2 Oocyte-J1 Oocyte-J2
33 0 0 0 0 0
Oocyte-S2
0
Next is my session info
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.2 apeglm_1.6.0 GenomicFeatures_1.36.0
[4] AnnotationDbi_1.46.0 ggplot2_3.1.1 plyr_1.8.4
[7] stringr_1.4.0 rtracklayer_1.44.0 DESeq2_1.24.0
[10] SummarizedExperiment_1.14.0 DelayedArray_0.10.0 BiocParallel_1.17.18
[13] matrixStats_0.54.0 Biobase_2.44.0 GenomicRanges_1.36.0
[16] GenomeInfoDb_1.20.0 IRanges_2.18.0 S4Vectors_0.22.0
[19] BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] httr_1.4.0 bit64_0.9-7 splines_3.6.0
[4] assertthat_0.2.1 Formula_1.2-3 latticeExtra_0.6-28
[7] blob_1.1.1 GenomeInfoDbData_1.2.1 Rsamtools_2.0.0
[10] progress_1.2.0 numDeriv_2016.8-1 pillar_1.3.1
[13] RSQLite_2.1.1 backports_1.1.4 lattice_0.20-38
[16] bbmle_1.0.20 digest_0.6.18 RColorBrewer_1.1-2
[19] XVector_0.24.0 checkmate_1.9.3 colorspace_1.4-1
[22] htmltools_0.3.6 Matrix_1.2-17 XML_3.98-1.19
[25] pkgconfig_2.0.2 emdbook_1.3.11 biomaRt_2.40.0
[28] genefilter_1.66.0 zlibbioc_1.30.0 xtable_1.8-4
[31] snow_0.4-3 scales_1.0.0 htmlTable_1.13.1
[34] tibble_2.1.1 annotate_1.62.0 withr_2.1.2
[37] nnet_7.3-12 lazyeval_0.2.2 survival_2.44-1.1
[40] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
[43] MASS_7.3-51.4 foreign_0.8-71 prettyunits_1.0.2
[46] tools_3.6.0 hms_0.4.2 munsell_0.5.0
[49] locfit_1.5-9.1 cluster_2.0.9 Biostrings_2.52.0
[52] compiler_3.6.0 rlang_0.3.4 grid_3.6.0
[55] RCurl_1.95-4.12 rstudioapi_0.10 htmlwidgets_1.3
[58] bitops_1.0-6 base64enc_0.1-3 gtable_0.3.0
[61] DBI_1.0.0 R6_2.4.0 GenomicAlignments_1.20.0
[64] gridExtra_2.3 knitr_1.22 bit_1.1-14
[67] Hmisc_4.2-0 stringi_1.4.3 Rcpp_1.0.1
[70] geneplotter_1.62.0 rpart_4.1-15 acepack_1.4.1
[73] coda_0.19-2 xfun_0.6
If you need more information to help, please let me know. Thanks a lot!
Yeah, I've run lfcShrink with apelgm a couple times before posting here and I've seen that it effectively gets rid of them, but I was still curious. Ok, I'll stick to lfcShrink then.
Thanks a lot, Michael. Your tutorials are very clear and the attention you give to this forum is really helpful