Hi, I am getting some suspiciously looking "patterns" in my Log2fold change results form the DEseq2 pipeline and it makes me question the validity of the results.
The data is microbiome community, i.e. 16S amplicon data. ca. 10,000 observations (ASVs) across ca. 20 samples.
It is similar to the issue posted here: https://github.com/joey711/phyloseq/issues/1306, but the answer there seem to be using the lfcShrink() however for me, that does not solve anything, rather the results become much more strange....
It can best be illustrated by plots , but it seems it is not possible to add anything graphic to this support site, why I made an identical post on Github including the graphs, please see here : https://github.com/joey711/phyloseq/issues/1424
I would like to use a different normalization, e.g. for the rest of my data analysis I use CSS, see here: http://www.metagenomics.wiki/tools/16s/norm/css, but I can't figure out how to use a normalized data frame as input to the DESeq (it asks always for this size factor estimate which it performs in the deseq normalization method). However I am also not convinced that another normalization method would solve this problem.
What might be the issue?
Here are the main code and the session info:
dsobj <- phyloseq_to_deseq2(GRBH, ~site) ### Exchange object KA or W etc.
deseq_c <- estimateSizeFactors(dsobj, type = "poscounts") # Normalize
dds<- DESeq(deseq_c,fitType = "mean") # Apply the Deseq algorithm
DESeq2::plotMA(resLFC, cex= 1)
DESeq2::plotMA(res) # without shrinking
sessionInfo( )
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] genefilter_1.72.0 plyr_1.8.6 metagenomeSeq_1.32.0 RColorBrewer_1.1-2
[5] glmnet_4.0-2 Matrix_1.2-18 limma_3.46.0 ggrepel_0.8.2
[9] dplyr_1.0.2 funrar_1.4.1 qdapTools_1.3.5 dendextend_1.14.0
[13] ggplot2_3.3.2 vegan_2.5-6 lattice_0.20-41 permute_0.9-5
[17] reshape_0.8.8 viridis_0.5.1 viridisLite_0.3.0 tidyr_1.1.2
[21] DESeq2_1.30.0 SummarizedExperiment_1.20.0 MatrixGenerics_1.2.0 matrixStats_0.57.0
[25] GenomicRanges_1.42.0 GenomeInfoDb_1.26.0 IRanges_2.24.0 S4Vectors_0.28.0
[29] vsn_3.58.0 Biobase_2.50.0 BiocGenerics_0.36.0 phyloseq_1.34.0
[33] apeglm_1.12.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 XVector_0.30.0 rstudioapi_0.13
[5] farver_2.0.3 affyio_1.60.0 bit64_4.0.5 AnnotationDbi_1.52.0
[9] mvtnorm_1.1-1 codetools_0.2-16 splines_4.0.3 geneplotter_1.68.0
[13] ade4_1.7-16 jsonlite_1.7.1 annotate_1.68.0 cluster_2.1.0
[17] BiocManager_1.30.10 compiler_4.0.3 httr_1.4.2 prettyunits_1.1.1
[21] tools_4.0.3 igraph_1.2.6 coda_0.19-4 gtable_0.3.0
[25] glue_1.4.2 GenomeInfoDbData_1.2.4 affy_1.68.0 reshape2_1.4.4
[29] Rcpp_1.0.5 bbmle_1.0.23.1 vctrs_0.3.4 Biostrings_2.58.0
[33] rhdf5filters_1.2.0 multtest_2.46.0 ape_5.4-1 preprocessCore_1.52.0
[37] nlme_3.1-149 iterators_1.0.13 stringr_1.4.0 lifecycle_0.2.0
[41] gtools_3.8.2 XML_3.99-0.5 zlibbioc_1.36.0 MASS_7.3-53
[45] scales_1.1.1 hms_0.5.3 biomformat_1.18.0 rhdf5_2.34.0
[49] memoise_1.1.0 gridExtra_2.3 emdbook_1.3.12 bdsmatrix_1.3-4
[53] stringi_1.5.3 RSQLite_2.2.1 foreach_1.5.1 caTools_1.18.0
[57] BiocParallel_1.24.1 shape_1.4.5 chron_2.3-56 rlang_0.4.8
[61] pkgconfig_2.0.3 bitops_1.0-6 Wrench_1.8.0 purrr_0.3.4
[65] Rhdf5lib_1.12.0 labeling_0.4.2 bit_4.0.4 tidyselect_1.1.0
[69] magrittr_2.0.1 R6_2.5.0 gplots_3.1.1 generics_0.1.0
[73] DelayedArray_0.16.0 DBI_1.1.0 pillar_1.4.7 withr_2.3.0
[77] mgcv_1.8-33 survival_3.2-7 RCurl_1.98-1.2 tibble_3.0.4
[81] crayon_1.3.4 KernSmooth_2.23-17 progress_1.2.2 locfit_1.5-9.4
[85] grid_4.0.3 data.table_1.13.2 blob_1.2.1 digest_0.6.27
[89] xtable_1.8-4 numDeriv_2016.8-1.1 munsell_0.5.0
Okay, I was of the impression that it was a rather widely used method for this type of data... but perhaps it is not applicable then. FYI I changed the
fitType
to "local" as this was clearly a better fit (seen fromplotDispEsts()
), however the MA-plots are unchanged by this.I'd recommend using a method that can deal with the sparsity of counts better.