Hello all,
I am conducting differential abundance analysis on some metagenomic data. In comparing the results of edgeR and DESeq2, I find that there are drastically more differentially abundant tags found by DESeq2 compared to edgeR (n=59 vs. 17, both FDR < 0.05). Further, only 10 out of 17 of edgeR's significant tags overlap with those found by DESeq2. Does anyone know why such a discrepancy would occur?
(Update 12.14.17 -- I am using the same filter for both DESeq2 and edgeR, and now there are 0 overlaps).
Below is the code I am using (updated 12.14.17, should be reproducible now). Thank you in advance for any help!
-Audrey
suppressPackageStartupMessages({ library(curatedMetagenomicData) library(edgeR) library(DESeq2) }) #data setup loman.counts <- curatedMetagenomicData("LomanNJ_2013.metaphlan_bugs_list.stool", counts = TRUE, dryrun = FALSE)[[1]] loman.counts_na.omit <- loman.counts[,!is.na(loman.counts$stool_texture)] loman.counts$stool_texture <- relevel(factor(loman.counts$stool_texture), "smooth") ###---DESeq2---### dds <- DESeqDataSetFromMatrix(countData = exprs(loman.counts_na.omit), colData = pData(loman.counts_na.omit), design= ~ stool_texture) #filter out features with less than one count dds <- dds[ rowSums(counts(dds)) > 1, ] #fit the model dds <- DESeq(dds, betaPrior = TRUE) #filter by FDR < 0.05 results_DESeq2 <- subset( results(dds, contrast = c("stool_texture", "watery","smooth"), alpha=0.05), padj < 0.05) ###---edgeR---### dge <- DGEList(counts=exprs(loman.counts_na.omit ), group = loman.counts_na.omit$stool_texture) #filter out features with less than 1 count dge <- dge[ rowSums(dge$counts) > 1 , ] #fit the model dsg.mtrx <- model.matrix(~stool_texture, data=loman.counts) fit <- glmLRT( glmFit( estimateDisp(dge, design = dsg.mtrx), dsg.mtrx), coef=3) #filter by FDR < 0.05 results_edgeR <- subset(topTags(fit, n=nrow(fit$table))$table, FDR < 0.05) #how many overlap? c(n_deseq2 = nrow(results_DESeq2), n_edgeR = nrow(results_edgeR), n_overlap = length(base::intersect(rownames(results_DESeq2), rownames(results_edgeR$table))))
Output of sessionInfo():
R version 3.4.1 (2017-06-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.18.1 SummarizedExperiment_1.8.0 DelayedArray_0.4.1 [4] matrixStats_0.52.2 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 [7] IRanges_2.12.0 S4Vectors_0.16.0 edgeR_3.20.1 [10] limma_3.34.3 curatedMetagenomicData_1.8.0 bindrcpp_0.2 [13] ExperimentHub_1.4.0 AnnotationHub_2.10.1 Biobase_2.38.0 [16] BiocGenerics_0.24.0 dplyr_0.7.4 loaded via a namespace (and not attached): [1] httr_1.3.1 tidyr_0.7.2 bit64_0.9-7 [4] splines_3.4.1 Formula_1.2-2 shiny_1.0.5 [7] assertthat_0.2.0 interactiveDisplayBase_1.16.0 latticeExtra_0.6-28 [10] blob_1.1.0 GenomeInfoDbData_0.99.1 yaml_2.1.16 [13] backports_1.1.1 RSQLite_2.0 lattice_0.20-35 [16] glue_1.2.0 digest_0.6.12 checkmate_1.8.5 [19] RColorBrewer_1.1-2 XVector_0.18.0 colorspace_1.3-2 [22] htmltools_0.3.6 httpuv_1.3.5 Matrix_1.2-12 [25] plyr_1.8.4 XML_3.98-1.9 pkgconfig_2.0.1 [28] genefilter_1.60.0 zlibbioc_1.24.0 purrr_0.2.4 [31] xtable_1.8-2 scales_0.5.0 BiocParallel_1.12.0 [34] annotate_1.56.1 htmlTable_1.11.0 tibble_1.3.4 [37] ggplot2_2.2.1 nnet_7.3-12 lazyeval_0.2.1 [40] survival_2.41-3 magrittr_1.5 mime_0.5 [43] memoise_1.1.0 foreign_0.8-69 BiocInstaller_1.28.0 [46] data.table_1.10.4-3 tools_3.4.1 stringr_1.2.0 [49] munsell_0.4.3 locfit_1.5-9.1 cluster_2.0.6 [52] AnnotationDbi_1.40.0 compiler_3.4.1 rlang_0.1.4 [55] grid_3.4.1 RCurl_1.95-4.8 rstudioapi_0.7 [58] htmlwidgets_0.9 bitops_1.0-6 base64enc_0.1-3 [61] gtable_0.2.0 curl_3.1 DBI_0.7 [64] R6_2.2.2 gridExtra_2.3 knitr_1.17 [67] bit_1.1-12 bindr_0.1 Hmisc_4.0-3 [70] stringi_1.1.6 Rcpp_0.12.14 geneplotter_1.56.0 [73] rpart_4.1-11 acepack_1.4.1
Welcome to Bioconductor support, as I see this is your first post.
Please note that your code isn't reproducible. The code makes reference to quantities that haven't been defined, including 'counts' and 'lrt_watery', so attempts to run it will give errors.
Also, please give output from sessionInfo(), as asked for in the posting guide at http://www.bioconductor.org/help/support/posting-guide
You haven't made the filters the same with your update. You've actually made them more different than in your original post, because you're still running DESeq with independent filtering turned on, while now not doing something equivalent for edgeR. Anyway, you should run each package as recommended by its authors, not try to make them artificially the same in an ad hoc manner, and filtering isn't the main issue here.