Hello,
I have run into an issue with differential abundance analysis in DESeq2 using a Phyloseq object. I am re-running analysis using code that worked perfectly previously. A similar question has been asked but did not address my issue specifically DESeq2 contrast with list no longer works: all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
> packageVersion("DESeq2")
[1] ‘1.21.16’
Previous code:
desfile = phyloseq_to_deseq2(ps.arc.solid, ~Sex + Location + Group) ## "Group" is a 5-level factor of greatest interest biologically.
dds.ps = DESeq(desfile, fitType = "parametric", test = "Wald")
res = resultsdds.ps, cooksCutoff = FALSE)
## Contrast 1: D7 vs D14
res <- resultsdds.ps, contrast = c("Group", "D07", "D14") )
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, :
as D07 is the reference level, was expecting GroupD14 to be present in 'resultsNames(object)'
The above syntax worked fine in my previous version of DESeq2.
Even if I try it as a list:
> res <- resultsdds.ps, contrast = list(c("Group", "D14", "D21") ))
Error in checkContrast(contrast, resNames) :
all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
I have tried a couple of things suggested elsewhere: setting betaPrior = TRUE resulted in the following:
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in averagePriorsOverLevels(objectNZ, betaPriorVar) :
beta prior for Sex2,Location2,Group5 is not greater than 0
Finally I did manage to get my contrast of interest using the following, as suggested by Michael Love in the post linked above:
desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
deseq2 = DESeq(desfile, fitType = "parametric", test = "Wald")
## Contrast 1: D7 V D14
res <- results(deseq2, contrast = c("Group", "D07", "D14") )
res
My question: is setting up the model in this way an admissible method of testing for "Group" while controlling for variance in the other, less interesting factors ("Location", "Sex")??
Thanks a lot!
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 ape_5.1 dunn.test_1.3.5 microbiomeSeq_0.1
[5] scales_1.0.0 vegan_2.5-2 lattice_0.20-35 permute_0.9-4
[9] phyloseq_1.25.2 DESeq2_1.21.16 SummarizedExperiment_1.11.6 DelayedArray_0.7.30
[13] BiocParallel_1.15.8 matrixStats_0.54.0 Biobase_2.41.2 GenomicRanges_1.33.13
[17] GenomeInfoDb_1.17.1 IRanges_2.15.16 S4Vectors_0.19.19 BiocGenerics_0.27.1
[21] RColorBrewer_1.1-2 fgsea_1.6.0 Rcpp_0.12.18 forcats_0.3.0
[25] stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1
[29] tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
[33] biomaRt_2.36.1 edgeR_3.23.3 limma_3.37.3
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 htmlTable_1.12 XVector_0.21.3 base64enc_0.1-3 rstudioapi_0.7
[6] bit64_0.9-7 AnnotationDbi_1.43.1 lubridate_1.7.4 xml2_1.2.0 codetools_0.2-15
[11] splines_3.5.1 geneplotter_1.59.0 knitr_1.20 ade4_1.7-11 Formula_1.2-3
[16] jsonlite_1.5 broom_0.5.0 annotate_1.59.1 cluster_2.0.7-1 compiler_3.5.1
[21] httr_1.3.1 backports_1.1.2 assertthat_0.2.0 Matrix_1.2-14 lazyeval_0.2.1
[26] cli_1.0.0 acepack_1.4.1 htmltools_0.3.6 prettyunits_1.0.2 tools_3.5.1
[31] igraph_1.2.2 gtable_0.2.0 glue_1.3.0 GenomeInfoDbData_1.1.0 reshape2_1.4.3
[36] fastmatch_1.1-0 cellranger_1.1.0 Biostrings_2.49.1 multtest_2.37.0 nlme_3.1-137
[41] iterators_1.0.10 rvest_0.3.2 XML_3.98-1.16 MASS_7.3-50 zlibbioc_1.27.0
[46] hms_0.4.2 biomformat_1.9.0 rhdf5_2.25.4 yaml_2.2.0 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-13 latticeExtra_0.6-28 stringi_1.1.7 RSQLite_2.1.1
[56] genefilter_1.63.0 foreach_1.4.4 checkmate_1.8.5 rlang_0.2.2 pkgconfig_2.0.2
[61] bitops_1.0-6 Rhdf5lib_1.3.1 bindr_0.1.1 htmlwidgets_1.2 labeling_0.3
[66] bit_1.1-14 tidyselect_0.2.4 plyr_1.8.4 magrittr_1.5 R6_2.2.2
[71] snow_0.4-2 Hmisc_4.1-1 DBI_1.0.0 mgcv_1.8-24 pillar_1.3.0
[76] haven_1.1.2 foreign_0.8-71 withr_2.1.2 survival_2.42-6 RCurl_1.95-4.11
[81] nnet_7.3-12 modelr_0.1.2 crayon_1.3.4 progress_1.2.0 locfit_1.5-9.1
[86] readxl_1.1.0 data.table_1.11.4 blob_1.1.1 digest_0.6.16 xtable_1.8-2
[91] munsell_0.5.0
Sorry the forum won't let me post this as a reply to your previous answer.
Yes there seems to be some issue at this point. Maybe there is an issue with the phyloseq interface? Previously resultsNames(dds) would give me an output equal to your dummy data.
But if I build the model like this:
desfile = phyloseq_to_deseq2(ps.arc.solid, ~0 + Group + Location + Sex)
dds= DESeq(desfile, fitType = "parametric", test = "Wald")
resultsNames(dds)
[1] "GroupD07" "GroupD14" "GroupD21" "GroupD28" "GroupD96" "Location1" "Sex1"
Then I can extract:
> res <- results(dds, contrast = c("Group", "D14", "D07") )
> res
log2 fold change (MLE): Group D14 vs D07
Wald test p-value: Group D14 vs D07
DataFrame with 49 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
OTU1 0 NA NA NA NA NA
OTU2 0.899557612849987 0 5.36571059331752 0 1 1
OTU3 0.635054940735835 0 5.35548755640969 0 1 1
OTU4 0.246926462768473 0.340940960161698 5.35649040124304 0.0636500646174179 0.949248866730002 1
OTU5 0.789282734898351 0 5.36537112477623 0 1 1
.
Is this correct? In previous versions, and I thought also in this version, you should put your factor of greatest interest last? Then if I wanted to check for the effect of other factors I could just build a new model with them in the desired order? is this still the case?
Essentially I want to first check for the Age (Group) effect sequentially, while controlling for Location and Sex. Then I want to check for the effect of Location within each age (I made a new DESeq object containing samples from each age group only for this). I think maybe there has been some small change in the syntax of how to construct the model that I have missed? thanks for your help!
Order of variables in the design is not important when you call results() with an argument.
I can’t reproduce what you’re seeing with a minimal example so can’t help much, unless you can provide an example using something like the code I sent.
It's difficult to do in that fashion in this case as the data and variables are locked within the phyloseq object.
So if I can results with the argument, and get the desired comparison at the top of the output file
e.g. log2 fold change (MLE): Group D14 vs D07
Wald test p-value: Group D14 vs D07
then this should have considered the other variables apart from Group that were incorporated in the design?
Yes. The other coefficients (as seen in resultsNames) are also fit.
Ok thanks. I think this is likely a disconnect between Phyloseq and DESeq2 resulting from package updates. I'll go ahead with this method. thanks a lot for your help!