Hello,
I'm running DESeq2 in combination with phyloseq to analyze 16S rRNA data coming from a large survey. I have gut microbiome coming from 6 subjects (called 5001, 5002, 5003, 5004, 5005, and 5006) that experimented 3 types of diets (called A, B, and C). I would like to test differences between:
- Different diet in each subject
- Different subject in normal diet (reference level)
- Interaction terms (differences on how subjects respond to different diets)
My design matrix is pretty huge so I'll not post it here but these are the first steps that I ran to get my DESeq2 object:
dds <- phyloseq_to_deseq2(mars, design = ~ food + subject_id + food:subject_id) dds <- DESeq(dds, test = "Wald", fitType = "parametric") matrix(resultsNames(dds))
[,1]
[1,] "Intercept"
[2,] "food_B_vs_A"
[3,] "food_C_vs_A"
[4,] "subject_id_5002_vs_5001"
[5,] "subject_id_5003_vs_5001"
[6,] "subject_id_5004_vs_5001"
[7,] "subject_id_5005_vs_5001"
[8,] "subject_id_5006_vs_5001"
[9,] "foodB.subject_id5002"
[10,] "foodC.subject_id5002"
[11,] "foodB.subject_id5003"
[12,] "foodC.subject_id5003"
[13,] "foodB.subject_id5004"
[14,] "foodC.subject_id5004"
[15,] "foodB.subject_id5005"
[16,] "foodC.subject_id5005"
[17,] "foodB.subject_id5006"
[18,] "foodC.subject_id5006"
If I have understood correctly, I can get the difference across diets for subject 1 simply setting the contrast term correctly:
results(dds, contrast = c("food", "B", "A")) results(dds, contrast = c("food", "C", "A")) results(dds, contrast = c("food", "C", "B"))
But now I have a doubt, how can I get the same results for subject 2? I have done the following steps but I don't know if they are correct...
results(dds, contrast = list(c( "food_B_vs_A", "foodB.subject_id5002" )) results(dds, contrast = list(c( "food_C_vs_A", "foodC.subject_id5002" ))
The contrast for B - C in subject 2 should be: ("food_B_vs_A" - "food_C_vs_A") + ("foodB.subject_id5002" - "foodC.subject_id5002"). So I used a vector of combinations in the contrast argument of the results function:
cbind(as.matrix(resultsNames(dds)), c(0,1,-1,0,0,0,0,0,1,-1,0,0,0,0,0,0,0,0))
[,1] [,2]
[1,] "Intercept" "0"
[2,] "food_B_vs_A" "1"
[3,] "food_C_vs_A" "-1"
[4,] "subject_id_5002_vs_5001" "0"
[5,] "subject_id_5003_vs_5001" "0"
[6,] "subject_id_5004_vs_5001" "0"
[7,] "subject_id_5005_vs_5001" "0"
[8,] "subject_id_5006_vs_5001" "0"
[9,] "foodB.subject_id5002" "1"
[10,] "foodC.subject_id5002" "-1"
[11,] "foodB.subject_id5003" "0"
[12,] "foodC.subject_id5003" "0"
[13,] "foodB.subject_id5004" "0"
[14,] "foodC.subject_id5004" "0"
[15,] "foodB.subject_id5005" "0"
[16,] "foodC.subject_id5005" "0"
[17,] "foodB.subject_id5006" "0"
[18,] "foodC.subject_id5006" "0"
results(dds, contrast = c(0,1,-1,0,0,0,0,0,1,-1,0,0,0,0,0,0,0,0))
Is this correct or I missed something? If this is the right way to do it I was planning to repeat the same procedure for the other subjects and then put everything together into a single data frame.
For point 2, I extracted every contrast this way:
results(dds, contrast = c("subject_id", "5002", "5001") results(dds, contrast = c("subject_id", "5003", "5001") ... results(dds, contrast = c("subject_id", "5004", "5006") results(dds, contrast = c("subject_id", "5005", "5006")
whether for point 3 I'm stuck again... I think that I should extract all interaction terms and then each possible combination for differences between subjects that are not the reference level but I'm not sure about that.
Thanks in advance for the help!
best,
Giovanni
This is the output of sessionInfo():
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=it_IT.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=it_IT.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggdendro_0.1-20 vegan_2.4-6 lattice_0.20-35 permute_0.9-4
[5] metagenomeSeq_1.20.1 RColorBrewer_1.1-2 glmnet_2.0-13 foreach_1.4.4
[9] Matrix_1.2-11 limma_3.34.9 gridExtra_2.3 ggplot2_2.2.1
[13] DESeq2_1.18.1 SummarizedExperiment_1.8.1 DelayedArray_0.4.1 matrixStats_0.53.1
[17] Biobase_2.38.0 GenomicRanges_1.30.2 GenomeInfoDb_1.14.0 IRanges_2.12.0
[21] S4Vectors_0.16.0 BiocGenerics_0.24.0 phyloseq_1.23.1 reshape2_1.4.3
loaded via a namespace (and not attached):
[1] nlme_3.1-131 bitops_1.0-6 bit64_0.9-7 tools_3.4.3 backports_1.1.2
[6] rpart_4.1-13 KernSmooth_2.23-15 Hmisc_4.1-1 DBI_0.7 lazyeval_0.2.1
[11] mgcv_1.8-23 colorspace_1.3-2 ade4_1.7-10 nnet_7.3-12 bit_1.1-12
[16] compiler_3.4.3 htmlTable_1.11.2 labeling_0.3 caTools_1.17.1 scales_0.5.0.9000
[21] checkmate_1.8.5 genefilter_1.60.0 stringr_1.3.0 digest_0.6.15 foreign_0.8-69
[26] XVector_0.18.0 base64enc_0.1-3 pkgconfig_2.0.1 htmltools_0.3.6 htmlwidgets_1.0
[31] rlang_0.2.0 rstudioapi_0.7 RSQLite_2.0 jsonlite_1.5 BiocParallel_1.12.0
[36] gtools_3.5.0 acepack_1.4.1 RCurl_1.95-4.10 magrittr_1.5 GenomeInfoDbData_1.0.0
[41] Formula_1.2-2 biomformat_1.6.0 Rcpp_0.12.15 munsell_0.4.3 ape_5.0
[46] stringi_1.1.6 yaml_2.1.16 MASS_7.3-49 zlibbioc_1.24.0 rhdf5_2.22.0
[51] gplots_3.0.1 plyr_1.8.4 blob_1.1.0 gdata_2.18.0 Biostrings_2.46.0
[56] splines_3.4.3 multtest_2.34.0 annotate_1.56.1 locfit_1.5-9.1 knitr_1.20
[61] pillar_1.1.0 igraph_1.1.2 geneplotter_1.56.0 codetools_0.2-15 XML_3.98-1.10
[66] latticeExtra_0.6-28 data.table_1.10.4-3 gtable_0.2.0 xtable_1.8-2 survival_2.41-3
[71] tibble_1.4.2 iterators_1.0.9 AnnotationDbi_1.40.0 memoise_1.1.0 cluster_2.0.6
Thanks for the reply!
Sorry to bother you but I'm trying to fully understand how DESeq2 builds result tables. The steps I reported for subject2 can be considered correct? I was planning to use DESeq2 for other studies where I cannot make the simplification that you suggested so I need to know if (in principle) my approach is fine. In addition, when I run the contrast between food A and B for subject1 I get the following results: https://figshare.com/s/d7135b65f2a909388725. As you may notice OTU 24, 34, 35, 42, and 44 have zero counts in both A and B but they were reported in the result table with an adjusted p-value lower than 0.1 (I'm using the default value for simplicity). The result table is the following:
How can this be possible?
Thanks again!
best,
Giovanni
Yes, your contrasts seem correct. I've seen this occur when the counts do not follow a negative binomial (typically the data is not RNA-seq), e.g. the data is bimodal, and there are many groups involved. I recommend to use LFC shrinkage with the lfcShrink function and apply a minimal filter, e.g. subset(res, abs(log2FoldChange) > 0.5). In the next version of DESeq2 you will be able to directly apply a LFC threshold within lfcShrink.
Oh and I should say, you will need to use type="apeglm" or type="ashr" to apply LFC shrinkage to a design with interactions.
I tried with lfcShrink (both with apeglm and ashr) and it doesn't seem to work. As you said I actually have 16S rRNA data they are not RNA-seq and they could not follow negative binomial distribution (I tried to model them with glmmADMB and in some cases the Poisson family seems to work better than negative binomial according to the dAIC). In addition, I do have many groups so this can also be the cause.
Thanks again for the help,
best,
Giovanni
Out of curiosity “doesn’t seem to work” meaning that it gives an error or that gives a high LFC.
Let me clarify that:
the function lfcShrink worked fine with both methods but the output still contains zero-count OTUs. The worst thing is that they have a huge log2FoldChange and a very small p-value so it is really difficult to remove them in a "programatic" way without removing valid differences ... I think I have to do it manually or write down something to account for zero-count OTUs in the contrast selected.
I'm surprised that they'd have a large LFC in the output from lfcShrink() after running apeglm, it usually is aggressive about shrinkage. You may want to move to a different method then, if the distribution is not appropriate for this dataset.
That was my fault!
I was giving the unfiltered table to my plotting function, sorry. The function now works well and the zero-counts OTUs have been removed. My last question regarding the interaction part of the original post:
If I understood correctly I can extract contrasts between food B and C for, say, subject 2 against the reference level by using:
and this should be valid for each subject. But what about the effect of B vs C for subject 2 and another subject that is not the reference level? I know that this can be tricky but I think that I am missing something ... this is what I tried:
Is it correct? Is there a way to express the same contrast using a list of contrasts?
Thanks again for the help!
That looks correct except you need to divide the 1's and -1's by 2 (so 1/2 and -1/2) or else you aren't averaging the effect, but doubling it.Hi Mark and thank again for your help!
If I understood correctly the interaction between, let's say, food B and subject 5002 should be:
if that holds, the interaction between food B and subject 5003 should be:
so if I subtract the two I get:
that is the difference between treatment B and A for subject 5002 and 5003 (the interaction term).
If I do the same thing for food C I obtain:
and by subtracting the two:
that is the interaction term between food C and B for subjects 5003 and 5002. If I use 1/2 and -1/2 instead of 1 and -1 the result should not change but I don't understand why I need to do that ...
Many thanks again for the help!
Best,
Giovanni
I'm sorry, I wasn't reading carefully and misinterpreted your question that you wanted something like the average difference. Yes, if you want to contrast two non-reference subjects you would use the code you first posted.