Hi
I am currently working on some metagenome data and I'd like to use DESeq2 tool to find the genes that are differentially abundant in different treatment over time. Due to the type of the study, I need to normalize the count data (genes abundance) per 16S rRNA copy number which changes the distribution and nature of data from count to decimal. I know that DESeq2 works on the count data but I was just wondering if there is any possible way to analyze my data using DESeq2?
I'd appreciate the comments.
Mehdi
Thanks Michael for the reply. I have the number of 16S copy per sample but I do not know how to create a matrix out of it. The data are like this:
Sample1 Sample2 Sample3 ......
16S copy 2000 1500 1000 ......
Do you have a suggestion on how to create a matrix from this?
I'd appreciate your comments.
Mehdi
hi Mehdi,
It's not really a specialty of mine, so I don't have any ideas what the right analyses here are.
Hi Michael
Following your suggestion here is the code that I used.
library("DESeq2")
dds.data = phyloseq_to_deseq2(Candela_FEC, ~ DOS + Treatment + Treatment*DOS)
dds.data$Treatment <- relevel(dds.data$Treatment, ref="Control")
#copy number normalization
copy_number_16S = as.matrix(read.table("copy_number_16s.txt", row.names = 1, header = T, quote = "", sep = "\t", check.names = F))
#row-wise geometric mean of
normMatrix
normFactors <- copy_number_16S / exp(rowMeans(log(copy_number_16S)))
normalizationFactors(dds.data) <- normFactors
dds <- estimateSizeFactors(dds.data, normMatrix=normFactors)
#running DESeq2
dds <- DESeq(dds.data, test="LRT", reduced = ~ DOS + Treatment)
resultsNames(dds)
I'd appreciate if you could confirm that the codes are ok.
regards
Mehdi
This line of code is not necessary (the next line writes over the normalizationFactors anyway)
Otherwise, I don't have any comments. The DESeq2 code looks correct but I don't do 16S analysis myself so I can't comment very in depth on this normalization approach.
Thanks for the fast reply. I was wondering if you could also confirm my deseq2 design. I am working on metagenome data which consists of 48 records. My model has two factor: Time and Treatment, having 3-time point: 0, 8, and 21; Treatment: A (control) and B, so 8 replicates per treatment/time point.
I used the LRT test of DESeq2 since I am interested in the ratio of ratios results:
- (B_21/B_0) / (A_21 /A_0)
- (B_8/B_0) / (A_8 /A_0)
In my previouos post, I wrote the code that I used:
library("DESeq2")
dds.data = phyloseq_to_deseq2(Candela_FEC, ~ DOS + Treatment + Treatment*DOS)
dds.data$Treatment <- relevel(dds.data$Treatment, ref="Control")
#copy number normalization
copy_number_16S = as.matrix(read.table("copy_number_16s.txt", row.names = 1, header = T, quote = "", sep = "\t", check.names = F))
#row-wise geometric mean of
normMatrix
normFactors <- copy_number_16S / exp(rowMeans(log(copy_number_16S)))
dds <- estimateSizeFactors(dds.data, normMatrix=normFactors)
#running DESeq2
dds <- DESeq(dds.data, test="LRT", reduced = ~ DOS + Treatment)
> resultsNames(dds)
[1] "Intercept" "DOS_21_vs_0" "DOS_8_vs_0"
[7] "Treatment_B_vs_A" "DOS21.TreatmentB" "DOS8.TreatmentB"
> res_21_vs_0 = results(dds, name = "TDOS21.TreatmentB") # The difference between treatments over time (0 to 21), after accounting for the difference at time 0
> res_8_vs_0 = results(dds, name = "DOS8.TreatmentB") # # The difference between treatments over time (0 to 8), after accounting for the difference at time 0
> res <- results(dds, contrast=list(c("DOS21.TreatmentB","DOS8.TreatmentA"))) # The difference between treatments over time (8 to 21), after accounting for the difference at time 0
My questions:
1. So the genes with the small p value and the positive log2fold change are the genes that enriched over time due to the effect of Treatment B, right?
2. and the genes with the small p value and the negative log2fold change are the genes that decreased over time due to the effect of Tretment B, right?
Thanks in advance for your help.
Regards
Mehdi
Please review the section of ?results about using the LRT. See also the LRT section of the vignette for more details.
In particular, changing "name" does not recompute p-values for an LRT, unless you add test="Wald".
When you specify "contrast", results() will re-compute p-values, and you can add test="Wald" to make it clear from the code as well that Wald tests are being computed.
The last line won't work, because there is no such interaction term DOS8.TreatmentA. Was this a typo?
If you want to compare the B vs A difference across two time points you would use that last line, except the "A" should be a "B" in the second term.
Thanks Michael
The last line won't work, because there is no such interaction term DOS8.TreatmentA. Was this a typo?
Yes, it was.
In particular, changing "name" does not recompute p-values for an LRT, unless you add test="Wald".
So, you mean if I add test=wald to the code; res_21_vs_0 = results(dds, name = "TDOS21.TreatmentB", test=wald), does the result show the genes that are different between treatments over time (0 to 21), after accounting for the difference at time 0? I thought wald test is just for comparing time points and does not account for the shift at time=0.
That will give you genes different between treatments at time 21 controlling for baseline at time 0. Wald and LRT can both be used to test interaction terms, our interface is just a little bit easier for Wald testing.
For LRT, you would need to perform a new DESeq() run for each "reduced" design.
If the design and interaction terms are still confusing, I'd recommend finding a statistician you can run these by. There is nothing particular to DESeq2, these are the same terms you would get with a linear model of a single "y" variable.
Thanks Michael for the reply. Just a short question. As you said the following line gives me the difference between treatments at day 21
In another post (https://support.bioconductor.org/p/101002/), you suggested that a similar design like the below one would show the difference between treatments at a given time point.
Would you please explain the difference?
Thanks
Mehdi
The first is the interaction and the second is the main effect plus interaction. See the diagram of interactions in the vignette or consult a local statistician for more interpretation.