Question

DESeq2 on 16S copy number corrected genes

0

Entering edit mode

ghanbari.msc ▴ 10

@ghanbarimsc-13659

Last seen 7.0 years ago

Hi

I am currently working on some metagenome data and I'd like to use DESeq2 tool to find the genes that are differentially abundant in different treatment over time. Due to the type of the study, I need to normalize the count data (genes abundance) per 16S rRNA copy number which changes the distribution and nature of data from count to decimal. I know that DESeq2 works on the count data but I was just wondering if there is any possible way to analyze my data using DESeq2?

I'd appreciate the comments.

Mehdi

deseq2 metagenomics • 2.6k views

ADD COMMENT • link updated 7.6 years ago by Michael Love 43k • written 7.6 years ago by ghanbari.msc ▴ 10

score 2 · Accepted Answer · 2017-09-01

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

hi Mehdi,

The way to normalize in DESeq2 if you have a matrix is to not divide the counts, but to provide this matrix to estimateSizeFactor's normMatrix argument (see the help page for estimateSizeFactors). You should divide out the geometric mean of each row of your matrix before providing it to estimateSizeFactors.

ADD COMMENT • link 7.6 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael for the reply. I have the number of 16S copy per sample but I do not know how to create a matrix out of it. The data are like this:

Sample1 Sample2 Sample3 ......

16S copy 2000 1500 1000 ......

Do you have a suggestion on how to create a matrix from this?

I'd appreciate your comments.

Mehdi

ADD REPLY • link 7.6 years ago ghanbari.msc ▴ 10

0

Entering edit mode

hi Mehdi,

It's not really a specialty of mine, so I don't have any ideas what the right analyses here are.

ADD REPLY • link 7.6 years ago Michael Love 43k

0

Entering edit mode

Hi Michael

Following your suggestion here is the code that I used.

library("DESeq2")
dds.data = phyloseq_to_deseq2(Candela_FEC, ~ DOS + Treatment + Treatment*DOS)
dds.data$Treatment <- relevel(dds.data$Treatment, ref="Control")

#copy number normalization
copy_number_16S = as.matrix(read.table("copy_number_16s.txt", row.names = 1, header = T, quote = "", sep = "\t", check.names = F))

#row-wise geometric mean of normMatrix

normFactors <- copy_number_16S / exp(rowMeans(log(copy_number_16S)))

normalizationFactors(dds.data) <- normFactors

dds <- estimateSizeFactors(dds.data, normMatrix=normFactors)

#running DESeq2

dds <- DESeq(dds.data, test="LRT", reduced = ~ DOS + Treatment)

resultsNames(dds)

I'd appreciate if you could confirm that the codes are ok.

regards

Mehdi

ADD REPLY • link 7.2 years ago ghanbari.msc ▴ 10

1

Entering edit mode

This line of code is not necessary (the next line writes over the normalizationFactors anyway)

normalizationFactors(dds.data) <- normFactors

Otherwise, I don't have any comments. The DESeq2 code looks correct but I don't do 16S analysis myself so I can't comment very in depth on this normalization approach.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

Thanks for the fast reply. I was wondering if you could also confirm my deseq2 design. I am working on metagenome data which consists of 48 records. My model has two factor: Time and Treatment, having 3-time point: 0, 8, and 21; Treatment: A (control) and B, so 8 replicates per treatment/time point.

I used the LRT test of DESeq2 since I am interested in the ratio of ratios results:

- (B_21/B_0) / (A_21 /A_0)

- (B_8/B_0) / (A_8 /A_0)

In my previouos post, I wrote the code that I used:

library("DESeq2")

dds.data = phyloseq_to_deseq2(Candela_FEC, ~ DOS + Treatment + Treatment*DOS)
dds.data$Treatment <- relevel(dds.data$Treatment, ref="Control")

#copy number normalization
copy_number_16S = as.matrix(read.table("copy_number_16s.txt", row.names = 1, header = T, quote = "", sep = "\t", check.names = F))

#row-wise geometric mean of normMatrix

normFactors <- copy_number_16S / exp(rowMeans(log(copy_number_16S)))

dds <- estimateSizeFactors(dds.data, normMatrix=normFactors)

#running DESeq2

dds <- DESeq(dds.data, test="LRT", reduced = ~ DOS + Treatment)

> resultsNames(dds)
[1] "Intercept" "DOS_21_vs_0" "DOS_8_vs_0"
[7] "Treatment_B_vs_A" "DOS21.TreatmentB" "DOS8.TreatmentB"

> res_21_vs_0 = results(dds, name = "TDOS21.TreatmentB") # The difference between treatments over time (0 to 21), after accounting for the difference at time 0

> res_8_vs_0 = results(dds, name = "DOS8.TreatmentB") # # The difference between treatments over time (0 to 8), after accounting for the difference at time 0

> res <- results(dds, contrast=list(c("DOS21.TreatmentB","DOS8.TreatmentA"))) # The difference between treatments over time (8 to 21), after accounting for the difference at time 0

My questions:

1. So the genes with the small p value and the positive log2fold change are the genes that enriched over time due to the effect of Treatment B, right?

2. and the genes with the small p value and the negative log2fold change are the genes that decreased over time due to the effect of Tretment B, right?

Thanks in advance for your help.

Regards

Mehdi

ADD REPLY • link 7.2 years ago ghanbari.msc ▴ 10

0

Entering edit mode

Please review the section of ?results about using the LRT. See also the LRT section of the vignette for more details.

In particular, changing "name" does not recompute p-values for an LRT, unless you add test="Wald".

When you specify "contrast", results() will re-compute p-values, and you can add test="Wald" to make it clear from the code as well that Wald tests are being computed.

The last line won't work, because there is no such interaction term DOS8.TreatmentA. Was this a typo?

If you want to compare the B vs A difference across two time points you would use that last line, except the "A" should be a "B" in the second term.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael

The last line won't work, because there is no such interaction term DOS8.TreatmentA. Was this a typo?

Yes, it was.

In particular, changing "name" does not recompute p-values for an LRT, unless you add test="Wald".

So, you mean if I add test=wald to the code; res_21_vs_0 = results(dds, name = "TDOS21.TreatmentB", test=wald), does the result show the genes that are different between treatments over time (0 to 21), after accounting for the difference at time 0? I thought wald test is just for comparing time points and does not account for the shift at time=0.

ADD REPLY • link 7.2 years ago ghanbari.msc ▴ 10

1

Entering edit mode

That will give you genes different between treatments at time 21 controlling for baseline at time 0. Wald and LRT can both be used to test interaction terms, our interface is just a little bit easier for Wald testing.

For LRT, you would need to perform a new DESeq() run for each "reduced" design.

If the design and interaction terms are still confusing, I'd recommend finding a statistician you can run these by. There is nothing particular to DESeq2, these are the same terms you would get with a linear model of a single "y" variable.

ADD REPLY • link 7.2 years ago Michael Love 43k

0

Entering edit mode

Thanks Michael for the reply. Just a short question. As you said the following line gives me the difference between treatments at day 21

results(dds, contrast=list(c("DOS21.TreatmentB")), test="Wald")

In another post (https://support.bioconductor.org/p/101002/), you suggested that a similar design like the below one would show the difference between treatments at a given time point.

results(dds, contrast=list(c("Treatment_B_vs_A","DOS21.TreatmentB")), test="Wald")

Would you please explain the difference?

Thanks

Mehdi

ADD REPLY • link 7.2 years ago ghanbari.msc ▴ 10

0

Entering edit mode

The first is the interaction and the second is the main effect plus interaction. See the diagram of interactions in the vignette or consult a local statistician for more interpretation.

ADD REPLY • link 7.2 years ago Michael Love 43k