Question

Alpha diversity on Rlogtransformed data

David ▴ 860

@david-3335

Last seen 7.1 years ago

Hello,

I'm running a MiSeq metagenomics experiment on 100 samples and would like to measure alpha diversity. My understanding is that the data first have to be transformed (rarefaction is no longer considered suitable). So I have converted my phyloseq object to a DESeq2 object (see code below).

From what I've found, the rlog transformation from DESeq2 looks OK when you do not have biological replicates, and it performs better when library sizes differ widely (which is my case here). However, the rlog transformation generates negative values, which cause some of the phyloseq alpha diversity measures to fail (see the output below).

I could obviously replace the negative values with, say, 0.001 or 0.01, but I'm not sure which value to choose.

What do you think is the best approach? metagenomeSeq provides a CSS method for normalizing the data, which would deal with the negative values, but is it necessary to normalize for alpha diversity, or is it enough to simply log-transform the data? Does metagenomeSeq provide a transformation similar to DESeq2's rlog? I just want to make sure I use the appropriate method.

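library(phyloseq)
library(DESeq2)

# Convert the phyloseq object to a DESeqDataSet (intercept-only design)
GPdds = phyloseq_to_deseq2(taxoRLD, ~1)
GPdds = estimateSizeFactors(GPdds)
GPdds = estimateDispersions(GPdds, fitType = "local")
# Regularized log transformation, blind to the design
rld <- rlogTransformation(GPdds, blind=TRUE)
rownames(rld) <- taxa_names(taxoRLD)
# Overwrite the OTU table with the rlog values (taxa in rows)
otu_table(taxoRLD) <- otu_table(assay(rld), taxa_are_rows = TRUE)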

#Alpha diversity
OTU = otu_table(round(as(otu_table(taxoRLD), "matrix")), FALSE)
alpha = estimate_richness(OTU, split = FALSE, measures = c("Observed", "Chao1", "ACE", "Shannon", "Simpson", "InvSimpson"))

> alpha
               Observed Chao1 se.chao1 ACE se.ACE  Shannon
estimateR.OTU.        0     0      NaN NaN    NaN 4.604333
                 Simpson InvSimpson
estimateR.OTU. 0.9899836   99.83641

Thanks for your comments.

deseq2 phyloseq metagenomeseq • 3.8k views

ADD COMMENT • link 9.4 years ago David ▴ 860

I would say that any method that can't handle negative values probably isn't expecting its input to be on a log scale.

ADD REPLY • link 9.4 years ago Ryan C. Thompson ★ 7.9k

score 0 · Answer 1 · 2015-12-10

I can answer some of the DESeq2 questions, but not the best practices for metagenomics, as this is not my area of expertise.

"From what i´ve found the rlogtransformation from Deseq2 looks ok when you do not have biological replicates and it performs better when library sizes can be highly different (which us my case here)."

Yes, this was the motivating factor for developing rlog as an alternative to the VST (which we discuss in the DESeq2 paper). The VST has its own advantages, such as a shorter running time when there are many samples and a theoretical motivation for stabilizing variance.

"However the rlogtransform generates negative values"

This is not really a bug from the DESeq2 point of view.

rlog() approximates log2(), so when the shrunken count is less than 1 read, it makes sense to return a negative value. I don't have any feedback on what to do with these values with respect to downstream metagenomic analysis.
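
To illustrate the point about negative values, here is a minimal base-R sketch. The numbers are toy values, not the data from this thread, and `shannon` is a hypothetical helper written for this example (it is not a phyloseq or DESeq2 function). It only shows that a log2-scale value is negative whenever the underlying count is below 1, and that an index built from proportions stops being defined once negative entries are fed in.

```r
# Toy values only (assumed for illustration, not from the thread's data).
shrunken_counts <- c(0.25, 0.5, 4, 16)   # hypothetical shrunken counts
log2_values <- log2(shrunken_counts)     # entries below 1 read become negative
print(log2_values)

# Hypothetical helper: Shannon index H = -sum(p * log(p)) over non-zero entries.
shannon <- function(x) {
  p <- x[x != 0] / sum(x)
  -sum(p * log(p))
}

# On non-negative counts the index is well defined.
ok <- shannon(c(10, 5, 1))

# On log-scale values some "proportions" are negative, so log(p) is NaN.
bad <- suppressWarnings(shannon(log2_values))
print(c(ok = ok, bad = bad))
```

Replacing the negative entries with a small constant would make the arithmetic run, but the result would then depend on the arbitrary constant chosen.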

score 0 · Answer 2 · 2015-12-14

Thanks for your comments; I haven't found standard practices for metagenomics yet.

It looks like the rlog transformation may not be required for any of the alpha diversity measures used, so it would be straightforward to use the untransformed data directly. I guess that's what I'm going to do.
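
As a rough sketch of what working on untransformed counts looks like, the base-R snippet below computes observed richness, Shannon and Simpson indices per sample from a small made-up count matrix. The matrix, its OTU/sample names and the variable names are all assumptions for illustration; with a real phyloseq object the analogous step would presumably be calling `estimate_richness()` on the object before any rlog transformation.

```r
# Hypothetical raw count matrix: taxa in rows, samples in columns.
counts <- matrix(c(10,  0, 5, 5,    # sample S1
                   20, 20, 0, 0,    # sample S2
                    1,  1, 1, 1),   # sample S3
                 nrow = 4,
                 dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

# Observed richness: number of taxa with a non-zero count in each sample.
observed <- colSums(counts > 0)

# Shannon index: -sum(p * log(p)) over taxa present in the sample.
shannon <- apply(counts, 2, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Simpson index: 1 - sum(p^2).
simpson <- apply(counts, 2, function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
})

alpha <- data.frame(Observed = observed, Shannon = shannon, Simpson = simpson)
print(alpha)
```

All three measures are built from non-negative counts or proportions, which is why they do not carry over to log-scale values.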

For other downstream analyses such as differential expression, I will definitely rlog-transform the data (or normalize, if I can have biological replicates).

Feel free to update the post if other practices are commonly used.

thanks,

david

ADD COMMENT • link 9.4 years ago David ▴ 860

Note that we do not recommend using rlog for DE. The first line of the vignette section on transformations:

"In order to test for differential expression, we operate on raw counts and use discrete distributions as described
in the previous Section 1.4. However for other downstream analyses – e.g. for visualization or clustering – it
might be useful to work with transformed versions of the count data."

ADD REPLY • link 9.4 years ago Michael Love 43k

score 0 · Answer 3 · 2015-12-14

OK, thanks for the update: I will not use rlog for DE.

ADD COMMENT • link 9.4 years ago David ▴ 860

Same for metagenomeSeq - we perform DE on the counts.

ADD REPLY • link 9.4 years ago Joseph Nathaniel Paulson ▴ 280