Hello,
I aim to correlate taxa-specific relative/total abundances (derived from 16S Illumina MiSeq Sequencing) of soil microbes with metadata along an environmental soil gradient.
In order to normalize samples for variable sequencing depths (18k - 45k reads) I have so far rarefied community data obtained by iterative subsampling (ISS) to the lowest sequencing depth (i.e. 18k reads). These ISS-normalized abundances I had used as input for subsequent correlation analysis. However, searching literature I realized that this type of normalization may be unsuitable for differential abundance analysis. McMurdie and Holmes (2014) propose (alongside other approaches) to use variance stabilizing transformation (VST) from the package DESeq2 (function varianceStabilizingTransformation()) as a precursor for subsequent differential abundance analysis.
Now I was wondering if it is mathematically sound if I use my raw abundance data (+1 to get rid of zeros), do VST, backtransform the resulting abundances using 2^x and express taxa abundances as a relative abundance (i.e. VST&backtransformed abundances / sum(VST&backtransformed abundances) ) for each community (in my case, for each soil sample)? For several reasons in my downstream analysis (such as getting to estimates of total abundance by multiplication of relative abundance with 16S qPCR data), this would be relevant for my goals.
However, I do not understand VST well enough to figure out whether such workflow would be ok? ```
Thank you for your fast and helpful clarifications.
I must have understood the pseudo-counts wrong (the video episode CC195 of the Riffomonas project by PD Schloss on Youtube somehow left me with that idea).
Would it perhaps seem mathematically admissible to do the estimation of total abundance (by multiplication of relative abundance with qPCR data) before doing the transformation?
I will also look at the elaborate paramedic package by the Willis group, I was not aware of it, thanks for this suggestion.