Difference between fitFeatureModel and fitZIG in metagenomeSeq
3
0
Entering edit mode
sasha • 0
@sasha-11847
Last seen 7.4 years ago

Hi,

I would like to know the differences between using fitFeatureModel and fitZIG when testing for DA using metagenomeSeq. I don't seem to find a good explanation of what fitFeatureModel does in detail and what are the differences with fitZIG, however, given that the authors (and others in the literature) recommend to use fitFeatureModel over fitZIG for microbiome analysis (due to high sensitivity and low FDR), I would like to try/test it with my data. Could PLEASE someone give me a link to find this or explain it to me?

MANY thanks in advance and cheers,

Sasha

metagenomeseq • 4.6k views
ADD COMMENT
1
Entering edit mode
handibles ▴ 20
@handibles-12652
Last seen 7.4 years ago

I'd like to second Sasha's request for clarifications on the ZIG and Feature Model methods, and the recommended use of these tests. From the metagenomeseq.pdf page 16:

" This is our latest development and we recommend fitFeatureModel over fitZig [...] By reparametrizing our zero-inflation model, we’re able to fit a zero-inflated model for each specific OTU separately. We currently recommend using the zero-inflated log-normal model as implemented in fitFeatureModel "

Account of how these functions work are found in the associated NMethods paper

" To explicitly account for undersampling, we use a mixture model that implements a ZIG distribution of mean group abundance for each taxonomic feature [...] Using posterior probability estimates that account for community undersampling as weights to estimate count distribution parameters reduced the estimated fold change between the two groups under study. Furthermore, counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08). "

from these readings:

fitZIG uses a Zero-Inflated, Gaussian (normal) distribution, mixture-model with posterior-probability weighting of OTU abundances to model the distribution of OTU counts.

fitFeatureModel instead uses a Zero-Inflated, Log-Normal distribution, mixture-model with the same P.P weighting as above, because as per the NMethods paper: "counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08).". So fitFeatureModel provides better approximations to a normal distribution for our OTUs, making our Moderated T Tests more reliable (see next, please correct as required).

From what I can tell, both methods then use the moderated T test à la the limma package (see page 60) to test whether actual counts diverge from their respective models significantly (are differentially abundant) with respect to experimental factors (& an added error value ε as per mixture models) defined in model selection. 

So, both methods do the 'same' thing, but are based in different distributions (gaussian / log-normal) for standardising counts between samples.

However,

  • neither method's output (via MRcoefs/fulltable etc) is explained/worked through. While fitFeatureModel does report log2 fold change, fitZig is ambiguous (column title is simply the coefficient of interest).
  • It is not clear whether or not the Moderated T Test is measuring analogous values for either function (e.g. are reported coefficient values log2FC in both, thus both are T-Tested and loosely comparable?) 
  • Nor is it stated anywhere (in the manual or NMethods) what the differential test being carried out is, although there are references to limma (which uses moderated T Test).
  • fitZig is capable of [testing multiple groups (section 4.2)]*, fitFeatureModel is not. I expect many could be tempted to use fitZig for their multiple testing and disregard fitFeatureModel's improvements.
  • metagenomeSeq 1.16.0: unique features give NA's for logFC / p-val?. From repeating the test with fitZig, it does not have this issue but logFC (?) values differ, so results may not be comparable. Again, a temptation to use fitZig over fitFeatureModel to overcome this. 

From this: fitFeatureModel is closest to parametric methods, and is preferable to fitZig if it works for your situation. fitZig is a looser parametric method with more obtuse output but better support in this package. Any corrections are gladly accepted, and the efforts of the authors are much appreciated.  

PS: * = edited from 'doing multiple testing' for clarity

ADD COMMENT
0
Entering edit mode

The fitZig output is analogous to Limma's topTable output due to historical reasons and how closely fitZig was built on top of the lmFit function. The column title is intended to be the coefficient of interest (as logFC is really just the estimate the coefficient of interest in our case). I agree though that there should be similarity in output between fitZig and fitFeatureModel so I will work on that.

I don't understand most of your other questions/comments. For example, multiple testing correction is always possible and recommended. The user can take any set of p-values and perform their favorite correction using p.adjust or other functions. 

ADD REPLY
0
Entering edit mode

Apologies, I'll try to correct & clarify:

  • By 'multiple testing' I meant fitFeatureModel doesn't seem to allow for comparisons between groups (e.g. A v. B v. C v. D. etc), while fitZig does (section 4.2) - if attempted it simply gives the exception 'Can't analyse currently' (apologies if I've got this incorrect). As the 'recommended' method, this could be covered in the manual, via a recommendation on collating and correcting multiple pairwise tests, or simply a note that 'this feature is not yet available'.
  • metagenomeSeq provides a number of worthy statistical tests. Users might benefit from having these tests, and the differences in their applications, overviewed/compared together at the beginning of the document/section. If fitFeatureModel is the recommended method, it should be emphasised and worked through in a similar manner to fitZig. 
  • My other points relate to clarity in the documentation and in the ouput, which make it hard to tell what the functions are returning, and how these values were generated.  
    • It's unclear what differential test provides the p-values in either fitZig/FeatureMethod. Again, references to limma are made; however R functions / user-manual should document their package such that the output is clear. 
    • As above, work-through of the output would clarify much of what is being compared etc., both within and between statistical tests.

[edit:] Additionally, the contribution of the log2 transformation to CSS is not conveyed in the metagenomeSeq manual. It's importance in discriminating populations is not addressed, and although it defaults=TRUE and is implemented in the workthrough provided, there is no guidance or explanation on its use. Amending this would leave users more informed on their analysis. 

ADD REPLY
0
Entering edit mode
  • Yes, thank you for the suggestion. Will add a note that this feature is not available yet.
  • For the next release I will definitely modify the vignette. It's become a bit of a hodgepodge of added tests over the last year or so. A bit of restructuring will be good.
  • As before, thanks will work on it!
ADD REPLY
1
Entering edit mode
@joseph-nathaniel-paulson-6442
Last seen 7.7 years ago
United States

There is a bit of a confusion that I will try to address. fitZig and fitFeatureModel provide two different methods.

fitZig is the approach described in http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2658.html.
fitFeatureModel is described in Chapter 4 of Normalization and differential abundance analysis of metagenomic biomarker-gene surveys (available here: http://drum.lib.umd.edu/handle/1903/16996 ). A paper summarizing the results of Chapter 4 is in the works and potentially another paper summarizing all of the various new features to metagenomeSeq (including fitFeatureModel, etc.).

I will place in the devel vignette specific references to the description of the models for clarity. Thanks for bringing this up.

ADD COMMENT
0
Entering edit mode
sasha • 0
@sasha-11847
Last seen 7.4 years ago

Thanks to both handibles and Joseph

ADD COMMENT

Login before adding your answer.

Traffic: 489 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6