Dramatically different results from comparing multiple groups using fitZig versus fitFeatureModel
0
0
Entering edit mode
craemuletz • 0
@craemuletz-9793
Last seen 8.9 years ago

Hi,

I am doing differential abundance OTU comparisons among three parks that I sampled.

I tried using the fitZig model for multiple group comparisons and got some results that seemed completely incorrect from data visualization. I then chose to use fitFeatureModel and do pairwise comparisons. You also suggest in the .pdf vignette that fitFeatureModel is recommended over fitZig. However, multiple comparisons in fitFeatureModel are not supported yet, correct?

Here is my code and some results to illustrate what I am finding. 

My main question is why is fitZig given me such low p-values and many, many more significant OTUs using multiple comparisons when fitFeatureModel is not. 

MGS = newMRexperiment(counts = OTU, phenoData = ADF, featureData = TDF)

p = cumNormStatFast(MGS)
MGS= cumNorm(MGS, p =p)

> MGS
MRexperiment (storageMode: environment)
assayData: 48 features, 62 samples 
  element names: counts, z, zUsed 
protocolData: none
phenoData
  sampleNames: RSB2 LSB7 ... NRA.40 (62 total)
  varLabels: X.SampleID Run_no ... Humidity (34 total)
  varMetadata: labelDescription
featureData
  featureNames: OTU_21 OTU_38 ... OTU_40 (48 total)
  fvarLabels: OTUname
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  

 

## Different groups
pd_ss <- pData(MGS)
settings = zigControl(maxit = 10, verbose = FALSE) 
mod_Parks <- model.matrix(~Park, data = pd_ss)

colnames(mod_Parks) = levels(pd_ss$Park)
# fitting the ZIG model
res_park = fitZig(obj = MGS, mod = mod_Parks, control = settings)


zigFit_park = res_park$fit
finalMod_park <- res_park$fit$design

contrast.matrix_park = makeContrasts(Catoctin - Mt.Rogers, Shenandoah - Catoctin, Mt.Rogers - Shenandoah, levels = finalMod_park)


fit2_park = contrasts.fit(zigFit_park, contrast.matrix_park)
fit2_park = eBayes(fit2_park)

topTable(fit2_park, coef ="Shenandoah - Catoctin", adjust = "BH")

> topTable(fit2_park, coef ="Shenandoah - Catoctin", adjust = "BH")
             logFC  AveExpr          t      P.Value    adj.P.Val        B
OTU_1   -12.155793 9.710132 -22.970492 5.227636e-34 2.509266e-32 64.95115
OTU_2   -10.350074 7.649256 -21.655079 1.865275e-32 4.476660e-31 61.73235
OTU_3    -9.525484 6.625287 -15.708142 1.842527e-24 2.948043e-23 44.66983
OTU_4    -8.385456 6.718744 -15.622330 2.476676e-24 2.972011e-23 44.39024
OTU_96   -8.079691 4.300305 -13.975174 2.773282e-21 2.662350e-20 37.65581
OTU_268  -6.573375 6.240717 -12.649284 1.244909e-19 9.959274e-19 34.07040
OTU_6    -7.119143 3.360560 -11.216327 1.052764e-15 7.218954e-15 25.16691
OTU_37   -4.161625 3.781222  -8.362956 5.394523e-12 3.236714e-11 17.07053
OTU_7    -5.126144 4.514163  -8.045264 1.670393e-11 8.765242e-11 15.96477
OTU_8    -6.170905 3.774331  -8.200372 1.826092e-11 8.765242e-11 15.90136

topTable(fit2_park, coef ="Catoctin - Mt.Rogers", adjust = "BH")

> topTable(fit2_park, coef ="Catoctin - Mt.Rogers", adjust = "BH")
            logFC  AveExpr         t      P.Value    adj.P.Val         B
OTU_1   10.171412 9.710132 19.536042 8.424779e-30 4.043894e-28 56.217439
OTU_2    6.439619 7.649256 13.694456 2.435999e-21 5.846399e-20 37.865696
OTU_268  5.587160 6.240717 10.927905 1.083528e-16 1.733644e-15 27.560222
OTU_4    5.630014 6.718744 10.660975 3.188298e-16 3.825957e-15 26.515401
OTU_3    6.031354 6.625287 10.109295 3.032818e-15 2.911505e-14 24.332249
OTU_96   4.129229 4.300305  7.396989 3.383950e-10 2.707160e-09 13.057069
OTU_7    3.714935 4.514163  5.926101 1.104691e-07 7.575026e-07  7.425371
OTU_37   2.616401 3.781222  5.374069 1.049990e-06 6.299942e-06  5.263770
OTU_59   2.241100 1.391556  4.913510 8.397209e-06 4.478512e-05  3.401695
OTU_8    3.410365 3.774331  4.709491 1.450487e-05 6.962338e-05  2.801458

 

### So, from data visualization the first comparison makes sense, but the p-values are so, so low

### The second comparison seems completely inaccurate from data visualization, and from doing pairwise fitFeatureModel

### Here is code and results from fitFeatureModel

## Shenandoah to Catoctin 

Park_ShenCat = MGS[, -which(pData(MGS)$Park == "Mt.Rogers")] 
pd_shencat <- pData(Park_ShenCat)

mod_shencat <- model.matrix(~Park, data = pd_shencat)
modres5 = fitFeatureModel(Park_ShenCat, mod_shencat)

y_5 <- MRcoefs(modres5, number = 10, group = 3)
y_5

> y_5

            logFC        se      pvalues   adjPvalues
OTU_449  4.493044 0.5524996 4.440892e-16 1.687539e-14
OTU_249  2.331554 0.3086840 4.241052e-14 8.057999e-13
OTU_165 -2.575309 0.3446273 7.860379e-14 9.956480e-13
OTU_8   -2.269474 0.3238382 2.416733e-12 2.295897e-11
OTU_5    3.829128 0.5488604 3.026024e-12 2.299778e-11
OTU_79   2.424200 0.4239055 1.073252e-08 6.797260e-08
OTU_2   -2.715030 0.6008155 6.215932e-06 3.374363e-05
OTU_19  -2.536399 0.5824391 1.331956e-05 6.326791e-05
OTU_59  -1.328322 0.3375506 8.313507e-05 3.510147e-04
OTU_3   -2.452399 0.6573659 1.909904e-04 6.603555e-04

### LogFC and p-values are completely different from fitZig

### Now for Catoctin and Mt. Rogers that fitZig indicates many OTUs are differentially abundant, I run using fitFeatureModel and get nothing significant (which is what data visualization indicates)

 

# subset Catoctin, Mt. Rogers

Park_CatMt = MGS[, -which(pData(MGS)$Park == "Shenandoah")] 
pd_catmt <- pData(Park_CatMt)
 

mod_catmt <- model.matrix(~Park, data = pd_catmt)
modres3 = fitFeatureModel(Park_CatMt, mod_catmt)
y_3 <- MRcoefs(modres3, group = 3)
y_3

> y_3
             logFC        se     pvalues adjPvalues
OTU_59  -0.6640668 0.2498593 0.007866115  0.3618413
OTU_54   1.7521338 0.8037968 0.029270670  0.6732254
OTU_79   0.7737750 0.4247185 0.068477179  0.8804850
OTU_61   1.2618324 0.8568463 0.140846168  0.8804850
OTU_158 -0.6987362 0.4818186 0.147001118  0.8804850
OTU_38   0.9552524 0.6610413 0.148437631  0.8804850
OTU_249  0.4894603 0.3514591 0.163725011  0.8804850
OTU_27  -0.3280126 0.2503710 0.190159879  0.8804850
OTU_9   -0.8304423 0.6358127 0.191514631  0.8804850
OTU_23  -0.3977028 0.3269615 0.223847908  0.8804850

### Even if I use fitZig without doing multiple comparisons, I still see OTUs indicated as significant that really don't seem like they should be.

#### Why would this happen? I don't really trust the fitZig group comparisons...

Thank you,

Carly

metagenomeseq • 1.6k views
ADD COMMENT

Login before adding your answer.

Traffic: 740 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6