Hi,
I am doing differential abundance OTU comparisons among three parks that I sampled.
I tried the fitZig model for a multiple-group comparison and got results that looked completely wrong compared to my data visualizations, so I switched to fitFeatureModel and ran pairwise comparisons instead. The .pdf vignette also recommends fitFeatureModel over fitZig. However, multiple-group comparisons are not yet supported in fitFeatureModel, correct?
Here is my code and some results to illustrate what I am finding.
My main question is: why does fitZig give me such low p-values and many, many more significant OTUs in the multiple-group comparison when fitFeatureModel does not?
library(metagenomeSeq)  # also attaches limma, used for the contrasts below

# Build the MRexperiment and apply cumulative-sum scaling (CSS) normalization
MGS = newMRexperiment(counts = OTU, phenoData = ADF, featureData = TDF)
p = cumNormStatFast(MGS)
MGS = cumNorm(MGS, p = p)
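One thing I also check after normalizing (my own sanity check, not something the vignette requires) is the per-sample CSS scaling factors against the raw library sizes, in case a few samples dominate the normalization:

```r
# Inspect per-sample CSS scaling factors and raw library sizes
head(normFactors(MGS))
head(libSize(MGS))
```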
> MGS
MRexperiment (storageMode: environment)
assayData: 48 features, 62 samples
element names: counts, z, zUsed
protocolData: none
phenoData
sampleNames: RSB2 LSB7 ... NRA.40 (62 total)
varLabels: X.SampleID Run_no ... Humidity (34 total)
varMetadata: labelDescription
featureData
featureNames: OTU_21 OTU_38 ... OTU_40 (48 total)
fvarLabels: OTUname
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
## Different groups
pd_ss <- pData(MGS)
settings = zigControl(maxit = 10, verbose = FALSE)
mod_Parks <- model.matrix(~Park, data = pd_ss)
colnames(mod_Parks) = levels(pd_ss$Park)
# fitting the ZIG model
res_park = fitZig(obj = MGS, mod = mod_Parks, control = settings)
zigFit_park = res_park$fit
finalMod_park <- res_park$fit$design
contrast.matrix_park = makeContrasts(Catoctin - Mt.Rogers, Shenandoah - Catoctin, Mt.Rogers - Shenandoah, levels = finalMod_park)
fit2_park = contrasts.fit(zigFit_park, contrast.matrix_park)
fit2_park = eBayes(fit2_park)
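To tally how many OTUs each contrast calls significant from the same fit, I also run limma's decideTests as a quick overview (the 0.05 cutoff is just illustrative):

```r
# Count down / not-significant / up OTUs per contrast at BH-adjusted p < 0.05
# (decideTests comes with limma, which is already attached)
results_park <- decideTests(fit2_park, adjust.method = "BH", p.value = 0.05)
summary(results_park)
```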
> topTable(fit2_park, coef ="Shenandoah - Catoctin", adjust = "BH")
logFC AveExpr t P.Value adj.P.Val B
OTU_1 -12.155793 9.710132 -22.970492 5.227636e-34 2.509266e-32 64.95115
OTU_2 -10.350074 7.649256 -21.655079 1.865275e-32 4.476660e-31 61.73235
OTU_3 -9.525484 6.625287 -15.708142 1.842527e-24 2.948043e-23 44.66983
OTU_4 -8.385456 6.718744 -15.622330 2.476676e-24 2.972011e-23 44.39024
OTU_96 -8.079691 4.300305 -13.975174 2.773282e-21 2.662350e-20 37.65581
OTU_268 -6.573375 6.240717 -12.649284 1.244909e-19 9.959274e-19 34.07040
OTU_6 -7.119143 3.360560 -11.216327 1.052764e-15 7.218954e-15 25.16691
OTU_37 -4.161625 3.781222 -8.362956 5.394523e-12 3.236714e-11 17.07053
OTU_7 -5.126144 4.514163 -8.045264 1.670393e-11 8.765242e-11 15.96477
OTU_8 -6.170905 3.774331 -8.200372 1.826092e-11 8.765242e-11 15.90136
> topTable(fit2_park, coef ="Catoctin - Mt.Rogers", adjust = "BH")
logFC AveExpr t P.Value adj.P.Val B
OTU_1 10.171412 9.710132 19.536042 8.424779e-30 4.043894e-28 56.217439
OTU_2 6.439619 7.649256 13.694456 2.435999e-21 5.846399e-20 37.865696
OTU_268 5.587160 6.240717 10.927905 1.083528e-16 1.733644e-15 27.560222
OTU_4 5.630014 6.718744 10.660975 3.188298e-16 3.825957e-15 26.515401
OTU_3 6.031354 6.625287 10.109295 3.032818e-15 2.911505e-14 24.332249
OTU_96 4.129229 4.300305 7.396989 3.383950e-10 2.707160e-09 13.057069
OTU_7 3.714935 4.514163 5.926101 1.104691e-07 7.575026e-07 7.425371
OTU_37 2.616401 3.781222 5.374069 1.049990e-06 6.299942e-06 5.263770
OTU_59 2.241100 1.391556 4.913510 8.397209e-06 4.478512e-05 3.401695
OTU_8 3.410365 3.774331 4.709491 1.450487e-05 6.962338e-05 2.801458
### From the data visualization, the first comparison makes sense, but the p-values are implausibly low.
### The second comparison looks completely wrong, both from the data visualization and from the pairwise fitFeatureModel results.
### Here is code and results from fitFeatureModel
## Shenandoah to Catoctin
# subset to Shenandoah and Catoctin samples
Park_ShenCat = MGS[, -which(pData(MGS)$Park == "Mt.Rogers")]
pd_shencat <- pData(Park_ShenCat)
pd_shencat$Park <- droplevels(factor(pd_shencat$Park))  # drop the unused level so the design has two columns
mod_shencat <- model.matrix(~Park, data = pd_shencat)
modres5 = fitFeatureModel(Park_ShenCat, mod_shencat)
y_5 <- MRcoefs(modres5, number = 10, group = 3)
> y_5
logFC se pvalues adjPvalues
OTU_449 4.493044 0.5524996 4.440892e-16 1.687539e-14
OTU_249 2.331554 0.3086840 4.241052e-14 8.057999e-13
OTU_165 -2.575309 0.3446273 7.860379e-14 9.956480e-13
OTU_8 -2.269474 0.3238382 2.416733e-12 2.295897e-11
OTU_5 3.829128 0.5488604 3.026024e-12 2.299778e-11
OTU_79 2.424200 0.4239055 1.073252e-08 6.797260e-08
OTU_2 -2.715030 0.6008155 6.215932e-06 3.374363e-05
OTU_19 -2.536399 0.5824391 1.331956e-05 6.326791e-05
OTU_59 -1.328322 0.3375506 8.313507e-05 3.510147e-04
OTU_3 -2.452399 0.6573659 1.909904e-04 6.603555e-04
### The logFC values and p-values are completely different from fitZig.
### Now for Catoctin vs. Mt. Rogers, where fitZig indicates many OTUs are differentially abundant, fitFeatureModel finds nothing significant (which is what the data visualization indicates).
# subset Catoctin, Mt. Rogers
Park_CatMt = MGS[, -which(pData(MGS)$Park == "Shenandoah")]
pd_catmt <- pData(Park_CatMt)
pd_catmt$Park <- droplevels(factor(pd_catmt$Park))  # drop the unused level so the design has two columns
mod_catmt <- model.matrix(~Park, data = pd_catmt)
modres3 = fitFeatureModel(Park_CatMt, mod_catmt)
y_3 <- MRcoefs(modres3, group = 3)
> y_3
logFC se pvalues adjPvalues
OTU_59 -0.6640668 0.2498593 0.007866115 0.3618413
OTU_54 1.7521338 0.8037968 0.029270670 0.6732254
OTU_79 0.7737750 0.4247185 0.068477179 0.8804850
OTU_61 1.2618324 0.8568463 0.140846168 0.8804850
OTU_158 -0.6987362 0.4818186 0.147001118 0.8804850
OTU_38 0.9552524 0.6610413 0.148437631 0.8804850
OTU_249 0.4894603 0.3514591 0.163725011 0.8804850
OTU_27 -0.3280126 0.2503710 0.190159879 0.8804850
OTU_9 -0.8304423 0.6358127 0.191514631 0.8804850
OTU_23 -0.3977028 0.3269615 0.223847908 0.8804850
### Even when I use fitZig without the multiple-comparison contrasts, I still see OTUs flagged as significant that really don't seem like they should be.
#### Why would this happen? I don't really trust the fitZig group comparisons...
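To make the disagreement concrete, here is a rough sketch of how I compare which OTUs each method calls significant for Catoctin vs. Mt.Rogers (my own check; the 0.05 cutoff is illustrative):

```r
# fitZig side: all OTUs for this contrast, BH-adjusted
zig_tab <- topTable(fit2_park, coef = "Catoctin - Mt.Rogers",
                    number = Inf, adjust = "BH")
zig_sig <- rownames(zig_tab)[zig_tab$adj.P.Val < 0.05]

# fitFeatureModel side: ask for at least the 48 features so all are returned
ffm_tab <- MRcoefs(modres3, number = 48)
ffm_sig <- rownames(ffm_tab)[ffm_tab$adjPvalues < 0.05]

intersect(zig_sig, ffm_sig)  # OTUs both methods flag
setdiff(zig_sig, ffm_sig)    # OTUs flagged only by fitZig
```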
Thank you,
Carly