I am working on a project comparing how different statistical methods for detecting differential abundance perform. Over the course of this project I have found that the zero-inflated, feature-specific log-normal model (Feature model, fitFeatureModel function) generally performs better than the zero-inflated Gaussian model (ZIG model, fitZig function). However, unlike the ZIG model, the Feature model appears unable to handle confounding covariates. Both the manual and the vignette clearly indicate that confounders can be incorporated into the design matrix for the ZIG model, but there is no similar mention for the Feature model (see the example code from the vignette below).
## Feature model
library(metagenomeSeq)
data(lungData)
lungData = lungData[, -which(is.na(pData(lungData)$SmokingStatus))]
lungData = filterData(lungData, present = 30, depth = 1)
lungData <- cumNorm(lungData, p = 0.5)
pd <- pData(lungData)
## No covariates in the model matrix
mod <- model.matrix(~1 + SmokingStatus, data = pd)
lungres1 = fitFeatureModel(lungData, mod)
## ZIG model
data(lungData)
controls = grep("Extraction.Control", pData(lungData)$SampleType)
lungTrim = lungData[, -controls]
rareFeatures = which(rowSums(MRcounts(lungTrim) > 0) < 10)
lungTrim = lungTrim[-rareFeatures, ]
lungp = cumNormStat(lungTrim, pFlag = TRUE, main = "Trimmed lung data")
lungTrim = cumNorm(lungTrim, p = lungp)
smokingStatus = pData(lungTrim)$SmokingStatus
bodySite = pData(lungTrim)$SampleType
normFactor = normFactors(lungTrim)
normFactor = log2(normFactor/median(normFactor) + 1)
## bodySite and normFactor are the covariates
mod = model.matrix(~smokingStatus + bodySite + normFactor)
settings = zigControl(maxit = 10, verbose = TRUE)
fit = fitZig(obj = lungTrim, mod = mod, useCSSoffset = FALSE, control = settings)
Whenever I try to use covariates with the Feature model, either with the example data or with my own data, I get an error saying "Can't analyze currently". I have also tried to fit the zero-inflated log-normal model directly myself using code from the package, to no avail. Is there a mathematical or statistical reason why the Feature model cannot handle covariates that I am missing?
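For reference, here is a minimal sketch of the kind of call that fails for me. It reuses the vignette's Feature model setup from above and adds SampleType as a second term in the design matrix. The name modCov and the tryCatch wrapper are my own additions for illustration and are not taken from the vignette.

```r
library(metagenomeSeq)

data(lungData)
# Drop samples with missing smoking status, as in the vignette example
lungData <- lungData[, -which(is.na(pData(lungData)$SmokingStatus))]
lungData <- filterData(lungData, present = 30, depth = 1)
lungData <- cumNorm(lungData, p = 0.5)
pd <- pData(lungData)

# Same design as the vignette, plus one covariate (SampleType).
# modCov is a name chosen here for illustration.
modCov <- model.matrix(~ 1 + SmokingStatus + SampleType, data = pd)

# Capture the error message instead of halting the script
res <- tryCatch(
  fitFeatureModel(lungData, modCov),
  error = function(e) conditionMessage(e)
)
print(res)
```

The same thing happens with my own data whenever the design matrix contains anything beyond the intercept and the single variable of interest.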
Let me know if you need any additional information from me. Thanks in advance!