Hi,
My question concerns 450k data: correcting for batch effects and then testing for significant differences in methylation between two groups. I'm rerunning code that my colleague originally wrote and ran in 2012. Originally, 379 sites were identified as significantly different between groups (at the 0.05 level of significance). When I run the same code now, I identify 5,711 sites. The originally identified sites are all included, and they are generally among the most significant: most fall in the top 10% of significant sites. I am using the current versions of R and Bioconductor, but the code was originally run on an older version (likely 2.10). Might the difference in the number of significant sites be due to changes in the sva package, or in packages that it relies on?
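My code is below (mscoreset is an ExpressionSet):
library(Biobase)  # pData(), exprs()
library(sva)      # ComBat(), f.pvalue()
pheno = pData(mscoreset)
edata = exprs(mscoreset)
# Full model includes the variable of interest (Group); null model omits it
mod = model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno)
mod0 = model.matrix(~as.factor(var1) + as.factor(var2), data=pheno)
batch = pheno$Batch
# Batch correction, then per-site F-test and BH adjustment
combat_edata = ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE)
pValuesComBat = f.pvalue(combat_edata, mod, mod0)
qValuesComBat = p.adjust(pValuesComBat, method="BH")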
Thanks for your help!
Anne
Hi Evan,
Yes, I can share my data with you. Can I contact you by email once I figure out the best method to send you the data?
Thanks!
Hi Jeff,
Thanks for your response. To clarify, do you mean running ComBat with the null model (mod0), and then testing for differences between groups (the Group variable) by running f.pvalue and p.adjust as written? The documentation for sva states that the model passed to ComBat should include all variables of interest, which is why I used the full model. If I do run ComBat with the null model, I get only 9 significant sites.
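To make the comparison concrete, here is a minimal sketch of the two variants I am describing. It assumes the objects from my original code (edata, batch, mod, mod0); count_sig_sites is a hypothetical helper name I made up for illustration, not a function in sva.

```r
library(sva)

# Hypothetical helper (not part of sva): batch-adjust with ComBat using the
# design matrix `combat_mod`, then test the full model against the null model
# and count the sites whose BH-adjusted p-value falls below `alpha`.
count_sig_sites <- function(edata, batch, combat_mod, mod, mod0, alpha = 0.05) {
  adjusted <- ComBat(dat = edata, batch = batch, mod = combat_mod, par.prior = TRUE)
  p <- f.pvalue(adjusted, mod, mod0)
  q <- p.adjust(p, method = "BH")
  sum(q < alpha, na.rm = TRUE)
}

# Usage, assuming edata, batch, mod and mod0 exist as in my original code:
# n_full <- count_sig_sites(edata, batch, mod,  mod, mod0)  # Group included in the ComBat model
# n_null <- count_sig_sites(edata, batch, mod0, mod, mod0)  # Group left out of the ComBat model
```

The only thing that differs between the two calls is the design matrix given to ComBat; the f.pvalue and p.adjust steps are identical.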