Question

Reproducibility of results with ComBat

0

Entering edit mode

anne.bozack • 0

@annebozack-8680

Last seen 5.3 years ago

United States

Hi,

My question regards working with 450k data - correcting for batch effects and testing for significant differences in methylation between two groups. I'm rerunning code that my colleague originally wrote and ran in 2012. Originally, 379 sites were identified as significantly different between groups (0.05 level of significance). However, when I run the code now, I identify 5,711 sites. The sites originally identified are included, and overall they are the more significant sites - most are included in the top 10% of significant sites. I am using the current version of R and bioconductor, but the code was originally run on an older version (likely 2.10). Might the difference in significant sites be due to changes in the SVA package, or packages that it relies on?

My code is: (mscoreset is an expression set)

pheno=pData(mscoreset)
edata=exprs(mscoreset)
mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno)
mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno)
batch=pheno$Batch
combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE)

pValuesComBat=f.pvalue(combat_edata,mod,mod0)
qValuesComBat=p.adjust(pValuesComBat,method="BH")

Thanks for your help!

Anne

sva combat • 1.8k views

ADD COMMENT • link updated 9.7 years ago by Jeff Leek ▴ 650 • written 9.7 years ago by anne.bozack • 0

score 0 · Answer 1 · 2015-08-31

0

Entering edit mode

Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 4.2 years ago

United States

Hi Anne, The code for Combat has definitely been updated since then and it is possible this is causing the change. Can you try rerunning your code without Group in the mod variable you pass to Combat? Best Jeff On Mon, Aug 31, 2015 at 3:04 PM anne.bozack [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User anne.bozack <https: support.bioconductor.org="" u="" 8680=""/> wrote Question: > Reproducibility of results with Comb > <https: support.bioconductor.org="" p="" 71668=""/>: > > Hi, > > My question regards working with 450k data - correcting for batch effects > and testing for significant differences in methylation between two groups. > I'm rerunning code that my colleague originally wrote and ran in 2012. > Originally, 379 sites were identified as significantly different between > groups (0.05 level of significance). However, when I run the code now, I > identify 5,711 sites. The sites originally identified are included, and > overall they are the more significant sites - most are included in the top > 10% of significant sites. I am using the current version of R and > bioconductor, but the code was originally run on an older version (likely > 2.10). Might the difference in significant sites be due to changes in the > SVA package, or packages that it relies on? > > My code is: (mscoreset is an expression set) > > pheno=pData(mscoreset) > edata=exprs(mscoreset) > mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno) > mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno) > batch=pheno$Batch > combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) > > pValuesComBat=f.pvalue(combat_edata,mod,mod0) > qValuesComBat=p.adjust(pValuesComBat,method="BH") > > Thanks for your help! > > Anne > > > > > ------------------------------ > > Post tags: sva, combat > > You may reply via email or visit Reproducibility of results with ComBat >

ADD COMMENT • link 9.7 years ago Jeff Leek ▴ 650

0

Entering edit mode

Hey Anne, This is a bit surprising. I would be very interested in looking more into this. Although we have changed a significant amount of the code, it has all been on the data handling, warnings, and adding additional options. So it is possible that the input handling can lead to different results, however, the code for the base algorithm of ComBat has not changed for a long time. So you should be getting the same results with equivalent input--so I am surprised its different. Would you be able to share some of your data? I would be happy to take a look and see if there is a problem. Evan On Aug 31, 2015, at 7:50 PM, Jeff Leek [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""/> User Jeff Leek<https: support.bioconductor.org="" u="" 5015=""/> wrote Answer: Reproducibility of results with ComBat<https: support.bioconductor.org="" p="" 71668="" #71679="">: Hi Anne, The code for Combat has definitely been updated since then and it is possible this is causing the change. Can you try rerunning your code without Group in the mod variable you pass to Combat? Best Jeff On Mon, Aug 31, 2015 at 3:04 PM anne.bozack [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: > Activity on a post you are following on support.bioconductor.org<http: support.bioconductor.org=""> > > User anne.bozack <https: support.bioconductor.org<http:="" support.bioconductor.org="">="" u="" 8680=""/> wrote Question: > Reproducibility of results with Comb > <https: support.bioconductor.org<http:="" support.bioconductor.org="">="" p="" 71668=""/>: > > Hi, > > My question regards working with 450k data - correcting for batch effects > and testing for significant differences in methylation between two groups. > I'm rerunning code that my colleague originally wrote and ran in 2012. > Originally, 379 sites were identified as significantly different between > groups (0.05 level of significance). However, when I run the code now, I > identify 5,711 sites. The sites originally identified are included, and > overall they are the more significant sites - most are included in the top > 10% of significant sites. I am using the current version of R and > bioconductor, but the code was originally run on an older version (likely > 2.10). Might the difference in significant sites be due to changes in the > SVA package, or packages that it relies on? > > My code is: (mscoreset is an expression set) > > pheno=pData(mscoreset) > edata=exprs(mscoreset) > mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno) > mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno) > batch=pheno$Batch > combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) > > pValuesComBat=f.pvalue(combat_edata,mod,mod0) > qValuesComBat=p.adjust(pValuesComBat,method="BH") > > Thanks for your help! > > Anne > > > > > ------------------------------ > > Post tags: sva, combat > > You may reply via email or visit Reproducibility of results with ComBat<https: support.bioconductor.org="" p="" 71668=""/> > ________________________________ Post tags: sva, combat You may reply via email or visit A: Reproducibility of results with ComBat

ADD REPLY • link 9.7 years ago W. Evan Johnson ▴ 870

0

Entering edit mode

Hi Evan,

Yes, I can share my data with you. Can I contact you by email once I figure out the best method to send you the data?

Thanks!

ADD REPLY • link 9.7 years ago anne.bozack • 0

0

Entering edit mode

Hi Jeff,

Thanks for your response. To clarify, do you mean running ComBat with the null model (mod0), and then testing for differences between groups (the Group variable) by running f.pvalue and p.adjust as written? However, the documentation for SVA states that the model passed used for ComBat should include all variables of interest. If I do try running ComBat with the null model I only get 9 significant sites.

ADD REPLY • link 9.7 years ago anne.bozack • 0