Correct the batch effect in methylation analysis in R
kyj2226 • 0
South Korea

Dear bioconductor community,

I have recently analyzed two EPIC methylation array datasets using the same pipeline in R. Both datasets have the same variables, and in both the comparison was PET amyloid negative vs PET amyloid positive samples, with the aim of finding differentially methylated CpG sites (DMCs). Both datasets come from the EPIC array but were run in different batches (batch 0 and batch 1).

I have already tried to correct the batch effect using Harman batch correction:


library(minfi)   # getBeta()
library(Harman)  # shiftBetas(), harman(), reconstructData()
library(lumi)    # beta2m() (assumed here to be lumi's)
library(limma)   # lmFit()

# mSetFunnormFlt is a GenomicRatioSet, the result of probe QC and Funnorm normalization
shifted_betas <- shiftBetas(betas=getBeta(mSetFunnormFlt), shiftBy=1e-4)
shifted_ms <- beta2m(shifted_betas)
plot(density(shifted_ms, 0.05), main="Shifted M-values, shiftBy = 1e-4",
     cex.main=0.7)
shifted_ms

# Sample_Group has two levels, P and N (PET amyloid positive and negative)
methHarman <- harman(shifted_ms, expt=targets_pr$Sample_Group,
                     batch=targets_pr$batch, limit=0.65)
ms_hm <- reconstructData(methHarman)

# The design matrix was created with the code below
# design <- model.matrix(~0+Sample_Group+Age+Sex+Center+Smoking+Bcell+CD4T+CD8T+Mono+NK+Neu, data=targets_pr_0)
fit <- lmFit(ms_hm, design)

After batch correction, I could not find any significant differentially methylated CpG sites (PET amyloid positive vs PET amyloid negative).

When I ran the same pipeline on each batch separately, I found 2,756 significant DMCs in batch 0 but 0 significant DMCs in batch 1.

I suspect that the case-control imbalance between the batches may be causing the problem.

My question is: is it possible, and appropriate, to correct the batch effect in a dataset with unbalanced cases and controls?

Thank you in advance.

Best,

Yujin Kim.


BatchEffect limma methylationArrayAnalysis • 1.1k views
@gordon-smyth
WEHI, Melbourne, Australia

You should not apply batch correction functions like harman before a limma analysis. Instead, just add batch to the limma linear model.

The unbalanced batches do not cause any problem for limma except that the power to detect changes will be lower for batch1.

In this case, it sounds as if the data for batch1 behaves differently from batch0. Perhaps you should use the analysis of batch0 only?
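
The advice above to add batch to the limma linear model can be illustrated with a minimal sketch. It uses simulated M-values and a hypothetical two-column sample sheet (the names `targets`, `mvals`, `Batch` and the effect sizes are assumptions for illustration, not the poster's data), and it omits the other covariates for brevity. Because the design has no intercept (`~0+...`), the P vs N comparison is extracted with an explicit contrast.

```r
library(limma)

set.seed(1)
n_cpg <- 200

# Hypothetical sample sheet: groups are unbalanced across the two batches
targets <- data.frame(
  Sample_Group = factor(c(rep("N", 8), rep("P", 4), rep("N", 3), rep("P", 9))),
  Batch        = factor(rep(c("b0", "b1"), each = 12))
)

# Simulated M-values: noise, a shift for batch b1, and a group effect on 10 CpGs
mvals <- matrix(rnorm(n_cpg * nrow(targets)), nrow = n_cpg,
                dimnames = list(paste0("cg", seq_len(n_cpg)), NULL))
mvals <- mvals + rep(ifelse(targets$Batch == "b1", 1.5, 0), each = n_cpg)
is_pos <- targets$Sample_Group == "P"
mvals[1:10, is_pos] <- mvals[1:10, is_pos] + 3

# Batch enters the model as a blocking factor; the data are not pre-corrected
design <- model.matrix(~ 0 + Sample_Group + Batch, data = targets)

fit   <- lmFit(mvals, design)
contr <- makeContrasts(PvsN = Sample_GroupP - Sample_GroupN, levels = design)
fit2  <- eBayes(contrasts.fit(fit, contr))

# Ranked table for the P vs N comparison, adjusted for batch
top <- topTable(fit2, coef = "PvsN", number = Inf)
head(top)
```

In a real analysis the remaining covariates (age, sex, cell-type proportions and so on) would be added to the same formula.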


Thank you for your response Gordon :)

I have already run the code below, which just adds batch to the limma linear model:

design <- model.matrix(~0+Sample_Group+Batch+Age+Sex+Center+Smoking+Bcell+CD4T+CD8T+Mono+NK+Neu, data=targets_pr)
# targets_pr contains the samples from both batch 0 and batch 1
fit <- lmFit(mValsq, design)

It results in 1 significant DMC.

What do you think about using removeBatchEffect in limma in my pipeline?

Is it okay to apply removeBatchEffect before a limma analysis (limma linear model)?

Sorry for the basic question, I am new to limma analysis.

Are there any useful documents I can study related to limma?

I have already read the limma User's Guide, but it gives no explanation of batch effect correction (removeBatchEffect).


I agree with your point:

it sounds as if the data for batch1 behaves differently from batch0.

I will need to consider using the analysis of batch 0 only.

Thanks a lot Gordon!

Best,

Yujin Kim


Is it okay to apply removeBatchEffect before a limma analysis (limma linear model)?

No, you should not apply any batch correction before a limma differential analysis. The removeBatchEffect() help page, which you can see by typing ?removeBatchEffect, advises that the function should not be used in a differential analysis.

Are there any useful documents I can study related to limma?

The User's Guide and the help page for each function.

I have already read the limma User's Guide, but it gives no explanation of batch effect correction (removeBatchEffect).

removeBatchEffect is not mentioned because it is not recommended as part of a differential analysis. Batch correction is instead done as part of the linear model, same as for any blocking variable.
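
To make the distinction concrete, here is a small sketch on simulated data (the objects `targets`, `mvals` and `mvals_plot` are hypothetical). It assumes the common use of removeBatchEffect() to prepare a matrix for plots such as MDS or heatmaps, while the differential analysis itself is still fitted to the uncorrected values with batch in the design.

```r
library(limma)

set.seed(2)
n_cpg <- 100

# Hypothetical sample sheet with groups unbalanced across batches
targets <- data.frame(
  Sample_Group = factor(c(rep("N", 8), rep("P", 4), rep("N", 3), rep("P", 9))),
  Batch        = factor(rep(c("b0", "b1"), each = 12))
)

# Simulated M-values with a shift of 2 in batch b1
mvals <- matrix(rnorm(n_cpg * nrow(targets)), nrow = n_cpg)
mvals <- mvals + rep(ifelse(targets$Batch == "b1", 2, 0), each = n_cpg)

# For plots only: remove batch while protecting the group effect via `design`
design_group <- model.matrix(~ Sample_Group, data = targets)
mvals_plot <- removeBatchEffect(mvals, batch = targets$Batch,
                                design = design_group)

# Check: refitting a model with batch on the corrected matrix leaves
# essentially no batch coefficient, whereas the raw data still carry it
design_full  <- model.matrix(~ Sample_Group + Batch, data = targets)
batch_before <- lmFit(mvals, design_full)$coefficients[, "Batchb1"]
batch_after  <- lmFit(mvals_plot, design_full)$coefficients[, "Batchb1"]
round(c(before = mean(batch_before), after = mean(batch_after)), 3)

# The test statistics should still come from the uncorrected data
fit <- eBayes(lmFit(mvals, design_full))
```

The corrected matrix `mvals_plot` would go to the plotting functions only; passing it to lmFit() for testing would ignore the degrees of freedom already spent on estimating the batch effect.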

PS. Just to clarify, we have seen good results from RUV normalization and background correction for large datasets. However, when the dataset is of moderate size and the batch factor is known, the correction is better done as part of the linear model.


Thank you for your kind response Gordon! :)

Have a good time at the end of the year!

Best,

Yujin Kim
