Question

Evaluation of diagnostic plot resulted from ComBat function regarding batch effect correction of a merged microarray dataset

svlachavas ▴ 840

@svlachavas-7225

Germany/Heidelberg/German Cancer Resear…

Dear all,

Following up on a previous post about the correct use of the ComBat() function for batch effect correction [Appropriate implementation of ComBat function for known batch effect correction and alternative methodologies of merging microarray datasets], I would like to ask how to interpret the diagnostic plot produced by the parametric approach, and how I should evaluate it for my own results. The link to the plot is below:

https://www.dropbox.com/s/k7v8ttbq27pk1xh/prior.plots.COMBAT.png?dl=0

I understand that the black line represents the kernel estimate of the empirical batch effect density and the red line the parametric estimate, but why are there two rows of plots? In other words, what does each row of two plots represent?

Moreover, judging from the plots, and mainly from the density plots, can I consider the parametric adjustment adequate or not?

Finally, would standardizing my merged microarray dataset before running ComBat be beneficial for the parametric approach?

Please excuse any naive questions; I have no prior experience with ComBat diagnostic plots, and any feedback is highly appreciated!

ComBat diagnostic plot batch effect affymetrix microarrays sva • 2.3k views

updated 9.2 years ago by W. Evan Johnson ▴ 870 • written 9.2 years ago by svlachavas ▴ 840

score 1 · Answer 1 · 2016-02-05

Yes, you are correct: in the left plots, the red lines are the parametric estimates and the black lines are the kernel estimates of the distribution of batch effects across genes. The right plots are Q-Q plots comparing the parametric estimate (red line) with the actual ordered batch effects for each gene (black points). The top row is for the means, and the bottom row is for the variances.

For your case, I think you are fine using the parametric version of ComBat. Although there is some deviation (especially in the variances), the kernel and parametric versions will produce highly similar results. What you are really looking for here are extreme differences, say severe skewness or bimodality in the kernel estimate that the parametric estimate can't pick up. In your case I would posit that you will see differences of less than 1-3% in the final adjusted data, which is unlikely to have any effect on your downstream analyses.
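As a rough illustration of the kind of check the top-row panels support, the sketch below compares per-gene mean batch effects with a normal distribution matched to their mean and standard deviation, and reports the sample skewness. It is a minimal sketch, not ComBat's own code: it assumes the parametric prior on the means is normal (the variances use a different prior and are not covered here), the function names `qq_against_normal`, `sample_skewness` and `max_qq_gap` are hypothetical, and the two input vectors are simulated rather than taken from the dataset in the question.

```python
import random
from statistics import NormalDist, fmean, stdev


def qq_against_normal(effects):
    """Pair each ordered effect with the matching quantile of a fitted normal.

    Assumption: the parametric estimate is a normal matched to the sample
    mean and standard deviation of the per-gene effects.
    """
    n = len(effects)
    fitted = NormalDist(fmean(effects), stdev(effects))
    observed = sorted(effects)
    # Plotting positions (i + 0.5) / n keep the probabilities strictly in (0, 1).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def sample_skewness(effects):
    """Moment-based skewness; values near 0 suggest a symmetric distribution."""
    n = len(effects)
    mu = fmean(effects)
    m2 = sum((x - mu) ** 2 for x in effects) / n
    m3 = sum((x - mu) ** 3 for x in effects) / n
    return m3 / m2 ** 1.5


def max_qq_gap(effects):
    """Largest Q-Q deviation, in units of the sample standard deviation."""
    theoretical, observed = qq_against_normal(effects)
    gap = max(abs(t - o) for t, o in zip(theoretical, observed))
    return gap / stdev(effects)


# Simulated stand-ins for per-gene batch effects (illustration only).
rng = random.Random(0)
symmetric = [rng.gauss(0.1, 0.5) for _ in range(2000)]  # close to the normal assumption
skewed = [rng.expovariate(1.0) for _ in range(2000)]    # severely right-skewed

for name, effects in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, "skewness:", round(sample_skewness(effects), 2),
          "max Q-Q gap (SD units):", round(max_qq_gap(effects), 2))
```

A skewness near zero together with Q-Q points close to the line corresponds to the "fine to use parametric" situation; a large skewness or a large Q-Q gap is the kind of extreme difference that would argue for the non-parametric option.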

Does this answer your questions?