Different number of differentially expressed genes after using ComBat in 'sva' for batch correction
@michaela-oswald-5995
Hi,

I have a question about the number of differentially expressed probes after combining batches with ComBat from 'sva'.

I have 2 data sets: one containing around 250 samples that correspond to around 50 groups, and another containing 10 samples that correspond to 2 groups (let me call them Batch2_Group1 and Batch2_Group2). One of the 2 group labels in the second batch (Batch2_Group2) also exists in the first batch, so batch and group are not confounded.

Before batch correction the 2 data sets cluster by batch, not by group.

I used ComBat from the R/Bioconductor package 'sva' to correct for this, using a model matrix to accommodate the group that overlaps between the 2 batches and setting par.prior=TRUE, i.e. using parametric adjustment. After the batch correction the samples cluster perfectly by group and no longer by batch.

I do notice, however, that the number of differentially expressed probes between Batch2_Group1 and Batch2_Group2 changes dramatically once the data are combined. Within Batch2 alone I have around 1000 differentially expressed probes, about half up-regulated and half down-regulated. After combining the data I have around 3000 differentially expressed probes, ~2000 up and ~1000 down, in the same group comparison. (I use 'limma' for the differential analysis.)

It seems that ComBat pulled the groups Batch2_Group1 and Batch2_Group2 further apart from each other. The group that did not have a matching group label in Batch1 is now much more up-regulated.

Is there a way to adjust the data combination so that the number of differentially expressed probes stays similar to what it was before?

Thank you,
Michaela
Naomi Altman ★ 6.0k
@naomi-altman-380
United States
There are several possible reasons why this happened, but one is power. Limma (like all ANOVA routines) uses the MSE computed from all the groups to test for differences among groups. Since Batch 2 is very small, the analysis that included only Batch 2 did not give you a good estimate of the MSE. When you combine the samples, you get a much better estimate and many more d.f. for error, and therefore much more power. If Batch 2 also happens to be a bit more variable than Batch 1, the MSE will be smaller after combining as well. Finally, you now have more measurements for Group2, which means that any comparison involving Group2 will be much more powerful.

--Naomi Altman

In reply to Michaela Oswald's message of 6/14/2013, 10:48 AM.
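To put rough numbers on the power argument above, here is a minimal Python sketch of the bookkeeping only. It is not a limma or ComBat run. It assumes exactly 250 samples in 50 groups of 5 in Batch 1 and 5 samples per group in Batch 2 (the thread only says "around"), and it assumes the post-correction analysis uses a group-only design with no batch term. The helper names `residual_df` and `se_factor` are hypothetical.

```python
from math import sqrt

def residual_df(n_samples, n_groups):
    """Error d.f. for a one-way layout: samples minus fitted group means."""
    return n_samples - n_groups

def se_factor(n_a, n_b):
    """Nominal SE of a difference of two group means, in units of sigma."""
    return sqrt(1.0 / n_a + 1.0 / n_b)

# Batch 2 analysed alone: 10 samples, 2 groups (assumed 5 + 5).
df_alone = residual_df(10, 2)
se_alone = se_factor(5, 5)

# Combined data: 250 + 10 samples, 50 + 2 - 1 distinct groups,
# because Batch2_Group2 is shared with Batch 1 (assumed 5 extra samples).
df_combined = residual_df(260, 51)
se_combined = se_factor(5, 10)

print(f"error d.f.: {df_alone} alone vs {df_combined} combined")
print(f"nominal SE factor: {se_alone:.3f} alone vs {se_combined:.3f} combined")
```

Under these assumptions the error d.f. rise from 8 to 209 and the nominal standard error of the Batch2_Group1 vs Batch2_Group2 difference shrinks, so more probes pass the same significance cutoff even if the estimated fold changes were unchanged.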