Normalized counts after fitting linear model with batch effects - edgeR
Yahan • 0
@yahan-14837
United States

Hi,

I'm dealing with a set of sequencing data with batch effects. The samples were sequenced at two different times (4 control + 4 treatment1 in the first run, and 1 control + 4 treatment2 in the second run). The batch effects are very obvious in the PCA plots of the raw data. I used RUVSeq and edgeR and fitted the linear model with the batch effects included in the design. The results are OK, but I cannot find a way to look at the counts without batch effects. Even the counts in fit$fitted.values still contain batch effects.

So I'm wondering: is it possible to get counts without batch effects after fitting the linear model? I need these counts to make a heatmap. In an old post, someone said that getting a matrix of batch-corrected counts is not possible. If such counts do not exist, how are logFC and logCPM calculated (they are free of batch effects in my results)?

Or can this be done with other packages, such as DESeq?

Or should I use the removeBatchEffect function just for making the heatmap?

Thank you!

Yahan

edger batch effect normalized counts • 1.8k views

This is not an answer to your question, but your design is almost completely confounded, since the second batch only has a single control sample. This means that the entire batch correction hinges on that single sample, and any noise in that sample will be interpreted as a batch effect to be subtracted out of all other samples.
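To make the confounding concrete, the layout described in the question can be tabulated in base R. This is only a sketch: the sample order and the factor level names (`batch1`, `batch2`, `control`, and so on) are assumptions for illustration, not taken from the actual data.

```r
# Hypothetical sample layout reconstructed from the question:
# batch 1 = 4 control + 4 treatment1, batch 2 = 1 control + 4 treatment2
batch <- factor(rep(c("batch1", "batch2"), times = c(8, 5)))
group <- factor(c(rep("control", 4), rep("treatment1", 4),
                  "control", rep("treatment2", 4)))

# Cross-tabulate treatment group against batch
tab <- table(group, batch)
print(tab)
```

The only group that appears in both columns of the table is control, and it contributes a single sample to the second batch, so that one sample is the sole link between the two batches.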


True. I only noticed this huge batch effect problem after I got the data. I'm also concerned about the design.

@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

Getting a matrix of batch-corrected counts is not really possible, because the resulting matrix would not represent actual counts. However, getting a matrix of batch-corrected logCPM values is certainly possible, using removeBatchEffect. Note that this kind of batch subtraction will likely change the row means.
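A minimal sketch of that approach is shown here, assuming the sample layout from the question. The counts are simulated stand-ins, and the object names (`counts`, `batch`, `group`, `corrected`) are hypothetical. The calls used are `cpm()` from edgeR and `removeBatchEffect()` from limma.

```r
library(edgeR)  # depends on limma
library(limma)  # provides removeBatchEffect()

set.seed(1)

# Hypothetical layout matching the question (an assumption, not the real data)
batch <- factor(rep(c("batch1", "batch2"), times = c(8, 5)))
group <- factor(c(rep("control", 4), rep("treatment1", 4),
                  "control", rep("treatment2", 4)))

# Simulated counts stand in for the real data: 200 genes x 13 samples
counts <- matrix(rnbinom(200 * 13, mu = 50, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:13)))

# Normalize and convert to logCPM
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
logCPM <- cpm(y, log = TRUE, prior.count = 2)

# The design holds the treatment effects to preserve;
# the batch factor to subtract is passed separately
design <- model.matrix(~ group)
corrected <- removeBatchEffect(logCPM, batch = batch, design = design)
```

The `corrected` matrix is meant for visualization such as heatmaps or PCA plots. The differential expression test itself would still be run on the original counts with batch included in the design.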

The logFC value in the results is the value of the coefficient or contrast that was tested, while the logCPM value in the results is simply the average logCPM of all samples, which is not affected by batch effects.

When building a heatmap using batch-corrected values, always be wary of confirmation bias. You can almost always get a good-looking heatmap if you subtract out all the variation that doesn't fit your model and plot the remaining variation, regardless of whether there is genuine differential expression.


Thank you, Ryan! I've seen the huge change in library sizes and CPM values made by removeBatchEffect. I made a mistake, and you are right about the logCPM.
