Using both voom and removeBatchEffect, will they double adjust the data or is it the right way to perform this
1
0
Entering edit mode
sukeshinik5 ▴ 20
@8cebf978
Last seen 3 months ago
India

Hello,

In RNA sequencing data, I want to adjust for the covariates such as age, BMI, sex etc. I have created colData with these covariates and factorised (for binary) and scaled (for continuous) them. My concern is I have normalised the data and then applying adjustment. However, during adjustment using both voom and removeBatchEffect, will they double adjust the data or is it the right way to perform this. Please see below the functions used.

design_matrix <- model.matrix(~Age + Gender + BMI + 
                                Batch + Condition, data=colData)

v <- voom(raw_counts, design_matrix, plot=TRUE)

fit <- lmFit(v, design_matrix)

adjusted_counts <- removeBatchEffect(v$E, covariates=phenodata[, c("Age", "BMI", "Batch")]).

This is because I wish to use this matrix for correlation studies and not for differential gene expression analysis.

Is there any other way to adjust for covariates?

Thank you,

limma adjustment voom removeBatchEffect covariates • 1.2k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 1 day ago
WEHI, Melbourne, Australia

The primary purpose of the voom() function is to compute precision weights for the downstream differential expression analysis. vooma() also does some normalization but the quantities that it produced are designed to work in conjuction with the weights. It is not intended as a normalization function for other purposes. Since you are not planning to use the voom precision weights in your downstream analysis, there is no purpose in using the voom() function at all.

If you want to export batch-corrected quantities from limma and edgeR, and the purpose is to preserve the effects of Condition, we recommend:

logCPM <- cpm(raw_counts, log=TRUE)
design.Covariates <- model.matrix(~Age+Gender+BMI+Batch)
logCPM.adjusted <- removeBatchEffect(logCPM, group=Condition, covariates=design.Covariates[,-1])

Note that the call to removeBatchEffect() in your question is not correct because you are passing a non-numeric matrix to the covariates argument.

The reason by cpm() with log=TRUE is better than voom() for your purpose is that it applies a larger prior count to the data, damping down the variability of the low log-count values and making the resulting logCPM values more nearly homoscedastic. By contrast, the logCPM values from voom() have a strong mean-variance relationship, which is fine in a limma analysis because it is taken care of by the precision weights.

BTW, I am the author of the voom() and removeBatchEffect() functions.

ADD COMMENT
0
Entering edit mode

Thank you so much for the response and for the development of such crucial functions: voom() and removeBatchEffect().

I have factorized the covariates so they remain in numeric form and then using them into design matrix. Something like the following,

colData <- data.frame(Condition = phenodata$Status,
                      SampleName = rownames(phenodata), 
                      Age = phenodata$Age, 
                      Gender= as.factor(phenodata$Gender), 
                      BMI = phenodata$BMI, 
                      Batch = phenodata$Batch)

factorize variables to keep it numeric

colData$Gender <- as.factor(colData$Gender)
label_mapping<-c("Female" = 1, "Male" = 2)
colData$Gender <- label_mapping[colData$Gender] # 1=Female 2=Male
colData$Condition <- as.factor(colData$Condition)
colData$Condition <- relevel(colData$Condition, ref = "HC")
colData$Batch <- as.factor(colData$Batch)
Batch_mapping<- c("Batch_1" =1, "Batch_2" = 2, "Batch_3"= 3)
colData$Batch <- Batch_mapping[colData$Batch] # 1, 2, 3

I am not sure if age and BMI should be scaled or factorized.

ADD REPLY
1
Entering edit mode

Ah, no, that's not right. You have misinterpreted my comment about passing a non-numeric matrix to the covariate argument.

You don't need to recode or scale any of the variables. Just use the code exactly as in my answer with the variables as they already are in your phenodata or colData data.frames. The model.matrix function will convert the factors into a numeric matrix in the correct way. You do not need to recode the factors to numeric yourself, nor should you do so.

ADD REPLY
0
Entering edit mode

Sure, thank you so much, will impliment codes as you suggested.

ADD REPLY

Login before adding your answer.

Traffic: 458 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6