Hi,
I am trying to determine whether batch correction is necessary before performing GSVA.
I have a bulk RNA-seq dataset that includes two batches due to library preparation performed on separate occasions. My conditions of interest are present in both batches. For PCA, I applied batch correction using the removeBatchEffect function from limma. Should I use this log-transformed, normalized, batch-corrected expression matrix as input for GSVA? Alternatively, is it possible to run GSVA on the log-transformed normalized expression matrix before batch correction and then include the batch variable in the differential expression model? Or does the ranking system used by GSVA internally account for batch effects?
Additionally, could you provide any recommendations on which differential expression analysis tool to use (e.g. limma, DESeq2, edgeR)? Do these tools perform similarly in this context?
Many thanks for your advice.
Best wishes,
Lucy
Great, thank you. Just to clarify - do you recommend against using batch-corrected counts as input for GSVA, or is this a viable alternative?
It depends on what you want to do downstream of GSVA. Batch effect correction is not perfect, and once you give batch-corrected values as input to GSVA, the uncertainty associated with that correction will not be picked up by GSVA or by any other tool downstream. If that tool performs an inferential task such as differential expression, the associated p-values are likely to lose control of the type-I error. You can find a thorough explanation of this phenomenon in this thread.
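To make the alternative concrete, here is a minimal sketch of keeping the batch in the downstream model instead of correcting the input. It is an illustration under stated assumptions, not a definitive workflow: the score matrix is simulated to stand in for GSVA output computed on the uncorrected log-expression values, and the sample layout, object names (scores, condition, batch) and effect sizes are all hypothetical. Only limma is used.

```r
# Sketch only: simulated scores stand in for GSVA output on uncorrected data.
library(limma)

set.seed(1)
n_sets <- 50

# Hypothetical layout: 12 samples, both conditions present in both batches
condition <- factor(rep(c("control", "treated"), times = 6))
batch     <- factor(rep(c("b1", "b2"), each = 6))

# Hypothetical gene-set-by-sample score matrix (rows = gene sets)
scores <- matrix(rnorm(n_sets * 12, sd = 0.2), nrow = n_sets,
                 dimnames = list(paste0("set", seq_len(n_sets)),
                                 paste0("s", 1:12)))
scores[, batch == "b2"] <- scores[, batch == "b2"] + 0.3                    # simulated batch shift
scores[1, condition == "treated"] <- scores[1, condition == "treated"] + 1  # simulated true effect

# Batch enters the model as a covariate, so its estimation
# uncertainty is reflected in the test statistics
design <- model.matrix(~ condition + batch)
fit    <- eBayes(lmFit(scores, design))
res    <- topTable(fit, coef = "conditiontreated", number = Inf)
head(res)
```

With real data, the simulated matrix would be replaced by the GSVA enrichment scores, and condition and batch by the corresponding columns of the sample annotation.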
Great, thank you!