Question

Should I perform batch correction of normalized RNA-seq counts prior to GSVA?

Lucy

United Kingdom

Hi,

I am trying to determine whether batch correction is necessary before performing GSVA.

I have a bulk RNA-seq dataset that includes two batches due to library preparation performed on separate occasions. My conditions of interest are present in both batches. For PCA, I applied batch correction using the removeBatchEffect function from limma. Should I use this log-transformed, normalized, batch-corrected expression matrix as input for GSVA? Alternatively, is it possible to run GSVA on the log-transformed normalized expression matrix before batch correction and then include the batch variable in the differential expression model? Or does the ranking system used by GSVA internally account for batch effects?
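For reference, the kind of adjustment removeBatchEffect applies can be sketched as follows. This is a simplified NumPy re-expression written for illustration, not limma's code. It assumes the batch acts additively on the log scale, codes the batch factor with sum-to-zero contrasts, fits each row by least squares together with a design matrix of effects to keep (for example the conditions of interest), and subtracts only the fitted batch component. The function name `remove_batch_effect` is hypothetical.

```python
import numpy as np


def remove_batch_effect(x, batch, design):
    """Hypothetical sketch of an additive batch adjustment.

    x      : (features, samples) matrix of log-scale values
    batch  : length-n sequence of batch labels, one per sample
    design : (samples, p) design matrix of effects to preserve
    """
    x = np.asarray(x, dtype=float)
    design = np.asarray(design, dtype=float)
    levels = sorted(set(batch))
    n = x.shape[1]
    # Sum-to-zero coding: column k is +1 for samples in batch k,
    # and every column is -1 for samples in the last batch.
    xb = np.zeros((n, len(levels) - 1))
    for j, label in enumerate(batch):
        k = levels.index(label)
        if k < len(levels) - 1:
            xb[j, k] = 1.0
        else:
            xb[j, :] = -1.0
    # Fit all features at once: coef has shape (p + batch columns, features)
    coef, *_ = np.linalg.lstsq(np.hstack([design, xb]), x.T, rcond=None)
    batch_coef = coef[design.shape[1]:, :]
    # Subtract only the fitted batch component; design effects stay in place
    return x - (xb @ batch_coef).T
```

Because the condition effects are kept in the fit, the corrected matrix is convenient for PCA, but the least-squares fit that produced it is not visible to anything computed from the corrected values afterwards.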

Additionally, could you provide any recommendations on which differential expression analysis tool to use (e.g. limma, DESeq2, edgeR)? Do these tools perform similarly in this context?

Many thanks for your advice.

Best wishes,
Lucy

GSVA GSVAdata RNAseq

Answer 1 · 2024-07-12

Robert Castelo

Barcelona/Universitat Pompeu Fabra

Hi Lucy,

Sorry for the delay in getting back to you. GSVA does not do anything specifically to deal with batch effects, and they may affect the output of GSVA. What I would recommend is to give GSVA normalized, log-transformed expression values as input and then use PCA or MDS plots to explore the extent to which the batch effect you observed at the gene level also affects the GSVA enrichment scores at the pathway level.
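A minimal sketch of that check, assuming the enrichment scores have already been computed and are available as a gene-set-by-sample matrix (simulated in the usage below, since no data are shown in the thread). It computes sample principal components from the scores and reports how much of a component is explained by batch membership. `sample_pcs` and `batch_r2` are hypothetical helper names; in practice the same check is usually a PCA or MDS plot coloured by batch.

```python
import numpy as np


def sample_pcs(scores, n_pcs=2):
    """PCA of samples from a (gene sets x samples) score matrix.

    Returns the sample coordinates on the first n_pcs components and the
    fraction of total variance each of those components explains.
    """
    s = np.asarray(scores, dtype=float)
    # Center each gene set across samples, then treat samples as observations
    centered = (s - s.mean(axis=1, keepdims=True)).T   # (samples, gene sets)
    u, d, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_pcs] * d[:n_pcs]
    var_explained = d**2 / np.sum(d**2)
    return coords, var_explained[:n_pcs]


def batch_r2(pc, batch):
    """Share of one component's variance explained by batch means (one-way R^2)."""
    pc = np.asarray(pc, dtype=float)
    labels = np.asarray(batch)
    fitted = np.empty_like(pc)
    for label in np.unique(labels):
        mask = labels == label
        fitted[mask] = pc[mask].mean()
    return 1.0 - np.sum((pc - fitted) ** 2) / np.sum((pc - pc.mean()) ** 2)
```

If batch explains a large share of the leading components, the batch effect has carried over to the pathway level and needs to be handled in whatever analysis follows.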

Depending on whether the batch effect is present at the pathway level, and on what you want to do with the GSVA enrichment scores, you will have to decide what to do next. For instance, if you want to do a differential expression analysis at the pathway level (see section 6.2 of the GSVA vignette) and the batch effect is present, you could use limma and include the batch indicator variable in your design matrix to adjust for it. As illustrated in section 6.2 of the GSVA vignette, one advantage of using limma is that you can use limma-trend to exploit the fact that GSVA enrichment scores have higher precision for larger gene sets.
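To make the design-matrix idea concrete, here is a sketch with hypothetical helpers. It builds a treatment-coded matrix with an intercept, a condition column and a batch column (the layout R's `model.matrix(~ condition + batch)` gives for two-level factors) and computes an ordinary least-squares t statistic for the condition coefficient of every gene set. It deliberately leaves out limma's empirical Bayes variance moderation, and therefore also the limma-trend step, so it only illustrates how the batch term enters the model.

```python
import numpy as np


def design_matrix(condition, batch):
    """Treatment-coded design: intercept, condition indicator(s), batch indicator(s)."""
    condition = np.asarray(condition)
    batch = np.asarray(batch)
    columns = [np.ones(len(condition))]
    names = ["(Intercept)"]
    for prefix, values in (("condition", condition), ("batch", batch)):
        for level in np.unique(values)[1:]:   # first sorted level is the reference
            columns.append((values == level).astype(float))
            names.append(f"{prefix}{level}")
    return np.column_stack(columns), names


def coef_tstats(scores, design, coef_index):
    """OLS t statistic of one coefficient for every row of a (gene sets x samples) matrix.

    No variance moderation: limma's eBayes would additionally shrink the
    per-gene-set residual variances towards a common (or trended) prior.
    """
    y = np.asarray(scores, dtype=float).T            # (samples, gene sets)
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y                    # (p, gene sets)
    resid = y - design @ beta
    sigma2 = np.sum(resid**2, axis=0) / (n - p)
    se = np.sqrt(sigma2 * xtx_inv[coef_index, coef_index])
    return beta[coef_index] / se
```

Because batch is a column of the design, its effect is estimated jointly with the condition effect and the residual degrees of freedom account for it.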

Cheers,

robert.

Great, thank you. Just to clarify - do you recommend against using batch-corrected counts as input for GSVA, or is this a viable alternative?

Lucy

It depends on what you want to do downstream of GSVA. Batch effect correction is not perfect, and once you give batch-corrected values as input to GSVA, the uncertainty associated with that correction will not be picked up by GSVA or by any other tool downstream. If that tool is going to perform an inferential task such as differential expression, the associated p-values are likely to lose control of the type-I error rate. You can find a thorough explanation of this phenomenon in this thread.
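The mechanism can be seen in a small null simulation, sketched below under simplifying assumptions: each feature stands in directly for a score to be tested (GSVA itself is left out), there is a batch shift but no condition effect, and the layout is deliberately unbalanced between batches. Approach (a) estimates and subtracts the batch effect first and then tests the condition alone; approach (b) keeps batch in the model. Both use plain OLS t tests with hard-coded two-sided 5% critical values rather than limma, and the function names are hypothetical.

```python
import numpy as np


def ols_t(y, design, j):
    """OLS t statistic of coefficient j for each column of y (samples x features)."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    se = np.sqrt(np.sum(resid**2, axis=0) / (n - p) * xtx_inv[j, j])
    return beta[j] / se, n - p


def false_positive_rates(n_features=4000, seed=0):
    """Rejection rates at nominal 5% under a null with a batch shift only."""
    rng = np.random.default_rng(seed)
    # Unbalanced layout: batch 1 holds 8 controls + 2 treated, batch 2 the reverse
    cond = np.array([0] * 8 + [1] * 2 + [0] * 2 + [1] * 8, dtype=float)
    bat = np.array([0] * 10 + [1] * 10, dtype=float)
    intercept = np.ones(20)
    # Null data: a batch shift but no condition effect at all
    y = rng.normal(size=(20, n_features)) + 1.5 * bat[:, None]
    full = np.column_stack([intercept, cond, bat])
    # (a) two-step: estimate and subtract the batch effect, then test condition alone
    beta = np.linalg.lstsq(full, y, rcond=None)[0]
    y_corrected = y - np.outer(bat, beta[2])
    t_a, df_a = ols_t(y_corrected, np.column_stack([intercept, cond]), 1)
    # (b) one-step: batch kept as a covariate in the model
    t_b, df_b = ols_t(y, full, 1)
    crit = {17: 2.110, 18: 2.101}   # two-sided 5% critical values of the t distribution
    rate_a = np.mean(np.abs(t_a) > crit[df_a])
    rate_b = np.mean(np.abs(t_b) > crit[df_b])
    return rate_a, rate_b
```

In this setup both approaches produce the same condition estimate, but (a) uses a standard error that ignores the batch fit and counts one residual degree of freedom too many. With this unbalanced layout its false-positive rate should therefore come out well above the nominal 5%, while (b) should stay close to it. With a balanced layout the gap shrinks, but the degree-of-freedom mismatch still leaves (a) slightly anticonservative.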