I am using limma on GSVA scores to assess differential expression of gene sets (in microarray and RNAseq data). Since GSVA scores can be negative, I am wondering how limma calculates the fold changes between a negative and a positive GSVA score and how meaningful a fold change cutoff would be to define differential expression of gene sets (in addition to applying a p.value cutoff).
I am familiar with using limma on gene-level data and with how the fold changes are derived from log2 intensities in the case of microarray data. However, applying this formula to negative and positive GSVA scores doesn't seem to give a meaningful result, since the scores are not log-transformed values to start with. I am wondering whether the GSVA scores require some kind of pre-processing before being passed on to limma, although in the GSVA vignette the scores appear to be passed on to limma without any pre-processing, as far as I can see. However, in the GSVA vignette no fold change cutoff is applied to define differential expression for gene sets, while a fold change cutoff is applied to gene-level data.
Thank you - that confirms my thoughts on the topic. However, even though we can't really express the difference between two GSVA scores as a 'traditional' fold change, one might still want some measure of the extent/magnitude of change between two scores (e.g. the difference a-b) to further rank or shortlist the gene sets that are called 'differentially expressed' based on a p.value cutoff?
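To make the "difference a-b" idea concrete, here is a minimal sketch of the arithmetic for a plain two-group design. It is written in Python rather than R purely to show the calculation, it does not call limma or GSVA, and the scores and the function name `two_group_coefficient` are hypothetical. It illustrates that the least-squares coefficient for the group term is simply the difference in mean scores, which stays well defined when some scores are negative.

```python
# Hypothetical GSVA-style enrichment scores for one gene set (not real data).
control = [-0.42, -0.31, -0.38, -0.25]
treated = [0.18, 0.27, 0.09, 0.22]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def two_group_coefficient(group_a, group_b):
    """Least-squares slope for y ~ intercept + indicator(sample is in group_b)."""
    x = [0.0] * len(group_a) + [1.0] * len(group_b)
    y = list(group_a) + list(group_b)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


coef = two_group_coefficient(control, treated)
diff = mean(treated) - mean(control)
print(f"coefficient = {coef:.3f}, difference in means = {diff:.3f}")
```

The two printed numbers agree, so for this kind of design the effect size is a difference in score units rather than a log2 ratio.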
Yes. Note that while the magnitude of change may be tricky to interpret, the p-value has a very precise interpretation: rejecting the null hypothesis, after correcting for multiple testing, allows you to say something about the association between the gene set and the explanatory variable whose coefficient was tested. Ranking by the magnitude of change among those gene sets that pass the multiple-testing threshold should also be meaningful.
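The filter-then-rank idea can be sketched as follows. This is an illustration under stated assumptions, not limma output: the gene set names, score differences and p-values are made up, the Benjamini-Hochberg adjustment is implemented by hand, and the 0.05 threshold is only an example. Python is used just to show the logic.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical results: (gene set, difference in mean score, raw p-value).
results = [
    ("SET_A", 0.53, 0.0004),
    ("SET_B", -0.12, 0.0010),
    ("SET_C", 0.05, 0.6000),
    ("SET_D", -0.41, 0.0200),
    ("SET_E", 0.30, 0.0450),
]

adj_p = bh_adjust([p for _, _, p in results])

# Step 1: keep gene sets that pass the multiple-testing threshold.
significant = [
    (name, diff, q) for (name, diff, _), q in zip(results, adj_p) if q < 0.05
]

# Step 2: rank the survivors by absolute difference in score.
ranked = sorted(significant, key=lambda row: abs(row[1]), reverse=True)
shortlisted = [name for name, _, _ in ranked]

for name, diff, q in ranked:
    print(f"{name}\tdiff={diff:+.2f}\tadj.p={q:.4f}")
```

Here the p-value decides which gene sets are reported, and the magnitude is used only to order them, so no cutoff on the difference itself is needed.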