Question

GSVA lots of positive values, is this expected behavior?


owen.whitley ▴ 10

@owenwhitley-15693


Hi,

I'm re-analyzing data from Yuan et al. 2018 (https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0567-9), which has 8 high-grade glioma samples. For one sample, I log-normalize the data using the method of Lun et al. (2016) and run GSVA on a subset of cells (putative cancer cells) with several gene signatures. For one gene signature, I see what appear to be consistently positive values. Here's the plot:

As you can see, for the gene set 'RNA.GSC.c2' (which comprises about 1200 of the 4800 genes used in this analysis), very few samples score below 0. Since GSVA's rank-based score deals with genes ranked by relative expression in a dataset, I was a bit surprised by this result. Do you think this could be due to outliers with extremely low log counts?

Here are the sample means for the gene set

Thanks

gsva • 1.2k views


score 0 · Answer 1 · 2019-04-10


Robert Castelo ★ 3.4k

@rcastelo


Barcelona/Universitat Pompeu Fabra

Hi,

I would definitely remove lowly-expressed genes prior to running GSVA, just as you would do with differential expression. The fact that a gene set has consistently positive scores across samples means to me that its constituent genes are highly ranked in expression values across samples.
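To make the two points above concrete, here is a minimal toy sketch in plain Python. It is not GSVA's actual statistic (GSVA also normalizes each gene across samples before ranking), and everything in it is assumed for illustration: the function names `filter_low_expression` and `centered_rank_score`, the `min_mean` threshold, and the simulated data. It only shows that a gene set whose members sit near the top of each cell's ranking gets a positive rank-based score in every cell, and that a block of near-zero genes pushes that score further up.

```python
import random
from statistics import mean

def filter_low_expression(expr, min_mean=1.0):
    """Keep genes whose mean log-expression across cells is at least min_mean.
    The threshold is an arbitrary illustrative choice, not a recommendation."""
    return {g: v for g, v in expr.items() if mean(v) >= min_mean}

def centered_rank_score(expr, gene_set, cell):
    """Mean within-cell rank of the gene-set members, rescaled to [-1, 1].
    0 means the set ranks like an average gene; above 0 means it ranks higher.
    This is a simplified stand-in, not the GSVA enrichment statistic."""
    ordered = sorted(expr, key=lambda g: expr[g][cell])  # ascending expression
    rank = {g: i for i, g in enumerate(ordered)}         # ranks 0 .. n-1
    members = [g for g in gene_set if g in rank]
    return 2 * mean(rank[g] for g in members) / (len(ordered) - 1) - 1

# Simulated log-expression: 40 genes x 5 cells (hypothetical data).
rng = random.Random(0)
n_cells = 5
expr = {}
for i in range(10):   # gene-set members, highly expressed
    expr[f"set{i}"] = [5 + rng.uniform(-0.5, 0.5) for _ in range(n_cells)]
for i in range(20):   # background genes, moderately expressed
    expr[f"bg{i}"] = [3 + rng.uniform(-0.5, 0.5) for _ in range(n_cells)]
for i in range(10):   # lowly expressed genes, close to zero
    expr[f"low{i}"] = [rng.uniform(0.0, 0.2) for _ in range(n_cells)]

gene_set = [f"set{i}" for i in range(10)]
filtered = filter_low_expression(expr)

scores_before = [centered_rank_score(expr, gene_set, c) for c in range(n_cells)]
scores_after = [centered_rank_score(filtered, gene_set, c) for c in range(n_cells)]
print("before filtering:", scores_before)
print("after filtering: ", scores_after)
```

In this toy setting the scores stay positive after filtering because the gene-set members really are the most highly expressed genes, but they are less extreme once the near-zero genes no longer fill the bottom of the ranking.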

cheers,

robert.
