Hello everyone
Regarding the package singscore, in particular the function simpleScore, which scores a gene expression dataset against one or two gene sets, I was wondering whether the signature scores from different gene sets are comparable.
Say I have 4 gene sets that I use to classify tumor samples into molecular subtypes. My idea is to score the gene expression dataset with each of the 4 gene sets separately, and then compare the signature scores across the 4 gene sets for each sample.
I would like to know if the scores are comparable in absolute terms.
Here is an example output (note that I ran each gene set separately and then merged the results):
id         Gene_set_1  Gene_set_2  Gene_set_3  Gene_set_4
sample_1   -0.0625     -0.194      0.298       0.182
sample_2   -0.0706     -0.211      0.273       0.218
sample_3   0.0366      -0.204      0.183       0.263
sample_4   -0.0219     -0.221      0.325       0.215
sample_5   -0.0215     -0.232      0.267       0.2
sample_6   -0.00629    -0.186      0.205       0.255
sample_7   -0.0425     -0.202      0.177       0.217
sample_8   -0.0985     -0.219      0.252       0.191
sample_9   -0.0726     -0.194      0.272       0.154
sample_10  -0.0513     -0.226      0.245       0.161
Can I say, for example, that for sample_1 the gene set scores rank as Gene_set_3 > Gene_set_4 > Gene_set_1 > Gene_set_2?
Thanks
PS: cross-posted to biostars
Hi Pietro,
Any luck finding an answer?
Hi,
I apologise for not having answered this question sooner. I am the current maintainer of singscore and have only just started getting notifications for questions about the package. To answer the original question: scores from singscore can be compared between genesets, but whether that is appropriate depends on the context.
The problem Pietro described in his question would require standardising the scores across samples, so that the dynamic range of scores is the same for each geneset. This is the case in any transcriptomic analysis: when expression values are compared across samples, standardisation is required. Genes have different dynamic ranges, so when comparing two genes for the purpose of subtyping it may be useful to make their dynamic ranges comparable. You could either assume that the dynamic ranges are equivalent or normalise the expression (e.g. using a z-transformation).
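For illustration, here is a minimal sketch of the z-transformation applied to scores rather than expression values, using the ten samples from the table in the question. It is plain Python with only the standard library and does not use singscore itself; the helper names `zscore` and `rank_gene_sets` are made up for this example.

```python
from statistics import mean, stdev

# Scores copied from the example table in the question
# (one list per gene set, ordered sample_1 .. sample_10).
samples = [f"sample_{i}" for i in range(1, 11)]
scores = {
    "Gene_set_1": [-0.0625, -0.0706, 0.0366, -0.0219, -0.0215,
                   -0.00629, -0.0425, -0.0985, -0.0726, -0.0513],
    "Gene_set_2": [-0.194, -0.211, -0.204, -0.221, -0.232,
                   -0.186, -0.202, -0.219, -0.194, -0.226],
    "Gene_set_3": [0.298, 0.273, 0.183, 0.325, 0.267,
                   0.205, 0.177, 0.252, 0.272, 0.245],
    "Gene_set_4": [0.182, 0.218, 0.263, 0.215, 0.2,
                   0.255, 0.217, 0.191, 0.154, 0.161],
}


def zscore(values):
    """Standardise one gene set's scores across samples (mean 0, sample SD 1)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def rank_gene_sets(table, sample_index):
    """Gene set names ordered from highest to lowest score for one sample."""
    return sorted(table, key=lambda g: table[g][sample_index], reverse=True)


# Standardise each gene set separately, across all samples.
z_scores = {g: zscore(v) for g, v in scores.items()}

i = samples.index("sample_1")
print("raw order:         ", rank_gene_sets(scores, i))
print("standardised order:", rank_gene_sets(z_scores, i))
```

The two orderings need not agree. The raw order compares the scores in absolute terms, while the standardised order shows which gene set is unusually high for that sample relative to the other samples. With only ten samples the estimated means and standard deviations are rough, so the standardised values should be read with caution.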
Likewise, a singscore quantifies the absolute expression of the genes in a geneset relative to the other genes within a given sample. A perfect positive score indicates that the genes in the geneset all have the highest expression within the sample. To compare scores between genesets for the purpose of subtyping, you would need the dynamic range of scores to be equivalent, and therefore you would need to normalise the scores. If you are interested in comparing absolute scores, you can use the scores as they are.
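To make the idea of a "perfect positive score" concrete, here is a simplified rank-based score written in plain Python. It is only a sketch in the spirit of a single-sample rank score, not the singscore implementation: the function name `centred_rank_score`, the gene names and the expression values are all invented for illustration. It assumes no tied expression values and a gene set smaller than the number of measured genes.

```python
def centred_rank_score(expression, gene_set):
    """Simplified single-sample rank score in [-0.5, 0.5] (illustrative only)."""
    # Rank genes within the sample: 1 = lowest expression, N = highest.
    ordered = sorted(expression, key=expression.get)
    rank = {gene: position + 1 for position, gene in enumerate(ordered)}

    n, k = len(ordered), len(gene_set)
    mean_rank = sum(rank[g] for g in gene_set) / k

    # Smallest and largest mean rank a set of k genes can reach among n genes.
    lowest = (k + 1) / 2            # set occupies the bottom k ranks
    highest = (2 * n - k + 1) / 2   # set occupies the top k ranks

    # Scale to [0, 1], then centre on 0.
    return (mean_rank - lowest) / (highest - lowest) - 0.5


# Toy sample with made-up genes and expression values.
sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}

print(centred_rank_score(sample, ["E", "F"]))  # set holds the two highest genes: 0.5
print(centred_rank_score(sample, ["A", "B"]))  # set holds the two lowest genes: -0.5
print(centred_rank_score(sample, ["A", "F"]))  # one lowest, one highest gene: 0.0
```

Because a score of this kind depends only on where the set's genes rank within one sample, it is an absolute quantity for that sample. How widely it varies across samples still differs from one gene set to another, which is why normalisation matters for subtyping.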
I hope this helps you with your analysis and please do not hesitate to ask for further help.
Cheers, Dharmesh