Hi Robert,
I am Prathyusha Bachali, a student at UNCC. I am trying to work with the GSVA package. I have created my GeneSetCollection object, and now I am calling the gsva function with two inputs: an ExpressionSet of gene-by-sample expression values (all positive) and my GeneSetCollection object, which is a collection of gene sets that use gene symbols as identifiers. The gsva function runs fine, but I am not able to interpret the results and I am not sure how the GSVA score has been calculated, because the output contains both negative and positive scores. What does a negative score represent here?
Here is an excerpt of my input ExpressionSet object, which contains expression values for 1819 genes:
geneSymbol  CTL1    CTL2    SLE1    SLE2
ESRRA       6.683   6.525   6.461   6.392
CAPNS1      10.591  10.047  11.52   11.86
After running gsva on my ExpressionSet with the matching GeneSetCollection object, I got the following output, which contains a few negative scores. I believe these are GSVA enrichment scores.
geneSet       CTL1    CTL2     SLE1     SLE2
Immunescreen  0.012   -0.1264  -0.2167  -0.2767
ISG           -0.032  0.02867  0.057    -0.078
Thanks in advance. Any help is much appreciated.
Thanks,
Prat
Hi Prat,
Next time, please use the 'ADD COMMENT' link, which I'm using right now, to make comments, remarks and/or follow-up questions, such as your last two messages on this page, and keep the answer slots for new answers only. This helps structure the conversation about a particular topic.
Regarding how negative scores come out of the GSVA algorithm, I have nothing to add to what I already said in my first answer above, and I have no time to lecture you on this subject. You can read the paper and look at the source code to find your way through the algorithm. If you still do not understand how it works, you should try to formulate questions about the specific parts that you do not understand. I'm sorry I can't be more helpful this time.
cheers,
robert.
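A simplified Python sketch of the scoring steps, as we read them in the GSVA paper, may help show where negative scores come from. It is not the package's implementation: the Gaussian-kernel bandwidth (one quarter of each gene's standard deviation), the toy data, and the names `kernel_cdf` and `gsva_like_scores` are assumptions for illustration. Each value is first placed within that gene's own distribution across samples; genes are then ranked within each sample, and a weighted random walk over that ranking steps up at gene-set members and down at all other genes. The score is the walk's largest positive deviation minus its largest negative deviation, so it is negative when the members sit near the bottom of a sample's ranking, i.e. when they are expressed lower in that sample than in the other samples.

```python
import math
from statistics import pstdev


def kernel_cdf(values, x, h):
    """Gaussian-kernel estimate of P(X <= x) from one gene's values across samples."""
    return sum(
        0.5 * (1.0 + math.erf((x - v) / (h * math.sqrt(2.0)))) for v in values
    ) / len(values)


def gsva_like_scores(expr, gene_set, tau=1.0):
    """Simplified GSVA-style score of one gene set in every sample (a sketch).

    expr: dict gene -> list of log2 values, one per sample (same sample order).
    gene_set: gene names; names missing from expr are ignored.
    """
    genes = list(expr)
    p = len(genes)
    n_samples = len(expr[genes[0]])
    members = set(gene_set) & set(genes)
    k = len(members)
    if k == 0 or k == p:
        raise ValueError("gene set must cover some, but not all, genes")

    # Step 1: where does each value fall within that gene's own distribution
    # across samples? (bandwidth = sd/4 is an assumption; 1.0 if sd is 0)
    z = {}
    for g in genes:
        h = pstdev(expr[g]) / 4 or 1.0
        z[g] = [kernel_cdf(expr[g], x, h) for x in expr[g]]

    scores = []
    for j in range(n_samples):
        # Step 2: rank genes within sample j, highest relative expression first,
        # and weight each rank by its distance from the middle of the ranking.
        order = sorted(genes, key=lambda g: z[g][j], reverse=True)
        weight = {g: abs(p / 2 - pos) ** tau for pos, g in enumerate(order, start=1)}
        total = sum(weight[g] for g in members)

        # Step 3: random walk that steps up at gene-set members (by their
        # normalized weight) and down at all other genes; it ends at about 0.
        walk, top, bottom = 0.0, 0.0, 0.0
        for g in order:
            if g in members:
                walk += weight[g] / total if total > 0 else 1.0 / k
            else:
                walk -= 1.0 / (p - k)
            top = max(top, walk)
            bottom = min(bottom, walk)

        # Step 4: largest positive deviation minus largest negative deviation.
        # Negative when the members cluster at the bottom of the ranking.
        scores.append(top + bottom)
    return scores


# Hypothetical toy data: A and B decrease from the first to the last sample;
# C-F are flat, so their relative-expression value is 0.5 in every sample.
toy_expr = {
    "A": [9.0, 8.0, 7.0, 6.0],
    "B": [9.0, 8.0, 7.0, 6.0],
    "C": [5.0] * 4,
    "D": [5.0] * 4,
    "E": [5.0] * 4,
    "F": [5.0] * 4,
}
print(gsva_like_scores(toy_expr, ["A", "B"]))  # first two positive, last two negative
```

In this toy example the same gene set scores positive in the samples where its genes are relatively high and negative where they are relatively low, which is the sense in which a negative GSVA score can be read.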
Hi Robert,
Thank you very much. My apologies if I have bothered you too much with my GSVA questions.
Prat
Hi Robert,
I have been using GSVA extensively for pathway-centric analysis, to understand heterogeneous populations and the pathways in each patient. It is quite a powerful program. I have a small question; it might be a simple one. As explained in the paper, the input to gsva is log2 expression values. I am using the log2-transformed values of the genes that are DE at FDR 0.02%: to be more confident about the results, we are limiting our input to DE genes significant at FDR 0.02%. If I do this, do you think I am losing the power of GSVA? Or is it better practice to use all the genes that remain after filtering out low-variance genes, duplicate genes, genes without Entrez IDs, and control probes? We are wrapping up our paper, so I would really appreciate your insight on the input I am currently using.
We are using GSVA mainly to look for drug molecules that target pathways, so I was wondering whether using only the significant DE genes as input would be a good idea. Thanks in advance.
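The pre-filtering steps listed above (control probes, genes without Entrez IDs, duplicate genes, low-variance genes) could be sketched as follows. This is a minimal Python illustration on hypothetical probe records, not a recommendation of specific cutoffs: the `AFFX` control-probe prefix (the Affymetrix naming convention), the rule of keeping the most variable probe per gene symbol, the `min_var` threshold, and the record layout are all assumptions.

```python
from statistics import pvariance


def filter_probes(probes, min_var=0.1, control_prefix="AFFX"):
    """Return dict gene symbol -> log2 values after the filtering steps.

    probes: list of dicts with keys 'probe', 'symbol', 'entrez', 'values'
    (a hypothetical layout standing in for an ExpressionSet plus annotation).
    """
    best = {}  # symbol -> (variance, values) of the most variable probe so far
    for pr in probes:
        if pr["probe"].startswith(control_prefix):
            continue  # control probe (assumed to be marked by its ID prefix)
        if not pr["entrez"]:
            continue  # no Entrez ID
        var = pvariance(pr["values"])
        sym = pr["symbol"]
        if sym not in best or var > best[sym][0]:
            # collapse duplicate genes: keep the most variable probe per symbol
            best[sym] = (var, pr["values"])
    # drop low-variance genes
    return {sym: vals for sym, (var, vals) in best.items() if var >= min_var}


# Hypothetical probes, one per filter.
probes = [
    {"probe": "AFFX-BioB-5_at", "symbol": "bioB", "entrez": "999", "values": [10.0, 12.0, 10.0, 12.0]},
    {"probe": "p1", "symbol": "GENE_A", "entrez": "101", "values": [5.0, 7.0, 5.0, 7.0]},
    {"probe": "p2", "symbol": "GENE_A", "entrez": "101", "values": [6.0, 6.2, 6.0, 6.2]},
    {"probe": "p3", "symbol": "GENE_B", "entrez": None, "values": [4.0, 9.0, 4.0, 9.0]},
    {"probe": "p4", "symbol": "GENE_C", "entrez": "103", "values": [8.0, 8.0, 8.0, 8.1]},
]
print(filter_probes(probes))  # only GENE_A survives, with the values of probe p1
```

The order of the filters matters only for duplicates: collapsing them after removing probes without Entrez IDs ensures an unannotated probe can never be the one kept for a gene.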
Hi Prat,
Please read the Bioconductor Posting Guide carefully; it contains guidelines on best practices for using this support site. These best practices are there for the benefit of everyone, including you. In particular, the first guideline under "Composing" says "Compose a new message with a new subject line; only reply to an existing post if you are elaborating on or answering a previous question". Because you're not elaborating on a previous question, what you are asking now would fit better in a new question with an appropriate specific subject and tags. This helps build a knowledge base on the use of a package and helps people find answers to previously posted questions. I'm sure you've already benefited from this strategy, but its success depends on everyone using it properly.
That said, the answer to your question depends on what you are doing with the GSVA scores. How do you use them once they are calculated? (e.g., for exploratory/visualization purposes? For inferential purposes, such as some kind of testing? Something else?)
Hi Robert,
First of all, my apologies for posting this in the wrong place. I thought I would follow the same thread since it is all about GSVA. I will post correctly next time. Since I have already started here, I am using "ADD REPLY" for now, but from next time I will make sure I post the question correctly.
In the bigger picture, we are using GSVA scores for drug repurposing and also to understand the pathogenesis of lupus, an autoimmune disorder. We are making custom gene sets (for different cell types, genes responding to drug treatments, etc.) and then looking at how these gene sets behave in our expression profiling datasets. Using the matrix of log2 expression values of the DE genes significant at FDR 0.02% as input, our custom gene sets as reference, and the "gsva" method, we are seeing some interesting results. Now I am not sure whether I need to broaden my approach and include the non-significant genes as well, and I am a little confused about which approach would be better. My concern is that adding the non-significant genes might increase false positives in my results.
Thank you so much again for answering my questions. I will definitely make sure I post correctly next time. I really appreciate your time.
Prat
Hi Prat,
If you are using GSVA scores for inferential purposes, such as selecting gene sets that are differentially expressed, then I'd recommend starting with the whole set of genes, discarding those that are lowly expressed, and calculating GSVA scores over the collection of gene sets of interest. Then, do your differential expression analysis over those GSVA scores. If you are using GSVA scores for exploratory/visualization purposes, then what you are doing with only DE genes is already fine. Since you did not answer my question before, I don't know what you mean by "increasing false positives". Regarding these messages, one should start a thread for each different question, not for each different package. If you think some of the answers address your question, you should upvote them. This also helps guide people to useful answers.
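The workflow recommended above for inferential use can be sketched in three steps: filter the full gene matrix, score it with gsva, then test the scores rather than the genes. This minimal Python sketch covers only the first and last steps (the scoring itself would be done with gsva in R). The mean-log2 cutoff `min_mean`, the hypothetical gene `GENE_LOW`, and the use of a plain Welch t-statistic are assumptions; in R, limma is a common choice for testing GSVA scores. The score matrix from the first post is reused only to show the mechanics, since two samples per group is too few for a trustworthy test.

```python
import math
from statistics import mean, variance


def drop_lowly_expressed(expr, min_mean=5.0):
    """Keep genes whose mean log2 expression is at least min_mean (hypothetical cutoff)."""
    return {g: vals for g, vals in expr.items() if mean(vals) >= min_mean}


def welch_t(x, y):
    """Welch t-statistic for mean(y) - mean(x); each group needs >= 2 values."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


def compare_gene_sets(scores, groups, ref="CTL", alt="SLE"):
    """Per-gene-set Welch t-statistic of alt vs ref computed on GSVA scores.

    scores: dict gene set -> list of scores, in the same sample order as groups.
    """
    result = {}
    for name, vals in scores.items():
        x = [v for v, g in zip(vals, groups) if g == ref]
        y = [v for v, g in zip(vals, groups) if g == alt]
        result[name] = welch_t(x, y)
    return result


# Step 1: filter the full gene matrix (GENE_LOW is a hypothetical lowly expressed gene).
expr = {
    "ESRRA": [6.683, 6.525, 6.461, 6.392],
    "CAPNS1": [10.591, 10.047, 11.52, 11.86],
    "GENE_LOW": [1.2, 0.8, 1.0, 1.1],
}
filtered = drop_lowly_expressed(expr)

# Step 2 (not shown): compute GSVA scores over `filtered` with gsva in R.

# Step 3: test the scores; the first post's matrix is reused for illustration only.
scores = {
    "Immunescreen": [0.012, -0.1264, -0.2167, -0.2767],
    "ISG": [-0.032, 0.02867, 0.057, -0.078],
}
groups = ["CTL", "CTL", "SLE", "SLE"]
print(sorted(filtered))
print(compare_gene_sets(scores, groups))
```

Testing one statistic per gene set, rather than per gene, keeps the number of tests equal to the number of gene sets, which is what makes the GSVA scores convenient for this kind of inference.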