Hi Robert,
That's a clear and neat explanation. I now understand that if my gene identifiers match those in my expression set, I can make a gene set object.
Let me explain my scenario clearly. I have three phenotypes: "active patients", "inactive patients" and "control patients" (healthy individuals). In the first dataset the phenotypes (cohorts) are known: active patients and control patients. I have analysed this microarray data and found the genes differentially expressed between "Active patients vs. Control patients". The output for this first dataset is the list of differentially expressed genes with their probe IDs, p-values, FDR-corrected p-values and log fold changes. The second dataset contains control samples and patient samples that are either "Active patients" or "Inactive patients", but we do not know which patient samples are active and which are inactive. I am assuming that GSVA can help identify the unknown phenotypes here.
My plan is to compare the expression set built from the control samples and the samples of unknown phenotype (i.e. active or inactive patient samples) with the differentially expressed genes identified between "Control samples vs. Known phenotype" (i.e. active patient samples). My first question is: can I make the gene set object from the differentially expressed genes I identified in the first dataset (with known phenotypes)? From your previous answer, I believe I can, by matching the gene identifiers in the output of my first dataset to the gene identifiers of my expression set (built from the control samples and the samples of unknown phenotype). I would then call the gsva function, which creates the matrix of gene set enrichment scores. However, I am unable to interpret the output I get after applying gsva to my eset with the gene set object. Here is a sample I made using your previous example:
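library(GSVA)
library(GSVAdata)  # provides the leukemia data
data(leukemia)     # loads leukemia_eset
geneids <- sample(featureNames(leukemia_eset), size = 100, replace = FALSE)  # toy gene set of 100 random probes
res <- gsva(leukemia_eset, list(GS1 = geneids))
summary(res)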
            Length Class         Mode
es.obs      1      ExpressionSet S4
bootstrap   2      -none-        list
p.vals.sign 0      -none-        NULL
Now, how can I use res to find my unknown phenotypes? Is res a matrix of gene set enrichment scores that assigns some kind of rank to the genes?
Is it possible to use the res object to generate a heatmap, so that I can see how the genes are expressed in the unknown samples compared with the known samples?
I know I have asked many questions. I am having a really hard time figuring this out, and I cannot move forward in my research until I resolve this step. Any suggestions or ideas are much appreciated.
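To make the gene set step of my plan concrete, here is a minimal sketch in base R of how I imagine building the gene sets from the output of my first dataset. Everything in it is a stand-in: the data frame de_table, its column names (probe_id, logFC, adj_p), the cutoffs and the toy matrix expr are hypothetical and only illustrate the identifier matching, not my real data.

```r
# Toy stand-in for the differential expression output of the first dataset.
# Column names and values are hypothetical.
de_table <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4", "p5", "p6"),
  logFC    = c(2.1, -1.8, 0.3, 1.5, -2.4, 1.2),
  adj_p    = c(0.001, 0.004, 0.60, 0.03, 0.01, 0.20),
  stringsAsFactors = FALSE
)

# Toy stand-in for the second dataset's expression matrix (probes x samples).
# Probe "p4" is deliberately absent to show the effect of identifier matching.
set.seed(1)
expr <- matrix(rnorm(5 * 4), nrow = 5,
               dimnames = list(c("p1", "p2", "p3", "p5", "p6"),
                               c("ctrl1", "ctrl2", "unk1", "unk2")))

# Keep significant probes only (cutoffs are assumptions, not recommendations).
sig <- de_table[de_table$adj_p < 0.05 & abs(de_table$logFC) > 1, ]

# Split by direction and keep only probes present in the expression matrix.
gene_sets <- list(
  active_up   = intersect(sig$probe_id[sig$logFC > 0], rownames(expr)),
  active_down = intersect(sig$probe_id[sig$logFC < 0], rownames(expr))
)
print(gene_sets)
```

A named list like gene_sets is what I would then pass as the second argument to gsva, in place of list(GS1=geneids) in the example above.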
Thanks,
Prat