Michael Salbaum
I'm not a statistician, but interested in the question.
Wouldn't it be warranted to ask how 'membership' of DE genes in the stem
cell gene set compares with 'membership' in a randomly drawn gene set of
the same size as the stem cell set? Evaluate with Fisher's exact test,
then bootstrap?
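A minimal Python sketch of the random-gene-set idea, assuming every signature gene is among the measured genes. It draws random sets of the signature's size without replacement and counts how often their overlap with the DE list is at least the observed one; strictly this is Monte Carlo resampling (permutation-style) rather than a bootstrap. The function name `resampling_p_value` and the `gene{i}` IDs are hypothetical placeholders; the counts (17119 genes, 86 DE genes, 510-gene signature, 9 overlapping) come from the original question quoted below.

```python
import random

def resampling_p_value(universe, de_genes, set_size, observed_overlap,
                       n_iter=10000, seed=0):
    """Monte Carlo p-value: fraction of random gene sets of size `set_size`,
    drawn without replacement from `universe`, whose overlap with `de_genes`
    is at least `observed_overlap`."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    de = set(de_genes)
    pool = list(universe)
    hits = 0
    for _ in range(n_iter):
        random_set = rng.sample(pool, set_size)
        overlap = sum(1 for g in random_set if g in de)
        if overlap >= observed_overlap:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical gene IDs standing in for the real array annotation.
universe = [f"gene{i}" for i in range(17119)]
de_genes = universe[:86]   # which 86 is irrelevant: random sets are uniform
p = resampling_p_value(universe, de_genes, set_size=510,
                       observed_overlap=9, n_iter=2000)
print(p)
```

The smallest attainable value is 1/(n_iter + 1), so a very small true p-value needs many iterations to resolve; an exact hypergeometric (Fisher) calculation gives the same quantity without simulation.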
J. Michael Salbaum, Ph.D.
Associate Professor
Pennington Biomedical Research Center
Louisiana State University System
6400 Perkins Road
Baton Rouge, LA 70808
(225) 763-2782
-----Original Message-----
From: bioconductor-bounces@r-project.org on behalf of Aliaksei Holik
Sent: Wed 8/15/2012 8:51 AM
Cc: bioconductor@r-project.org
Subject: [BioC] Gene enrichment question
Dear listers,
Apologies if my question is not strictly related to Bioconductor, though
one never knows; maybe there's a package that does what I need.
I am analysing a list of differentially expressed (DE) genes from an
Illumina microarray. In particular, I'm trying to compare the list of
DE genes to an existing list of genes preferentially expressed in the
stem cell population (the stem cell signature). When I do so, 10% of
the DE genes belong to the stem cell signature. What I'd like to do now
is find out how likely that is to happen by chance, i.e. put a p value
on it.
At the moment I know:
There are 17119 unique genes in my dataset.
Of them 86 are differentially expressed.
The stem cell signature contains 510 genes.
The signature was combined from several platforms, which makes it hard
to establish the total number of unique genes, but the total is at
least 20819 (the size of the largest platform).
There are 9 overlapping genes between the DE genes and the stem cell
signature.
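For the counts above, a one-sided Fisher's exact test reduces to a hypergeometric upper tail: how likely is an overlap of 9 or more when 86 genes are drawn at random from the 17119 on the array, 510 of which are in the signature? A minimal standard-library sketch, assuming (hypothetically) that all 510 signature genes are among the 17119 measured genes; if fewer are present, restricting the signature to measured genes would lower that 510. The function name `hypergeom_upper_tail` is a placeholder.

```python
from math import comb

def hypergeom_upper_tail(k, universe_size, set_size, n_drawn):
    """P(X >= k) for X ~ Hypergeometric(universe_size, set_size, n_drawn).

    Equals the one-sided ("greater") Fisher's exact test p-value for the
    2x2 table DE / not DE versus in signature / not in signature.
    """
    upper = min(set_size, n_drawn)
    total = comb(universe_size, n_drawn)
    # Sum the exact counts of draws with i signature genes, i = k..upper.
    tail = sum(comb(set_size, i) * comb(universe_size - set_size, n_drawn - i)
               for i in range(k, upper + 1))
    return tail / total   # exact integer arithmetic, one final division

# Assumes all 510 signature genes are on the 17119-gene array.
p_exact = hypergeom_upper_tail(9, universe_size=17119, set_size=510, n_drawn=86)
print(p_exact)
```

In R, `phyper(8, 510, 17119 - 510, 86, lower.tail = FALSE)` computes the same tail, and `fisher.test()` with `alternative = "greater"` on the corresponding 2x2 table gives the matching p-value.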
So I wonder:
1) Is there an accepted way to calculate a p value from these data?
For instance, could I run something like a chi-squared test? E.g. stem
cell specific genes represent 510/20819 = 2.4% of the total dataset, so
the expected number of stem cell genes among my DE genes would be
86 x 2.4% ≈ 2. My chi-squared test would then be based on 9 observed
vs 2 expected.
2) Or do I have to generate a gene set based on the stem cell signature
and run it through a GSEA algorithm to calculate enrichment and
significance?
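The chi-squared calculation in point 1) can be checked numerically. This sketch uses the 510/20819 fraction exactly as given there and splits the 86 DE genes into two categories (in signature / not in signature), so the test has 1 degree of freedom; the 1-df upper tail is computed with the identity P(X > x) = erfc(sqrt(x / 2)). The function name `pearson_chi_squared` is a placeholder. With an expected count near 2, below the common rule-of-thumb minimum of about 5, the chi-squared approximation is unreliable, which is one reason an exact test is often preferred for overlaps like this.

```python
import math

def pearson_chi_squared(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_de = 86                      # differentially expressed genes
p_sig = 510 / 20819            # signature fraction as stated in point 1)
exp_in = n_de * p_sig          # expected signature genes among DE genes (~2.1)

observed = [9, n_de - 9]       # in signature, not in signature
expected = [exp_in, n_de - exp_in]

stat = pearson_chi_squared(observed, expected)
# Upper tail of a chi-squared distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(stat / 2))
print(f"expected = {exp_in:.2f}, chi2 = {stat:.1f}, p = {p_value:.1e}")
```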
Any pointers in the right direction would be much appreciated.
Many thanks for your time and help!
Aliaksei.
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor