Hi,
To find over-represented pathways from a selection of gene IDs, we
could use either (1) the find_enriched_pathway function (KEGGprofile
package) or (2) spia (SPIA package), where pNDE gives the
over-representation p-value. Both functions work very well. However,
for the same selection of gene IDs, I get quite different results.
If I compare the top-ten lists, they have only one pathway ID in
common. I expected similar results, with perhaps a few small
differences!
A. If I look inside the functions at how the p-value is computed, the
calculation seems the same:
KEGGprofile:
    pvalue[x] <- phyper(kegg_result_length[x],
                        keggpathway2gene_length[x],
                        length(unique(unlist(keggpathway2gene))) -
                            kegg_result_length[x],
                        length(unique(unlist(kegg_result))), lower.tail = F)
And SPIA:
    ph[i] <- phyper(q = noMy - 1, m = pSize[i], n = length(all) -
                    pSize[i], k = length(de), lower.tail = FALSE)

HENCE, the computation of the p-values seems the same.
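One detail in the two calls quoted above is worth checking with toy numbers: the SPIA call passes q = noMy - 1, while the KEGGprofile call passes kegg_result_length[x] with no offset. The sketch below assumes that kegg_result_length[x] is the raw overlap count (that is an assumption, not something confirmed here), and all the numbers and variable names are hypothetical. With lower.tail = FALSE, phyper returns P(X > q), so the two forms differ by exactly the probability of the observed count.

```r
# Toy numbers, all hypothetical.
n_universe <- 20000  # genes in the reference set
n_pathway  <- 100    # genes in the pathway
n_selected <- 500    # genes in the selection
n_overlap  <- 8      # selected genes that fall in the pathway

# P(X >= n_overlap): the form used in the SPIA excerpt (q = overlap - 1).
p_geq <- phyper(n_overlap - 1, n_pathway, n_universe - n_pathway,
                n_selected, lower.tail = FALSE)

# P(X > n_overlap): what results if the overlap is passed without the -1.
p_gt <- phyper(n_overlap, n_pathway, n_universe - n_pathway,
               n_selected, lower.tail = FALSE)

# The gap between the two is the point probability P(X == n_overlap).
p_eq <- dhyper(n_overlap, n_pathway, n_universe - n_pathway, n_selected)

print(c(p_geq = p_geq, p_gt = p_gt, p_eq = p_eq))
```

For pathways with small overlap counts this gap can be a large share of the p-value, so it may shift a ranking even when everything else is identical.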
B. The computation of the p-values seems the same? Not really: the
reference set used to compute the over-representation differs.
KEGGprofile:
the reference is based on keggpathway2gene.
And SPIA:
the reference is based on "all", i.e. all the IDs present on the chip.
In my case (Illumina HT6 v2), the chip is considered pangenomic.
HENCE, the reference should be the same in this case.
My question:
In your opinion, why is there this MAJOR difference between the two
methods?
At the moment I report both results, but I need to justify the
difference.
If the authors of these methods (or others) could give me some
explanation, or point out where I am wrong, I would appreciate it!
Greg Montréal
Hi Greg,
There may be a few reasons why you see those differences between the
SPIA pNDE ranking and KEGGprofile (which I am not familiar with):
Firstly, the database of pathways may differ between the two.
You can run a blank test in which the de argument of spia includes all
the genes in the "all" argument, and do something similar with
KEGGprofile. You will then see whether the two packages use the same
list of pathways, with the same number of recognized gene IDs in each.
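Once the pathway definitions used by each package are available as named lists of gene IDs, the comparison described above can be sketched in base R. The two lists, the gene IDs, and the helper size_on_array below are all hypothetical stand-ins; how to extract the real lists from each package is not shown here.

```r
# Hypothetical pathway-to-gene lists standing in for the pathway
# definitions used by each package (all IDs are made up).
pathways_a <- list(p01 = c("g1", "g2", "g3", "g4"),
                   p02 = c("g5", "g6"),
                   p03 = c("g7", "g8", "g9"))
pathways_b <- list(p01 = c("g1", "g2", "g3"),
                   p02 = c("g5", "g6", "g10"),
                   p04 = c("g11", "g12"))

# Hypothetical set of all gene IDs present on the array.
all_ids <- c("g1", "g2", "g3", "g5", "g6", "g7", "g8", "g10", "g11")

# Number of genes per pathway that are recognized on the array.
size_on_array <- function(pathways, universe) {
  vapply(pathways, function(genes) sum(unique(genes) %in% universe),
         integer(1))
}

size_a <- size_on_array(pathways_a, all_ids)
size_b <- size_on_array(pathways_b, all_ids)

# Pathways present in both lists, and those found in only one.
shared <- intersect(names(size_a), names(size_b))
only_a <- setdiff(names(size_a), names(size_b))
only_b <- setdiff(names(size_b), names(size_a))

# Side-by-side sizes for the shared pathways.
comparison <- data.frame(pathway = shared,
                         size_a = unname(size_a[shared]),
                         size_b = unname(size_b[shared]))
comparison$same_size <- comparison$size_a == comparison$size_b
print(comparison)
```

Pathways listed in only_a or only_b, or rows where same_size is FALSE, would point to a difference in the underlying pathway database rather than in the test itself.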
Another reason may be that in spia the pathway size (m = pSize[i]) is
given by the genes in the pathway that are present on the array
(included in all), while this may not be the case with KEGGprofile.
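To get a feel for how much the choice of reference can matter, here is a minimal sketch with toy numbers. It contrasts a test whose reference is every gene ID on the array (with the pathway size restricted to the array) against a test whose reference is only the genes annotated to at least one pathway, as the KEGGprofile excerpt in the question appears to do through length(unique(unlist(keggpathway2gene))). All values and variable names are hypothetical.

```r
# Toy numbers, all hypothetical.
n_array        <- 20000  # gene IDs on the array
n_in_pathways  <- 6000   # genes annotated to at least one pathway
n_de           <- 500    # selected genes
n_de_annotated <- 300    # selected genes annotated to at least one pathway
m_array        <- 80     # pathway genes that are present on the array
m_full         <- 100    # pathway genes in the full annotation
n_overlap      <- 8      # selected genes that fall in the pathway

# Reference = all array genes, pathway size restricted to the array.
p_array_ref <- phyper(n_overlap - 1, m_array, n_array - m_array,
                      n_de, lower.tail = FALSE)

# Reference = genes annotated to any pathway, full pathway size.
p_pathway_ref <- phyper(n_overlap - 1, m_full, n_in_pathways - m_full,
                        n_de_annotated, lower.tail = FALSE)

print(c(array_reference = p_array_ref, pathway_reference = p_pathway_ref))
```

With these made-up numbers the same overlap of 8 genes looks far more significant against the array-wide reference than against the pathway-annotated reference, so the two rankings need not agree even when the selected genes are identical.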
Adi Tarca