Goseq with small numbers of genes: minimum number?
matt.arno • 0

Hi - I have some relatively small gene lists (around 10-20 significant genes, padj < 0.05) and tried goseq to look for over-represented GO terms and KEGG pathways. I also ran the 'sampling' method as a negative control, but this gave very similar results to the real test (similar p-values and terms):

> head(GO.samp.MF.LowvNon) # this is the sampling method control 
       category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
1769 GO:0016362             0.001998002                        1          1        2
2627 GO:0034711             0.001998002                        1          1        3
1377 GO:0008466             0.003996004                        1          1        1
2009 GO:0017002             0.003996004                        1          1        7
2762 GO:0038023             0.003996004                        1          4      579
3172 GO:0048185             0.005994006                        1          1       11
                                        term ontology
1769      activin receptor activity, type II       MF
2627                         inhibin binding       MF
1377 glycogenin glucosyltransferase activity       MF
2009     activin-activated receptor activity       MF
2762             signaling receptor activity       MF
3172                         activin binding       MF

> head(GO.MF.LowvNon) # this is the real test
       category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
1377 GO:0008466             0.001197658                1.0000000          1        1
1769 GO:0016362             0.002340668                0.9999987          1        2
2627 GO:0034711             0.003514714                0.9999962          1        3
18   GO:0000155             0.003516708                0.9999962          1        3
2762 GO:0038023             0.003856110                0.9996154          4      579
730  GO:0004673             0.004728336                0.9999922          1        4
                                        term ontology
1377 glycogenin glucosyltransferase activity       MF
1769      activin receptor activity, type II       MF
2627                         inhibin binding       MF
18       phosphorelay sensor kinase activity       MF
2762             signaling receptor activity       MF
730        protein histidine kinase activity       MF

 

My question is this: is this likely to be due to putting too few genes into the analysis?

I think my code is OK, as I've done this before with larger lists and got some good p-values for the real test, while the sampling p-values were close to 1.

Cheers for any insight.

matt

 


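...I think I've got the wrong end of the stick with this: method="Sampling" just means the null distribution is generated by random resampling instead of the Wallenius approximation, so it is an alternative way of computing the same p-values and should give similar results. For some reason I thought it was a background analysis or negative control to compare the real test against...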

...it must be getting late...

matt
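
For anyone who lands here with the same confusion, here is a minimal base-R sketch (it does not use goseq) of why an analytic null and a resampled null agree. All numbers are made up for illustration, and it ignores length bias, so the analytic p-value is a plain hypergeometric tail. The `(hits + 1) / (repcnt + 1)` form is a common convention for resampled p-values and is an assumption here, not a statement about goseq's internals.

```r
# Toy comparison of an analytic vs. a resampled over-representation p-value.
# All values below are hypothetical and chosen only for illustration.
set.seed(1)
n_genes   <- 2000   # genes assayed
n_de      <- 15     # differentially expressed genes (a "small list")
cat_size  <- 50     # genes annotated to one category
de_in_cat <- 3      # DE genes observed in that category

# Analytic null: P(X >= de_in_cat) under the hypergeometric distribution.
p_analytic <- phyper(de_in_cat - 1, cat_size, n_genes - cat_size, n_de,
                     lower.tail = FALSE)

# Resampled null: repeatedly draw n_de genes at random and count how many
# fall in the category, then see how often the count reaches the observed one.
in_cat <- seq_len(n_genes) <= cat_size
repcnt <- 20000
null_counts <- replicate(repcnt, sum(in_cat[sample.int(n_genes, n_de)]))
p_sampling <- (sum(null_counts >= de_in_cat) + 1) / (repcnt + 1)

print(c(analytic = p_analytic, sampling = p_sampling))
```

Both numbers estimate the same tail probability, so they should be close. Note also that a resampled p-value can never be smaller than 1 / (repcnt + 1), which limits its resolution for very small p-values.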

 

 
