Customized GO enrichment with GOstats
doodlehzq • 0
@doodlehzq-7341

Hi,

I'm using GOstats and want to run a GO enrichment analysis with a customized background (gene universe) for an unannotated genome.

I followed the PDF manual "How To Use GOstats and Category to do Hypergeometric testing with unsupported model organisms".

I got the following error in hyperGTest():

getUniverseHelper(probes, datPkg, entrezIds) :
  After filtering, there are no valid IDs that can be used as the Gene universe.
  Check input values to confirm they are the same type as the central ID used by your annotation package. For chip packages, this will still mean the central GENE identifier used by the package

> goFrameData <- read.table("new_test.txt", header = TRUE)
> head(goFrameData)
  frame.go_id frame.Evidence frame.gene_id
1  GO:0009507            ISM             1
2  GO:0016102            IDA             1
3  GO:0009117            RCA             1
4  GO:0005783            IDA             2
5  GO:0071395            IEP             2
6  GO:0043231            ISS             2
> # The gene IDs used to map to GO are numeric: 1, 2, ...
> goFrame <- GOFrame(goFrameData, organism = "test")
> goAllFrame <- GOAllFrame(goFrame)
> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
> universe <- unique(goFrameData$frame.gene_id)
> genes <- 1:5
> params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
+                              geneSetCollection = gsc,
+                              geneIds = genes,
+                              universeGeneIds = universe,
+                              ontology = "MF",
+                              pvalueCutoff = 0.05,
+                              conditional = FALSE,
+                              testDirection = "over")
> Over <- hyperGTest(params)

 

My understanding is that universeGeneIds and geneIds should both be subsets of the gene IDs in goFrameData, but it seems they are being treated as different IDs.

Can someone help?

Look forward to your response.

Zhiqiang

 

 

@james-w-macdonald-5106

You will almost surely not get any significant results if you run a GO hypergeometric test using just 5 genes. Is this really what you are trying to do?
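To make the point about list size concrete, here is a minimal base R sketch of the over-representation p-value that a hypergeometric test computes. All of the counts (universe size, term size, number of hits) are made-up illustration values, not taken from the data above, and this is plain `phyper()`, not the internals of hyperGTest().

```r
# Hypothetical counts, chosen only for illustration
universe_size <- 10000   # genes in the universe
term_size     <- 200     # universe genes annotated to one GO term (2%)

# Over-representation p-value: P(X >= hits) when drawing `selected`
# genes without replacement from the universe
over_p <- function(hits, selected) {
  phyper(hits - 1, term_size, universe_size - term_size, selected,
         lower.tail = FALSE)
}

# Same observed proportion (20% of the list hits the term),
# but very different list sizes
p_small <- over_p(hits = 1,   selected = 5)
p_large <- over_p(hits = 100, selected = 500)

print(c(five_genes = p_small, five_hundred_genes = p_large))
```

With the 5-gene list, a single hit is not enough to pass a 0.05 cutoff in this toy setting, while the same proportion in a 500-gene list is far below it. Correcting for the many GO terms tested would only make the small list harder to call significant.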

In practice your genes vector should contain all the genes that were significant in some test, rather than just five chosen (semi-)randomly. If you are just working through the code to make sure it runs, you will be better off selecting a larger set of genes (the vignette you are following uses the first 500).
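A rough sketch of picking a larger test list and checking it against the universe before calling hyperGTest() might look like the following. The `gene_id` vector is a hypothetical stand-in for goFrameData$frame.gene_id, and the check only confirms that the IDs overlap; it is not a confirmed diagnosis of the getUniverseHelper error.

```r
# Hypothetical stand-in for goFrameData$frame.gene_id:
# 2000 genes with three GO annotation rows each
gene_id <- rep(1:2000, each = 3)

# Universe: every gene that has at least one GO annotation
universe <- unique(gene_id)

# Test list: the first 500 universe IDs, as in the vignette
genes <- head(universe, 500)

# Sanity check: compare the IDs as character strings, so a numeric
# vs. character mismatch cannot hide behind implicit coercion
shared <- intersect(as.character(genes), as.character(universe))
cat("genes:", length(genes),
    "universe:", length(universe),
    "shared:", length(shared), "\n")
```

If `shared` came out empty or much smaller than `genes`, the two vectors would not be drawn from the same ID space, which would be worth fixing before building the parameter object.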


 
