topGO - genesInTerm returns genes not in the annotation
@samuel-collombet-6574

Hi,

I am using the topGO package and I am getting very strange results.

I built a GO-to-genes list myself by downloading the GO annotation for Ensembl gene IDs with biomaRt:

> library(biomaRt)
> mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="mmusculus_gene_ensembl")

> ensemblGene_go <- getBM(attributes=c("ensembl_gene_id","go_id","external_gene_id"),filters="ensembl_gene_id", values=ensembl$ensembl_geneID,mart=mart)
> head(ensemblGene_go )

     ensembl_gene_id      go_id external_gene_id
1 ENSMUSG00000013653 GO:0008150    1810065E05Rik
2 ENSMUSG00000013653 GO:0005575    1810065E05Rik
3 ENSMUSG00000013653 GO:0003674    1810065E05Rik
4 ENSMUSG00000058287 GO:0008150          Gm12253
5 ENSMUSG00000058287 GO:0046849          Gm12253
6 ENSMUSG00000058287 GO:0005575          Gm12253

> go2ensemblGene <- split(ensemblGene_go$ensembl_gene_id,ensemblGene_go$go_id)
> go2ensemblGene[1:2]

$`GO:0000002`
[1] "ENSMUSG00000022889" "ENSMUSG00000033845" "ENSMUSG00000030879"
[4] "ENSMUSG00000090262" "ENSMUSG00000019699" "ENSMUSG00000030557"
[7] "ENSMUSG00000027424"

$`GO:0000003`
[1] "ENSMUSG00000029061"
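To make the split() step easier to follow, here is a minimal sketch on toy data. The gene names are made up, and the empty `go_id` row is an assumption about how genes without a GO term can show up in a getBM() result; the cleanup steps are a suggestion, not something done in the code above.

```r
# Toy stand-in for the getBM() result; the gene IDs are made up for illustration.
toy <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneC"),
  go_id           = c("GO:0000002", "GO:0000003", "GO:0000002", ""),
  stringsAsFactors = FALSE
)

# Assumption: genes without any GO term come back with an empty go_id,
# so those rows are dropped before building the list.
toy <- toy[toy$go_id != "", ]

# One list element per GO term, holding the genes directly annotated to it.
go2genes <- split(toy$ensembl_gene_id, toy$go_id)

# Remove duplicated gene IDs within each term, if any.
go2genes <- lapply(go2genes, unique)

print(go2genes)
```

Note that a list built this way only holds the direct annotations of each term.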

I then make my topGO object:

> GOdata <- new("topGOdata", ontology="BP", annot=annFUN.GO2genes, GO2genes=go2ensemblGene, allGenes=GeneList,nodeSize=5,geneSel=topClusterGenes)
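GeneList and topClusterGenes are not shown here. As a rough sketch, assuming GeneList is a named numeric vector of scores and topClusterGenes is a function returning TRUE for the genes of interest, the inputs and a quick overlap check against the annotation could look like this (all names and values below are hypothetical):

```r
# Hypothetical scores; the names play the role of the gene universe (allGenes).
gene_scores <- c(g1 = 0.01, g2 = 0.20, g3 = 0.03, g4 = 0.50)

# Selection function of the kind passed as geneSel: TRUE marks genes of interest.
select_top <- function(scores) scores < 0.05

# Toy GO-to-genes list; "g9" is annotated but absent from the universe.
toy_go2genes <- list(termA = c("g1", "g9"), termB = c("g3", "g4"))

# Which annotated genes are actually part of the gene universe?
annotated   <- unique(unlist(toy_go2genes, use.names = FALSE))
in_universe <- annotated %in% names(gene_scores)

# Genes the selection function would flag as interesting.
selected <- names(gene_scores)[select_top(gene_scores)]

print(table(in_universe))
print(selected)
```

Annotated genes that are missing from the universe cannot appear in any term of the resulting object, so this check is worth running before comparing gene lists.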

Then, when I call genesInTerm() for some GO terms, the genes returned do not match my annotation at all!

> genesInTerm(GOdata,"GO:0051053")
$`GO:0051053`
[1] "ENSMUSG00000022878" "ENSMUSG00000032633" "ENSMUSG00000036086"
[4] "ENSMUSG00000036986" "ENSMUSG00000045658" "ENSMUSG00000046323"
[7] "ENSMUSG00000046697" "ENSMUSG00000054272" "ENSMUSG00000056758"

> go2ensemblGene["GO:0051053"]
$`GO:0051053`
[1] "ENSMUSG00000026241" "ENSMUSG00000053647"

Another example:

> genesInTerm(GOdata,"GO:0051055")
$`GO:0051055`
[1] "ENSMUSG00000025856" "ENSMUSG00000032715" "ENSMUSG00000033161"
[4] "ENSMUSG00000036856" "ENSMUSG00000047638"

> go2ensemblGene["GO:0051055"]
$`GO:0051055`
[1] "ENSMUSG00000041333" "ENSMUSG00000078686" "ENSMUSG00000094793"
[4] "ENSMUSG00000078675" "ENSMUSG00000078673" "ENSMUSG00000078672"

I guess I am doing something wrong when I create the topGO object, but I followed the vignette and my annotation seems all right...
Any ideas?
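One thing that may be worth ruling out, offered as an untested guess for this data: topGO propagates annotations up the GO graph and keeps only genes present in allGenes, so a term can end up holding genes inherited from its descendants while its directly annotated genes are dropped for being outside the universe. The toy sketch below uses hypothetical term and gene names to show how that would give two completely disjoint lists:

```r
# Toy illustration only: all term and gene names here are hypothetical.
# Direct annotations, as a GO2genes-style list would hold them.
direct <- list(
  parent = c("g1", "g2"),
  child1 = c("g3", "g4"),
  child2 = c("g5")
)

# Descendants of each term (for real BP terms, GO.db's GOBPOFFSPRING holds this).
offspring <- list(
  parent = c("child1", "child2"),
  child1 = character(0),
  child2 = character(0)
)

# Gene universe, playing the role of names(allGenes); g1 and g2 are not in it.
universe <- c("g3", "g4", "g5", "g6")

# Genes of a term after propagation: its own genes plus those of all its
# descendants, restricted to the universe.
propagated_genes <- function(term) {
  terms <- c(term, offspring[[term]])
  genes <- unique(unlist(direct[terms], use.names = FALSE))
  sort(intersect(genes, universe))
}

propagated <- propagated_genes("parent")
overlap    <- intersect(direct$parent, propagated)

print(propagated)
print(overlap)
```

On the real data, the equivalent check would be to test whether the genes returned by genesInTerm() are annotated to descendants of the term in go2ensemblGene, and whether the directly annotated genes are in names(GeneList).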


If needed, here is my sessionInfo:

> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] tcltk     grid      parallel  stats4    stats     graphics  grDevices
 [8] utils     datasets  methods   base     

other attached packages:
 [1] topGO_2.18.0              SparseM_1.6              
 [3] GO.db_3.0.0               RSQLite_1.0.0            
 [5] DBI_0.3.1                 AnnotationDbi_1.28.1     
 [7] graph_1.44.1              biomaRt_2.22.0           
 [9] Mfuzz_2.26.0              DynDoc_1.44.0            
[11] widgetTools_1.44.0        e1071_1.6-4              
[13] Biobase_2.26.0            wq_0.4-1                 
[15] zoo_1.7-12                reshape2_1.4.1           
[17] ggplot2_1.0.1             RColorBrewer_1.1-2       
[19] DESeq2_1.6.3              RcppArmadillo_0.4.650.1.1
[21] Rcpp_0.11.5               GenomicRanges_1.18.4     
[23] GenomeInfoDb_1.2.4        IRanges_2.0.1            
[25] S4Vectors_0.4.0           BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     annotate_1.44.0     base64enc_0.1-2    
 [4] BatchJobs_1.5       BBmisc_1.9          BiocParallel_1.0.3 
 [7] bitops_1.0-6        brew_1.0-6          checkmate_1.5.1    
[10] class_7.3-12        cluster_2.0.1       codetools_0.2-11   
[13] colorspace_1.2-6    digest_0.6.8        fail_1.2           
[16] foreach_1.4.2       foreign_0.8-63      Formula_1.2-0      
[19] genefilter_1.48.1   geneplotter_1.44.0  gtable_0.1.2       
[22] Hmisc_3.15-0        iterators_1.0.7     lattice_0.20-30    
[25] latticeExtra_0.6-26 locfit_1.5-9.1      MASS_7.3-39        
[28] munsell_0.4.2       nnet_7.3-9          plyr_1.8.1         
[31] proto_0.3-10        RCurl_1.95-4.5      rpart_4.1-9        
[34] scales_0.2.4        sendmailR_1.2-1     splines_3.1.0      
[37] stringr_0.6.2       survival_2.38-1     tkWidgets_1.44.0   
[40] tools_3.1.0         XML_3.98-1.1        xtable_1.7-4       
[43] XVector_0.6.0      
>