biomaRt GO inconsistency: normal?
R Tagett
Hello, I am a graduate student at Wayne State University in Detroit. I am using BioMart to collect GO terms for lists of genes, and I noticed that some genes are annotated with top nodes (e.g. "biological_process") and others are not. I wonder if anyone can tell me why. Example code and my sessionInfo are below.

In this example, I collect all approved human HUGO gene symbols using the HGNChelper package. From those, I use BioMart to get the GO terms for these genes and keep only the "biological_process" (BP) annotations. There are 15368 unique genes that have BP annotations (uniqGenesInGO). Then I split these genes into those that have a "GO:0008150" annotation (the "biological_process" term itself) and those that do not. 596 genes are annotated with "GO:0008150", and 14772 are not. This is inconsistent!

Thanks for your help,
Becky

```r
library(biomaRt)
library(HGNChelper)

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

data(hgnc.table)                 # gives names of all approved symbols
allHgnc <- unique(hgnc.table[, 2])

allGOhgnc <- getBM(attributes = c("go_id", "go_linkage_type", "entrezgene",
                                  "hgnc_symbol", "namespace_1003"),
                   filters = "hgnc_symbol", values = allHgnc, mart = ensembl)
# load("allGOhgnc.RData")        # 266749 rows

# Keep only the biological_process rows
BP <- allGOhgnc[which(allGOhgnc$namespace_1003 == "biological_process"), ]  # 126789 rows
BP <- BP[!duplicated(BP), ]                                                 # 108019 rows
uniqGenesInGO <- unique(BP$hgnc_symbol)                                     # 15368 genes

# "GO:0008150" is "biological_process"
hasBPtab <- BP[which(BP$go_id == "GO:0008150"), ]
hasBP <- unique(hasBPtab$hgnc_symbol)
length(hasBP)                          # 596

# Rows for genes without a GO:0008150 annotation
# (logical indexing stays correct even if hasBP were empty)
noBPtab <- BP[!(BP$hgnc_symbol %in% hasBP), ]
length(unique(noBPtab$hgnc_symbol))    # 14772
# 14772 + 596 = 15368

# Why are some genes annotated with the top node and others are not?
```
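The split into genes with and without a direct GO:0008150 row can also be written with `%in%` and `setdiff()`. This sidesteps a pitfall of `x[-which(cond), ]`: when `which()` matches nothing it returns `integer(0)`, and negative indexing with an empty vector selects no rows rather than all of them. Below is a minimal sketch on a small hypothetical data frame with the same `go_id` and `hgnc_symbol` columns as the `getBM()` result; the helper `splitByRootTerm()`, the term `TERM:B` and the gene names are illustrative only, not real annotations.

```r
# Hypothetical helper: partition genes by whether they carry a direct
# annotation to the BP root term GO:0008150.
splitByRootTerm <- function(bp, root = "GO:0008150") {
  allGenes <- unique(bp$hgnc_symbol)
  # Genes with at least one row whose go_id is the root term
  hasRoot <- unique(bp$hgnc_symbol[bp$go_id == root])
  # Every remaining gene; works even when hasRoot is empty
  noRoot <- setdiff(allGenes, hasRoot)
  list(all = allGenes, hasRoot = hasRoot, noRoot = noRoot)
}

# Toy input: made-up genes and terms, not real GO annotations
toyBP <- data.frame(
  go_id       = c("GO:0008150", "TERM:B", "TERM:B", "GO:0008150"),
  hgnc_symbol = c("GENE1",      "GENE2",  "GENE3",  "GENE1"),
  stringsAsFactors = FALSE
)

parts <- splitByRootTerm(toyBP)
# The two groups partition the gene list, as in 596 + 14772 = 15368
length(parts$hasRoot) + length(parts$noRoot) == length(parts$all)

# The -which() pitfall: nothing matches "GENE9", yet every row is dropped
nrow(toyBP[-which(toyBP$hgnc_symbol %in% "GENE9"), ])   # 0 rows, not 4
nrow(toyBP[!(toyBP$hgnc_symbol %in% "GENE9"), ])        # 4 rows
```

In the original script `hasBP` is non-empty, so the `-which()` form gave the right answer there; the logical form simply stays correct for any input.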
```
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.16.0

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 XML_3.96-1.1
```
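One possible explanation to check, not a confirmed answer: under GO's true path rule, a gene annotated to any biological-process term is implicitly annotated to every ancestor up to GO:0008150, so a direct row for the root term usually says something separate. By GO annotation convention, annotations made directly to the root with the ND ("No biological Data available") evidence code record that curators looked and found no process information. If the `go_id` rows returned by BioMart are direct rather than propagated annotations (an assumption worth verifying for the Ensembl release in use), the 596 root-annotated genes might largely be such ND cases. The sketch below uses hypothetical toy data: it tabulates `go_linkage_type` for root versus non-root rows and propagates annotations up a made-up `is_a` map. The terms `TERM:A` to `TERM:C`, the genes, the evidence codes assigned to them, and the helpers `evidenceByRoot()` and `withAncestors()` are all illustrative.

```r
# Hypothetical toy rows in the shape of the getBM() result
# (genes, terms and evidence codes are made up for illustration).
toyBP <- data.frame(
  go_id           = c("GO:0008150", "GO:0008150", "TERM:B", "TERM:C"),
  go_linkage_type = c("ND",         "ND",         "IDA",    "IEA"),
  hgnc_symbol     = c("GENE1",      "GENE2",      "GENE3",  "GENE4"),
  stringsAsFactors = FALSE
)

# Cross-tabulate evidence codes for direct root-term rows vs. all other rows
evidenceByRoot <- function(bp, root = "GO:0008150") {
  table(root = bp$go_id == root, evidence = bp$go_linkage_type)
}
evidenceByRoot(toyBP)

# Hypothetical is_a parents for the made-up terms; the root has no parent
parents <- list("TERM:B" = "TERM:A", "TERM:C" = "TERM:A",
                "TERM:A" = "GO:0008150", "GO:0008150" = character(0))

# Return a term together with all of its ancestors (true path rule)
withAncestors <- function(term, parents) {
  out <- term
  frontier <- term
  while (length(frontier) > 0) {
    up <- unlist(parents[frontier], use.names = FALSE)
    frontier <- setdiff(up, out)   # only ancestors not yet collected
    out <- c(out, frontier)
  }
  out
}

# After propagation, every BP-annotated gene reaches the root term,
# so only the direct root rows distinguish the two groups of genes
propagated <- lapply(split(toyBP$go_id, toyBP$hgnc_symbol), function(terms)
  unique(unlist(lapply(terms, withAncestors, parents = parents))))
all(vapply(propagated, function(t) "GO:0008150" %in% t, logical(1)))
```

On the real data, calling `evidenceByRoot(BP)` on the table from the question would show which evidence codes the GO:0008150 rows actually carry.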