Hello,
I'm having trouble retrieving GO terms and definitions for a list of genes using biomaRt. The problem seems to affect only some genes, not all of them. For example,
> library("tibble")
> library("biomaRt")
> BM = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Below is a successful query.
> tibble(getBM(attributes = c("external_gene_name", "definition_1006"), filters = "external_gene_name", values = "RUNX1", mart = BM))
# A tibble: 39 x 1
`getBM(...)`$external_ge… $definition_1006
<chr> <chr>
1 RUNX1 Any molecular function by which a gene product int…
2 RUNX1 Any process that modulates the frequency, rate or …
3 RUNX1 A membrane-bounded organelle of eukaryotic cells i…
4 RUNX1 Interacting selectively and non-covalently with AT…
5 RUNX1 A protein or a member of a complex that interacts …
6 RUNX1 Any process that activates or increases the freque…
7 RUNX1 Organized structure of distinctive morphology and …
8 RUNX1 The part of the cytoplasm that does not contain or…
9 RUNX1 That part of the nuclear content other than the ch…
10 RUNX1 Interacting selectively and non-covalently with an…
# … with 29 more rows
However, when I change RUNX1 to BCOR, the query fails:
> tibble(getBM(attributes = c("external_gene_name", "definition_1006"), filters = "external_gene_name", values = "BCOR", mart = BM))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 22 did not have 2 elements
But I can confirm that BCOR is a valid gene symbol, because a query for its GO IDs succeeds:
> tibble(getBM(attributes = c("external_gene_name", "ensembl_gene_id", "go_id"), filters = "external_gene_name", values = "BCOR", mart = BM))
# A tibble: 24 x 1
`getBM(...)`$external_gene_name $ensembl_gene_id $go_id
<chr> <chr> <chr>
1 BCOR ENSG00000183337 GO:0005515
2 BCOR ENSG00000183337 GO:0005634
3 BCOR ENSG00000183337 GO:0006325
4 BCOR ENSG00000183337 GO:0004842
5 BCOR ENSG00000183337 GO:0000122
6 BCOR ENSG00000183337 GO:0003714
7 BCOR ENSG00000183337 GO:0007507
8 BCOR ENSG00000183337 GO:0008134
9 BCOR ENSG00000183337 GO:0044212
10 BCOR ENSG00000183337 GO:0045892
# … with 14 more rows
It appears that the definition_1006 attribute just does not work for BCOR. This baffles me. Does anybody know what is going wrong here? Thanks.
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /usr/local/R/3.5.2/lib/R/lib/libRblas.dylib
LAPACK: /usr/local/R/3.5.2/lib/R/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_2.1.1 biomaRt_2.38.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 AnnotationDbi_1.44.0 magrittr_1.5
[4] BiocGenerics_0.28.0 hms_0.4.2 progress_1.2.0
[7] IRanges_2.16.0 bit_1.1-14 R6_2.4.0
[10] rlang_0.3.4 fansi_0.4.0 httr_1.4.0
[13] stringr_1.4.0 blob_1.1.1 tools_3.5.2
[16] parallel_3.5.2 Biobase_2.42.0 utf8_1.1.4
[19] cli_1.1.0 DBI_1.0.0 bit64_0.9-7
[22] digest_0.6.18 assertthat_0.2.1 crayon_1.3.4
[25] S4Vectors_0.20.1 bitops_1.0-6 curl_3.3
[28] RCurl_1.95-4.12 memoise_1.1.0 RSQLite_2.1.1
[31] stringi_1.4.3 pillar_1.3.1 compiler_3.5.2
[34] prettyunits_1.0.2 stats4_3.5.2 XML_3.98-1.19
[37] pkgconfig_2.0.2