Apparently, when using biomaRt::getBM() with features taken from listAttributes(gene.db, page = "feature_page"), there is a limit (3?) on the number of features that can be "external". Is there a way to know which features are "external" and which are not?

listAttributes has another argument, 'what', with default what = c("name","description","page"). It also accepts "fullDescription", and the Rdocumentation says it "Can have values like name, description, fullDescription, page". Perhaps there is another value for "external", or perhaps there is a way around the error?

The error received if too many "external" features are requested is:

Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery, : Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References

gene.db <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="", path="/biomart/martservice", archive=FALSE)
listAttributes(gene.db, page = "feature_page", what = c("name","description","page"))
getBM(attributes = , filters = c("chromosome_name", "start", "end"), values = , mart = gene.db)

sessionInfo( )
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.utf8  LC_CTYPE=Polish_Poland.utf8
[3] LC_MONETARY=Polish_Poland.utf8 LC_NUMERIC=C
[5] LC_TIME=Polish_Poland.utf8

time zone: Europe/Warsaw
tzcode source: internal



Mike Smith will probably be along in a while with actual information, but in the interim do note that you can cross-reference using the online Biomart page. The external references start with Biogrid genes and end with WikiGenes, and are in the same order in listAttributes.

> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> z <- listAttributes(mart)
> grep("biogrid|wikigene", z[,1])
[1]  50 100 101 102
> z[50:102,1]
 [1] "biogrid"                 
 [2] "ccds"                    
 [3] "chembl"                  
 [4] "dbass3_name"             
 [5] "dbass3_id"               
 [6] "dbass5_name"             
 [7] "dbass5_id"               
 [8] "entrezgene_trans_name"   
 [9] "embl"                    
[10] "arrayexpress"            
[11] "genecards"               
[12] "hgnc_id"                 
[13] "hgnc_symbol"             
[14] "hpa_accession"           
[15] "hpa_id"                  
[16] "protein_id"              
[17] "ens_lrg_gene"            
[18] "ens_lrg_transcript"      
[19] "merops"                  
[20] "mim_gene_description"    
[21] "mim_gene_accession"      
[22] "mim_morbid_description"  
[23] "mim_morbid_accession"    
[24] "mirbase_accession"       
[25] "mirbase_id"              
[26] "mirbase_trans_name"      
[27] "entrezgene_description"  
[28] "entrezgene_accession"    
[29] "entrezgene_id"           
[30] "pdb"                     
[31] "reactome"                
[32] "reactome_gene"           
[33] "reactome_transcript"     
[34] "refseq_mrna"             
[35] "refseq_mrna_predicted"   
[36] "refseq_ncrna"            
[37] "refseq_ncrna_predicted"  
[38] "refseq_peptide"          
[39] "refseq_peptide_predicted"
[40] "rfam"                    
[41] "rfam_trans_name"         
[42] "rnacentral"              
[43] "hgnc_trans_name"         
[44] "ucsc"                    
[45] "uniparc"                 
[46] "uniprot_gn_symbol"       
[47] "uniprot_gn_id"           
[48] "uniprot_isoform"         
[49] "uniprotswissprot"        
[50] "uniprotsptrembl"         
[51] "wikigene_description"    
[52] "wikigene_name"           
[53] "wikigene_id"

So, not entirely done within R, but that's the list.
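If the limit on external references really is about three per query (the question only guesses at that number, so treat it as an assumption), one possible workaround is to split the request into several getBM() calls that share a key column and merge the results afterwards. The helper below is a hypothetical sketch, not part of biomaRt: `batch_attributes`, `key`, and `max_external` are names made up for illustration, and the set of external names is assumed to come from the positional range shown above.

```r
# Hypothetical helper (not a biomaRt function): split the requested attributes
# into batches so that each batch holds at most `max_external` attributes from
# the `external` set. Every batch starts with `key` so results can be merged.
batch_attributes <- function(requested, external, key = "ensembl_gene_id",
                             max_external = 3) {
  requested <- setdiff(unique(requested), key)
  ext <- intersect(requested, external)     # external refs, in requested order
  non_ext <- setdiff(requested, external)   # everything else
  if (length(ext) == 0) {
    return(list(c(key, non_ext)))
  }
  # Chunk the external attributes into groups of at most `max_external`.
  groups <- unname(split(ext, ceiling(seq_along(ext) / max_external)))
  batches <- lapply(groups, function(g) c(key, g))
  # Non-external attributes ride along with the first batch only.
  batches[[1]] <- c(key, non_ext, groups[[1]])
  batches
}

# Usage sketch (needs network access, so left commented out). The row range
# 50:102 is taken from the output above and may shift between Ensembl releases.
# library(biomaRt)
# mart <- useEnsembl("ensembl", "hsapiens_gene_ensembl")
# z <- listAttributes(mart)
# external <- z[50:102, 1]
# wanted <- c("chromosome_name", "hgnc_symbol", "entrezgene_id",
#             "uniprotswissprot", "refseq_mrna", "pdb")
# parts <- lapply(batch_attributes(wanted, external), function(a)
#   getBM(attributes = a, filters = "ensembl_gene_id",
#         values = "ENSG00000139618", mart = mart))
# merged <- Reduce(function(x, y) merge(x, y, by = "ensembl_gene_id", all = TRUE),
#                  parts)
```

Note that merging batches on the gene ID can multiply rows when several attributes have more than one value per gene, so the merged table may need deduplicating or reshaping depending on what you want.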

