biomaRt asked to report: The number of columns in the result table does not equal the number of attributes in the query.
@daniilsarkisyan-7626

As requested by biomaRt, I am reporting the issue.

library(biomaRt)
ensembl_mart <- useEnsembl(biomart = "ensembl", 
                   dataset = "hsapiens_gene_ensembl")
attributes <- biomaRt::searchAttributes(mart = ensembl_mart, "coding|cds")
attributes <- attributes[attributes$page=="sequences","name"]
attributes
## [1] "coding_transcript_flank" "coding_gene_flank"       "coding"                  "cdna_coding_start"       "cdna_coding_end"        
## [6] "cds_length"              "cds_start"               "cds_end"                 "cdna_coding_start"       "cdna_coding_end"        
##[11] "genomic_coding_start"    "genomic_coding_end"  

result <- biomaRt::getBM(attributes = attributes, filters = c("ensembl_transcript_id"), values = "ENST00000217305", mart = ensembl_mart)
## NULL
## Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery,  : 
##  The query to the BioMart webservice returned an invalid result.
## The number of columns in the result table does not equal the number of attributes in the query.
## Please report this on the support site at http://support.bioconductor.org

sessionInfo()
# R version 4.0.0 (2020-04-24)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] biomaRt_2.44.0
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.4.6         compiler_4.0.0       pillar_1.4.4         dbplyr_1.4.4         prettyunits_1.1.1    tools_4.0.0          progress_1.2.2       digest_0.6.25        bit_1.1-15.2         RSQLite_2.2.0        memoise_1.1.0       
# [12] BiocFileCache_1.12.0 tibble_3.0.1         lifecycle_0.2.0      pkgconfig_2.0.3      rlang_0.4.6          DBI_1.1.0            curl_4.3             parallel_4.0.0       stringr_1.4.0        httr_1.4.1           dplyr_1.0.0         
# [23] rappdirs_0.3.1       generics_0.0.2       S4Vectors_0.26.1     vctrs_0.3.1          askpass_1.1          IRanges_2.22.2       hms_0.5.3            tidyselect_1.1.0     stats4_4.0.0         bit64_0.9-7          glue_1.4.1          
# [34] Biobase_2.48.0       R6_2.4.1             AnnotationDbi_1.50.0 XML_3.99-0.3         purrr_0.3.4          blob_1.2.1           magrittr_1.5         ellipsis_0.3.1       BiocGenerics_0.34.0  assertthat_0.2.1     stringi_1.4.6       
# [45] openssl_1.4.1        crayon_1.3.4
@james-w-macdonald-5106

When I hit an error like this, I usually remove attributes from the query until I find the culprit, or just query them one at a time. Doing that here, I see

> biomaRt::getBM(attributes = attributes[1], filters = c("ensembl_transcript_id"), values = "ENST00000217305", mart = ensembl_mart)
                                                                                                                          coding_transcript_flank
1 Query ERROR: caught BioMart::Exception::Usage: Requests for flank sequence must be accompanied by an upstream_flank or downstream_flank request
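
The one-at-a-time check can be automated. Below is a minimal sketch in base R, not part of biomaRt: `probe_attributes` is a hypothetical helper that runs a user-supplied query function once per attribute and records either "ok", the R error message, or a "Query ERROR" string returned as data (as in the output above). It assumes the server reports such errors in the first column of the result.

```r
# Hypothetical helper (not a biomaRt function): query each attribute on its own
# and report which ones fail.
probe_attributes <- function(attributes, query_fun) {
  status <- vapply(attributes, function(a) {
    # Catch R-level errors so one bad attribute does not stop the loop
    res <- tryCatch(query_fun(a), error = function(e) e)
    if (inherits(res, "error")) {
      conditionMessage(res)
    } else if (is.data.frame(res) && nrow(res) > 0 && ncol(res) > 0 &&
               any(grepl("^Query ERROR", as.character(res[[1]])))) {
      # BioMart can return its error message as the value of the first column
      as.character(res[[1]][1])
    } else {
      "ok"
    }
  }, character(1))
  data.frame(attribute = attributes, status = unname(status),
             stringsAsFactors = FALSE)
}

# Usage with the objects from the question (needs network access):
# probe_attributes(unique(attributes), function(a)
#   biomaRt::getBM(attributes = a, filters = "ensembl_transcript_id",
#                  values = "ENST00000217305", mart = ensembl_mart))
```

Calling `unique()` on the attribute vector first also drops the repeated "cdna_coding_start" and "cdna_coding_end" entries from the original search result.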

You can try to work out what that error means by going to the ensembl.org website and playing around with the sequences attributes page. Once you have put in a query there, a URL button at the top lets you get the query URL. That helped me a bit, but ?getSequence and the code behind it were more informative, and led me to this:

> biomaRt::getBM(attributes = c("coding_transcript_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                   coding_transcript_flank ensembl_transcript_id
1 CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG       ENST00000217305

Note two things. First, you should ALWAYS include the primary filter ID as an attribute as well: results come back in no guaranteed order, so you need a way to map them back to the IDs you queried. Second, you specify upstream_flank (or downstream_flank, but not both) as one of the filters, and the flank length as the matching entry in values. Playing around, this is what I can get:

> getBM(attributes[-(1:2)], c("ensembl_transcript_id"), list("ENST00000217305"), ensembl_mart)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         coding
1 ATGGCCTGGCAGGGGCTGGTCCTGGCTGCCTGCCTCCTCATGTTCCCCTCCACCACAGCGGACTGCCTGTCGCGGTGCTCCTTGTGTGCTGTAAAGACCCAGGATGGTCCCAAACCTATCAATCCCCTGATTTGCTCCCTGCAATGCCAGGCTGCCCTGCTGCCCTCTGAGGAATGGGAGAGATGCCAGAGCTTTCTGTCTTTTTTCACCCCCTCCACCCTTGGGCTCAATGACAAGGAGGACTTGGGGAGCAAGTCGGTTGGGGAAGGGCCCTACAGTGAGCTGGCCAAGCTCTCTGGGTCATTCCTGAAGGAGCTGGAGAAAAGCAAGTTTCTCCCAAGTATCTCAACAAAGGAGAACACTCTGAGCAAGAGCCTGGAGGAGAAGCTCAGGGGTCTCTCTGACGGGTTTAGGGAGGGAGCAGAGTCTGAGCTGATGAGGGATGCCCAGCTGAACGATGGTGCCATGGAGACTGGCACACTCTATCTCGCTGAGGAGGACCCCAAGGAGCAGGTCAAACGCTATGGGGGCTTTTTGCGCAAATACCCCAAGAGGAGCTCAGAGGTGGCTGGGGAGGGGGACGGGGATAGCATGGGCCATGAGGACCTGTACAAACGCTATGGGGGCTTCTTGCGGCGCATTCGTCCCAAGCTCAAGTGGGACAACCAGAAGCGCTATGGCGGTTTTCTCCGGCGCCAGTTCAAGGTGGTGACTCGGTCTCAGGAAGATCCGAATGCTTACTCTGGAGAGCTTTTTGATGCATAA
  cdna_coding_start cdna_coding_end cds_length cds_start cds_end
1           357;228         992;356        765     130;1 765;129
  cdna_coding_start.1 cdna_coding_end.1 genomic_coding_start genomic_coding_end
1             357;228           992;356      1980323;1982956    1980958;1983084

> biomaRt::getBM(attributes = c("gene_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                                gene_flank ensembl_transcript_id
1 GCTCTCGTCCATAAAAGGGGGGAAGAGGCACCAGAACTGC       ENST00000217305

> biomaRt::getBM(attributes = c("coding_transcript_flank","ensembl_transcript_id"), filters = c("ensembl_transcript_id", "upstream_flank"), values = list("ENST00000217305", 40), mart = ensembl_mart, checkFilters = FALSE)
                   coding_transcript_flank ensembl_transcript_id
1 CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG       ENST00000217305

But it requires separate calls, and coding_gene_flank doesn't work.
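
Because each call returns the transcript ID alongside the requested attribute, the separate results can be stitched back together afterwards. Here is a minimal sketch in base R: `combine_by_id` is a hypothetical helper, not a biomaRt function, and the two data frames are mock-ups built from the flank results shown above rather than live queries.

```r
# Hypothetical helper: merge per-attribute results on the shared ID column,
# then restore the order of the IDs that were originally queried.
combine_by_id <- function(results, ids, id_col = "ensembl_transcript_id") {
  merged <- Reduce(function(x, y) merge(x, y, by = id_col, all = TRUE), results)
  # IDs that were queried but not returned become rows of NA
  merged <- merged[match(ids, merged[[id_col]]), , drop = FALSE]
  rownames(merged) <- NULL
  merged
}

# Mock data frames copied from the two flank results above
gene_flank <- data.frame(
  gene_flank = "GCTCTCGTCCATAAAAGGGGGGAAGAGGCACCAGAACTGC",
  ensembl_transcript_id = "ENST00000217305",
  stringsAsFactors = FALSE
)
tx_flank <- data.frame(
  coding_transcript_flank = "CTTCTCTTTCTTCCTCCCCAGCAGGAATTGCTGAGACAGG",
  ensembl_transcript_id = "ENST00000217305",
  stringsAsFactors = FALSE
)

combined <- combine_by_id(list(gene_flank, tx_flank), ids = "ENST00000217305")
print(combined)
```

This only works if every call includes the filter ID as an attribute, which is one more reason to always request it.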


I should also note that querying online data can be sort of flaky, and Ensembl has been particularly flaky of late (evidently they are upgrading their systems?), so it is often easier to use downloaded sources. Here is a quick example; there are several vignettes you could read for more, in particular those for the ensembldb, AnnotationHub, BSgenome and AnnotationDbi packages.

> library(AnnotationHub)

> hub <- AnnotationHub()
snapshotDate(): 2020-04-27
> query(hub, c("homo sapiens","ensdb"))
AnnotationHub with 15 records
# snapshotDate(): 2020-04-27
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

            title                             
  AH53211 | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715 | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681 | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757 | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773 | Ensembl 91 EnsDb for Homo Sapiens 
  ...       ...                               
  AH73881 | Ensembl 97 EnsDb for Homo sapiens 
  AH73986 | Ensembl 79 EnsDb for Homo sapiens 
  AH75011 | Ensembl 98 EnsDb for Homo sapiens 
  AH78783 | Ensembl 99 EnsDb for Homo sapiens 
  AH79689 | Ensembl 100 EnsDb for Homo sapiens

### biomaRt uses Ensembl 100, so we do too
> z <- hub[["AH79689"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
require("ensembldb")
> columns(z)
 [1] "DESCRIPTION"         "ENTREZID"            "EXONID"             
 [4] "EXONIDX"             "EXONSEQEND"          "EXONSEQSTART"       
 [7] "GCCONTENT"           "GENEBIOTYPE"         "GENEID"             
[10] "GENEIDVERSION"       "GENENAME"            "GENESEQEND"         
[13] "GENESEQSTART"        "INTERPROACCESSION"   "ISCIRCULAR"         
[16] "PROTDOMEND"          "PROTDOMSTART"        "PROTEINDOMAINID"    
[19] "PROTEINDOMAINSOURCE" "PROTEINID"           "PROTEINSEQUENCE"    
[22] "SEQCOORDSYSTEM"      "SEQLENGTH"           "SEQNAME"            
[25] "SEQSTRAND"           "SYMBOL"              "TXBIOTYPE"          
[28] "TXCDSSEQEND"         "TXCDSSEQSTART"       "TXID"               
[31] "TXIDVERSION"         "TXNAME"              "TXSEQEND"           
[34] "TXSEQSTART"          "TXSUPPORTLEVEL"      "UNIPROTDB"          
[37] "UNIPROTID"           "UNIPROTMAPPINGTYPE" 
> select(z, "ENST00000217305", c("GENESEQSTART","GENESEQEND","TXCDSSEQSTART","TXCDSSEQEND"), "TXNAME")
           TXNAME GENESEQSTART GENESEQEND TXCDSSEQSTART TXCDSSEQEND
1 ENST00000217305      1978757    1994285       1980323     1983084
             TXID
1 ENST00000217305
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
## these are all genomic coordinates - biomaRt is giving relative coordinates for some things
> cdsBy(z)[["ENST00000217305"]]
GRanges object with 2 ranges and 2 metadata columns:
      seqnames          ranges strand |         exon_id exon_rank
         <Rle>       <IRanges>  <Rle> |     <character> <integer>
  [1]       20 1982956-1983084      - | ENSE00000655739         3
  [2]       20 1980323-1980958      - | ENSE00001108022         4
  -------
  seqinfo: 369 sequences from GRCh38 genome
> library("BSgenome.Hsapiens.UCSC.hg19")
## this is ABSOLUTELY the wrong BSgenome package (hg19, while the EnsDb above is GRCh38), but I'm using it as an example...
> getSeq(Hsapiens, GRanges("chr20:1982956-1983084"))
DNAStringSet object of length 1:
    width seq
[1]   129 TGAAAGAAATCATCCAGGCTTTTTAAAGCAAAAG...GTCAACAGGGCCTCTGTAAATGACTCCCAGAAT
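
The remark above about genomic versus relative coordinates can be checked against the biomaRt output from earlier in the thread. The sketch below is base R only and uses the values printed above. It assumes the semicolon-separated entries list one value per coding exon, in the same order in every column; that is an assumption, although the numbers are consistent with it. `split_coords` is a hypothetical helper name.

```r
# Hypothetical helper: split a "a;b" coordinate string into integers
split_coords <- function(x) as.integer(strsplit(x, ";", fixed = TRUE)[[1]])

# Values copied from the getBM() output shown earlier
genomic <- data.frame(
  start = split_coords("1980323;1982956"),  # genomic_coding_start
  end   = split_coords("1980958;1983084")   # genomic_coding_end
)
cds <- data.frame(
  start = split_coords("130;1"),            # cds_start (relative to the CDS)
  end   = split_coords("765;129")           # cds_end
)

# Widths of each coding segment, inclusive of both ends
genomic$width <- genomic$end - genomic$start + 1L
cds$width     <- cds$end - cds$start + 1L

# The two coordinate systems describe segments of the same size ...
stopifnot(identical(genomic$width, cds$width))
# ... and the segments add up to the reported cds_length
sum(cds$width)  # 765
```

The genomic ranges here are the same two ranges that cdsBy() returns from the EnsDb, so the relative columns carry the same information once the coordinate system is taken into account.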
