Issue with UCSC and rtracklayer: 'names' attribute [210] must be the same length as the vector [209]
@chrisamiller-15760
Last seen 6.6 years ago

I ran this command successfully several days ago, but it now fails in a way that sounds like the problem reported in this earlier post: "possible Rtracklayer issue: TxDb errors in GenomicFeatures package -- cannot load TxDb with standard params from UCSC (issue began last night)".


> library(RiboProfiling)
> library(Rsamtools)
> listInputBam <- c(BamFile(args[1]))
> covData <- riboSeqFromBAM(listInputBam, genomeName="hg19")

Get UCSC ensGene annotations.

Error in names(trackIds) <- sub("^ ", "", nms[nms != "new"]) :
  'names' attribute [210] must be the same length as the vector [209]
In addition: Warning message:
In riboSeqFromBAM(listInputBam, genomeName = "hg19") :
  paramScanBAM parameter is not a ScanBamParam object. Set to default NULL value!

> traceback()
12: .local(object, ...)
11: ucscTracks(object)
10: ucscTracks(object)
9: .local(object, ...)
8: trackNames(session)
7: trackNames(session)
6: supportedUCSCtables(session)
5: .tablename2track(tablename, session)
4: GenomicFeatures::makeTxDbFromUCSC(genome = genomeName, tablename = "ensGene",
       url = "http://genome-euro.ucsc.edu/cgi-bin/")
3: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
2: suppressWarnings(GenomicFeatures::makeTxDbFromUCSC(genome = genomeName,
       tablename = "ensGene", url = "http://genome-euro.ucsc.edu/cgi-bin/"))
1: riboSeqFromBAM(listInputBam, genomeName = "hg19")

 

The code described in that previous post also results in the error:

> library(rtracklayer)
> session <- browserSession()
> genome(session) <- "hg38"

> track_names <- trackNames(session)
Error in names(trackIds) <- sub("^ ", "", nms[nms != "new"]) :
  'names' attribute [116] must be the same length as the vector [115]
> genome(session) <- "hg19"
> track_names <- trackNames(session)
Error in names(trackIds) <- sub("^ ", "", nms[nms != "new"]) :
  'names' attribute [210] must be the same length as the vector [209]
> sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] rtracklayer_1.38.3   Rsamtools_1.30.0     GenomicRanges_1.30.3
 [4] GenomeInfoDb_1.14.0  RiboProfiling_1.7.2  Biostrings_2.46.0
 [7] XVector_0.18.0       IRanges_2.12.0       S4Vectors_0.16.0
[10] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.10.0           bitops_1.0-6
 [3] matrixStats_0.53.1            bit64_0.9-7
 [5] RColorBrewer_1.1-2            progress_1.1.2
 [7] httr_1.3.1                    tools_3.4.0
 [9] backports_1.1.2               R6_2.2.2
[11] rpart_4.1-13                  Hmisc_4.1-1
[13] DBI_1.0.0                     lazyeval_0.2.1
[15] colorspace_1.3-2              nnet_7.3-12
[17] gridExtra_2.3                 prettyunits_1.0.2
[19] GGally_1.3.2                  RMySQL_0.10.14
[21] curl_3.2                      bit_1.1-12
[23] compiler_3.4.0                chron_2.3-52
[25] graph_1.56.0                  Biobase_2.38.0
[27] htmlTable_1.11.2              DelayedArray_0.4.1
[29] ggbio_1.26.1                  scales_0.5.0
[31] checkmate_1.8.5               RBGL_1.54.0
[33] stringr_1.2.0                 digest_0.6.15
[35] foreign_0.8-69                base64enc_0.1-3
[37] dichromat_2.0-0               htmltools_0.3.6
[39] ensembldb_2.2.2               BSgenome_1.46.0
[41] htmlwidgets_1.0               rlang_0.2.0
[43] rstudioapi_0.7                RSQLite_2.1.1
[45] BiocInstaller_1.28.0          shiny_1.0.5
[47] BiocParallel_1.12.0           acepack_1.4.1
[49] VariantAnnotation_1.24.5      RCurl_1.95-4.10
[51] magrittr_1.5                  GenomeInfoDbData_1.0.0
[53] Formula_1.2-2                 Matrix_1.2-13
[55] Rcpp_0.12.16                  munsell_0.4.3
[57] proto_1.0.0                   sqldf_0.4-11
[59] stringi_1.1.5                 yaml_2.1.18
[61] SummarizedExperiment_1.8.1    zlibbioc_1.24.0
[63] plyr_1.8.4                    AnnotationHub_2.10.1
[65] grid_3.4.0                    blob_1.1.1
[67] promises_1.0.1                lattice_0.20-35
[69] splines_3.4.0                 GenomicFeatures_1.30.3
[71] knitr_1.20                    pillar_1.2.1
[73] reshape2_1.4.3                biomaRt_2.34.2
[75] XML_3.98-1.10                 biovizBase_1.26.0
[77] latticeExtra_0.6-28           data.table_1.10.4-3
[79] httpuv_1.4.2                  gtable_0.2.0
[81] reshape_0.8.7                 assertthat_0.2.0
[83] gsubfn_0.7                    ggplot2_2.2.1
[85] mime_0.5                      xtable_1.8-2
[87] AnnotationFilter_1.2.0        later_0.7.2
[89] survival_2.41-3               OrganismDbi_1.20.0
[91] tibble_1.4.2                  GenomicAlignments_1.14.2
[93] AnnotationDbi_1.40.0          memoise_1.1.0
[95] cluster_2.0.7                 interactiveDisplayBase_1.16.0

 

rtracklayer ucsc ensgene
@michael-lawrence-3846
Last seen 3.1 years ago
United States

This has been fixed in devel and release (1.40.1) thanks to Jim MacDonald. I put in a (hopefully) more robust heuristic in 1.40.2.
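
As a rough way to tell whether an installation predates the fix, the version number can be compared against 1.40.2. This is only a sketch: `needs_update()` is a hypothetical helper, not part of rtracklayer, and it assumes a plain version comparison is enough. Older Bioconductor releases (such as the 1.38.x series in the question) would need an upgrade of R and Bioconductor rather than of the package alone.

```r
# Hypothetical helper (not part of rtracklayer): TRUE when the given
# version string is older than the release that carries the fix.
needs_update <- function(installed, fixed = "1.40.2") {
  package_version(installed) < package_version(fixed)
}

needs_update("1.38.3")  # version shown in the question's sessionInfo()
needs_update("1.40.2")  # the release with the more robust heuristic

# Check a real installation, but only if rtracklayer is available.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("rtracklayer"))
  message("rtracklayer ", installed,
          if (needs_update(installed)) " predates the fix" else " includes the fix")
}
```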

@james-w-macdonald-5106
Last seen 3 days ago
United States

This error comes from trackNames, a function that retrieves all the track names from UCSC; it is called while building the list of supportedUCSCtables (see your traceback output). This was fixed last Friday, and if you update to the current versions of R and Bioconductor (R-3.5.0 and Bioc 3.7), you won't have any problems.

> library(GenomicFeatures)

> supportedUCSCtables()
             tablename          track           subtrack
1            knownGene     UCSC Genes               <NA>
2        knownGeneOld8 Old UCSC Genes               <NA>
3        knownGeneOld7 Old UCSC Genes               <NA>
4        knownGeneOld6 Old UCSC Genes               <NA>
5        knownGeneOld4 Old UCSC Genes               <NA>
6        knownGeneOld3 Old UCSC Genes               <NA>
7             ccdsGene           CCDS               <NA>
8              refGene    NCBI RefSeq        UCSC RefSeq
9          xenoRefGene   Other RefSeq               <NA>
10            vegaGene     Vega Genes Vega Protein Genes
11      vegaPseudoGene     Vega Genes   Vega Pseudogenes
12             ensGene  Ensembl Genes               <NA>
13             acembly  AceView Genes               <NA>
14             sibGene      SIB Genes               <NA>
15       nscanPasaGene         N-SCAN    N-SCAN PASA-EST
16           nscanGene         N-SCAN             N-SCAN
17             sgpGene      SGP Genes               <NA>
18              geneid   Geneid Genes               <NA>
19             genscan  Genscan Genes               <NA>
20            exoniphy       Exoniphy               <NA>
21          ncbiRefSeq    NCBI RefSeq         RefSeq All
22   ncbiRefSeqCurated    NCBI RefSeq     RefSeq Curated
23 ncbiRefSeqPredicted    NCBI RefSeq   RefSeq Predicted
24     ncbiRefSeqOther    NCBI RefSeq       RefSeq Other
25       ncbiRefSeqPsl    NCBI RefSeq  RefSeq Alignments
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.32.0 AnnotationDbi_1.42.0   Biobase_2.40.0        
[4] GenomicRanges_1.32.0   GenomeInfoDb_1.16.0    IRanges_2.14.1        
[7] S4Vectors_0.18.1       BiocGenerics_0.26.0   

loaded via a namespace (and not attached):
 [1] ComplexHeatmap_1.18.0       Rcpp_0.12.16               
 [3] compiler_3.5.0              RColorBrewer_1.1-2         
 [5] XVector_0.20.0              prettyunits_1.0.2          
 [7] bitops_1.0-6                tools_3.5.0                
 [9] zlibbioc_1.26.0             progress_1.1.2             
[11] biomaRt_2.36.0              digest_0.6.15              
[13] bit_1.1-12                  lattice_0.20-35            
[15] RSQLite_2.1.0               memoise_1.1.0              
[17] Matrix_1.2-14               DelayedArray_0.6.0         
[19] DBI_1.0.0                   GenomeInfoDbData_1.1.0     
[21] rtracklayer_1.40.1          httr_1.3.1                 
[23] stringr_1.3.0               Biostrings_2.48.0          
[25] GlobalOptions_0.0.13        bit64_0.9-7                
[27] grid_3.5.0                  R6_2.2.2                   
[29] GetoptLong_0.1.6            BiocParallel_1.14.0        
[31] XML_3.98-1.11               magrittr_1.5               
[33] blob_1.1.1                  matrixStats_0.53.1         
[35] GenomicAlignments_1.16.0    Rsamtools_1.32.0           
[37] SummarizedExperiment_1.10.0 assertthat_0.2.0           
[39] shape_1.4.4                 circlize_0.4.3             
[41] colorspace_1.3-2            stringi_1.2.2              
[43] RCurl_1.95-4.10             rjson_0.2.15               
>
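
The message in the original error is base R's generic complaint about assigning a names vector whose length differs from the vector being named. The sketch below reproduces it with made-up values (the track ids and labels are hypothetical stand-ins for what gets scraped from the browser page) and mirrors the shape of the failing line from the traceback; it is not rtracklayer's actual code.

```r
# Hypothetical scraped values; the real ones come from the UCSC browser page.
track_ids <- c("knownGene", "ensGene", "refGene")
labels <- c("UCSC Genes", "new", " Ensembl Genes", " RefSeq Genes", " Extra Label")

# Same shape as the failing line: drop "new" markers, strip one leading space.
cleaned <- sub("^ ", "", labels[labels != "new"])

# Four labels for three ids: the assignment fails, and the message is captured.
msg <- tryCatch({
  names(track_ids) <- cleaned
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

One unexpected label on the page is enough to put the two vectors out of step, which matches the off-by-one counts (210 vs 209, 116 vs 115) in the question.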
@max-12416
Last seen 6.6 years ago

You're parsing our HTML???? This is ... crazy. This means that your parser breaks each time we make the smallest change in our HTML. We have a public SQL server for this purpose and are happy to help with setting up the queries in the right way.

@max-12416
Last seen 6.6 years ago

As far as I know, we haven't changed the refGene table. I don't think we have ever changed these tables since they were released, precisely to avoid problems like this. Could it have something to do with a MySQL server update? Do you know the exact MySQL query that is run?


Oh, and I'm from UCSC, not Bioconductor, so if this is a Bioconductor problem I cannot comment on it. I know nothing about Bioconductor.


The basic problem is that rtracklayer was originally designed as an interface to the genome browser, not the database. It interacts with the web site directly, relying on a myriad of heuristics. People have come to use it almost exclusively as an interface to the database, so really it should just be making SQL queries.
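
For illustration, a direct query against UCSC's public MySQL server might look roughly like the following. This is a sketch under assumptions, not what rtracklayer or GenomicFeatures does: `build_track_query()` and `fetch_ucsc_rows()` are hypothetical helpers, the host and user are the ones UCSC documents for public access, and the column names assume the usual genePred layout of the ensGene table.

```r
# Sketch only: build_track_query() and fetch_ucsc_rows() are hypothetical
# helpers; the column names assume a genePred-style table such as ensGene.
build_track_query <- function(table, limit = 5L) {
  # Allow only plain identifiers so the table name cannot inject SQL.
  stopifnot(grepl("^[A-Za-z][A-Za-z0-9_]*$", table))
  sprintf("SELECT name, chrom, strand, txStart, txEnd FROM %s LIMIT %d",
          table, as.integer(limit))
}

fetch_ucsc_rows <- function(genome = "hg19", table = "ensGene", limit = 5L) {
  # Public read-only access as documented by UCSC (no password needed).
  con <- DBI::dbConnect(RMySQL::MySQL(),
                        host = "genome-mysql.soe.ucsc.edu",
                        user = "genome", dbname = genome)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, build_track_query(table, limit))
}

build_track_query("ensGene")
# fetch_ucsc_rows("hg19", "ensGene")  # needs network access plus DBI and RMySQL
```

A query like this depends only on the table schema, so changes to the browser's HTML would not affect it.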
