makeTxDbFromUCSC("hg38", "refGene") gives "not supported" error
@mattchambers42-10186
Of course this track really exists or I wouldn't post an issue about it. :)
> browseUCSCtrack("hg38", "refGene")

Easy to reproduce the error:
> makeTxDbFromUCSC(genome="hg38", tablename="refGene")
Error in .tablename2track(tablename, session) :
  UCSC table "refGene" is not supported

> devtools::session_info()
Session info ------------------------------------------
setting  value                  
version  R version 3.3.3 (2017-03-06)
system   x86_64, mingw32        
ui       RStudio (1.0.136)      
language (EN)                   
collate  English_United States.1252
tz       America/Chicago        
date     2017-03-28                  

Packages ------------------------------------------------
package              * version  date       source      
AnnotationDbi        * 1.36.2   2017-01-30 Bioconductor
Biobase              * 2.34.0   2016-10-18 Bioconductor
BiocGenerics         * 0.20.0   2016-10-18 Bioconductor
BiocInstaller        * 1.24.0   2016-10-18 Bioconductor
BiocParallel           1.8.1    2016-10-30 Bioconductor
biomaRt                2.30.0   2016-10-18 Bioconductor
Biostrings             2.42.1   2016-12-01 Bioconductor
bitops                 1.0-6    2013-08-17 CRAN (R 3.3.2)
BSgenome               1.42.0   2016-10-18 Bioconductor
commonmark             1.2      2017-03-01 CRAN (R 3.3.3)
crayon                 1.3.2    2016-06-28 CRAN (R 3.3.3)
customProDB          * 1.15.1   <NA>       Bioconductor
data.table             1.10.4   2017-02-01 CRAN (R 3.3.2)
DBI                  * 0.5-1    2016-09-10 url         
devtools               1.12.0   2016-06-24 CRAN (R 3.3.3)
digest                 0.6.12   2017-01-27 CRAN (R 3.3.3)
GenomeInfoDb         * 1.10.3   2017-02-07 Bioconductor
GenomicAlignments      1.10.1   2017-03-18 Bioconductor
GenomicFeatures      * 1.26.3   2017-02-22 Bioconductor
GenomicRanges        * 1.26.4   2017-03-18 Bioconductor
GetoptLong             0.1.6    2017-03-07 CRAN (R 3.3.3)
GlobalOptions          0.0.11   2017-03-06 CRAN (R 3.3.3)
IRanges              * 2.8.2    2017-03-18 Bioconductor
lattice                0.20-35  2017-03-25 CRAN (R 3.3.3)
magrittr               1.5      2014-11-22 CRAN (R 3.3.3)
Matrix                 1.2-8    2017-01-20 CRAN (R 3.3.3)
memoise                1.0.0    2016-01-29 CRAN (R 3.3.2)
plyr                   1.8.4    2016-06-08 CRAN (R 3.3.3)
R6                     2.2.0    2016-10-05 CRAN (R 3.3.2)
Rcpp                   0.12.10  2017-03-19 CRAN (R 3.3.3)
RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.2)
rjson                  0.2.15   2014-11-03 CRAN (R 3.3.2)
RMySQL               * 0.10.8   2016-01-29 url         
roxygen2               6.0.1    2017-02-06 CRAN (R 3.3.3)
Rsamtools              1.26.1   2016-10-22 Bioconductor
RSQLite                1.1-2    2017-01-08 CRAN (R 3.3.3)
rstudioapi             0.6      2016-06-27 CRAN (R 3.3.2)
rtracklayer            1.34.2   2017-02-19 Bioconductor
S4Vectors            * 0.12.2   2017-03-18 Bioconductor
stringi                1.1.3    2017-03-21 CRAN (R 3.3.3)
stringr                1.2.0    2017-02-18 CRAN (R 3.3.3)
SummarizedExperiment   1.4.0    2016-10-18 Bioconductor
testthat             * 1.0.2    2016-04-23 CRAN (R 3.3.3)
VariantAnnotation      1.20.3   2017-03-18 Bioconductor
withr                  1.0.2    2016-06-20 CRAN (R 3.3.2)
XML                    3.98-1.5 2016-11-10 CRAN (R 3.3.2)
xml2                   1.1.1    2017-01-24 CRAN (R 3.3.3)
XVector                0.14.1   2017-03-18 Bioconductor
zlibbioc               1.20.0   2016-10-18 Bioconductor
GenomicFeatures • 2.6k views

Seems like supportedUCSCtables() should look at tableNames(ucscTableQuery(session)) instead of names(trackNames(session)) when filtering. 

@herve-pages-1542
Seattle, WA, United States

Hi Matt,

The RefSeq Genes track actually doesn't exist anymore for hg38. It has been replaced with a new composite track named NCBI RefSeq and made of 6 subtracks. See announcement here (from March 3, 2017):

  https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome-announce/Prrn-OAFx6U

I just fixed GenomicFeatures in devel (GenomicFeatures 1.27.11) to support this new track. I will port the fix to the BioC release later today. It will take about 24 hours for the fix to propagate to the public repositories and become available via biocLite().

Also I still need to fix browseUCSCtrack(). It's still taking you to what looks like a stale page for the old RefSeq Genes track for hg38.

Cheers,

H.

Edit: This is now ported to GenomicFeatures in release (GenomicFeatures 1.26.4). For reasons I don't really understand, browseUCSCtrack() seems to be working as expected again (I didn't touch it).
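
For readers checking whether their installation already carries the fix, here is a minimal sketch. The two version numbers come from this thread; the helper name has_refgene_fix() is hypothetical, and the tablename column of the supportedUCSCtables() result is an assumption about its layout. The network-dependent part only runs in an interactive session. In more recent Bioconductor releases makeTxDbFromUCSC() may live in a different package, so treat the namespace used here as tied to the versions discussed above.

```r
# Minimum GenomicFeatures versions carrying the fix, as stated in this thread.
fixed_release <- "1.26.4"   # release branch
fixed_devel   <- "1.27.11"  # devel branch

# Hypothetical helper: TRUE if `version` is at or past the fix on its branch.
has_refgene_fix <- function(version) {
  v <- package_version(version)
  if (v >= "1.27.0") v >= fixed_devel else v >= fixed_release
}

# The version from the session info in the original question.
has_refgene_fix("1.26.3")

# Network-dependent part: only attempted interactively and if the package is installed.
if (interactive() && requireNamespace("GenomicFeatures", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("GenomicFeatures"))
  if (has_refgene_fix(installed)) {
    tables <- GenomicFeatures::supportedUCSCtables("hg38")
    # Assumption: the result has a `tablename` column.
    print(tables[tables$tablename == "refGene", ])
    txdb <- GenomicFeatures::makeTxDbFromUCSC(genome = "hg38", tablename = "refGene")
  }
}
```

If the check returns FALSE, updating GenomicFeatures is the first thing to try before retrying makeTxDbFromUCSC().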


The refGene table still exists in the UCSC database, though, and is accessible from the Table Browser, so it would be nice to provide access to it.


You still have access to it. As reported by supportedUCSCtables("hg38") (from GenomicFeatures 1.27.11), this table is now associated with the new UCSC RefSeq subtrack of the composite NCBI RefSeq track.

H.


Is there any way rtracklayer could help with that function? Hard-coding the mappings does not seem sustainable. I'm not sure why the names of the tracks are even needed there. The table browser (and UCSCTableQuery) supports direct table access, without the need for a track name. Calling trackNames(session) is particularly problematic, because long names get truncated for the UI and won't match the track names in the table browser.


Yes, hard-coding the mapping between tables and tracks in supportedUCSCtables() is ugly, and I welcome any suggestion to improve this. This function has 2 purposes:

  1. Provide the list of tables/tracks that are known to be compatible with makeTxDbFromUCSC()
  2. Map tables to tracks (many-to-one mapping), and to subtracks (if any).

makeTxDbFromUCSC() requires a table name, but table names can be somewhat obscure. Most of the time users know the name of the track/subtrack that they are interested in, so supportedUCSCtables() provides a quick and easy way for them to find the name of the central table for a given track/subtrack.
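
The two purposes listed above can be illustrated with a small base-R sketch of a hard-coded, many-to-one table-to-track mapping. This is not the actual code or data from GenomicFeatures: the function names are hypothetical, and apart from refGene belonging to the UCSC RefSeq subtrack of NCBI RefSeq (stated in this thread), the rows are illustrative assumptions.

```r
# Illustrative hard-coded mapping (NOT copied from GenomicFeatures).
# Only the refGene row is taken from this thread; the other rows are assumptions.
ucsc_tables <- data.frame(
  tablename = c("refGene", "ncbiRefSeq", "ncbiRefSeqCurated"),
  track     = c("NCBI RefSeq", "NCBI RefSeq", "NCBI RefSeq"),
  subtrack  = c("UCSC RefSeq", "RefSeq All", "RefSeq Curated"),
  stringsAsFactors = FALSE
)

# Purpose 1: is a table in the list of known-compatible tables?
is_supported <- function(tablename) {
  tablename %in% ucsc_tables$tablename
}

# Purpose 2: map a table to its track and subtrack (many tables, one track).
table_to_track <- function(tablename) {
  hit <- ucsc_tables[ucsc_tables$tablename == tablename, , drop = FALSE]
  if (nrow(hit) == 0L) {
    stop(sprintf('UCSC table "%s" is not supported', tablename))
  }
  hit[, c("track", "subtrack")]
}

# Reverse direction: find the table names behind a track the user knows by name.
tables_for_track <- function(track) {
  ucsc_tables$tablename[ucsc_tables$track == track]
}

table_to_track("refGene")
tables_for_track("NCBI RefSeq")
```

A plain data frame keeps lookups instantaneous, at the cost of needing a manual update whenever UCSC reorganizes its tracks.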

Probably the reason for specifying the track when calling ucscTableQuery() is that neither the signature of the function nor its documentation suggests that the track argument can be omitted. It actually seems to work for some tables but not all of them:

library(rtracklayer)
session <- browserSession()
genome(session) <- "hg19"

ucscTableQuery(session, track="RefSeq Genes", table="hgFixed.refLink")  # OK
# Get table 'hgFixed.refLink' from track 'RefSeq Genes' within hg19:*:*-*

ucscTableQuery(session, table="hgFixed.refLink")  # error!
# Error in normArgTable(value, x) : unknown table name 'hgFixed.refLink'

H.


I should add that one benefit of having supportedUCSCtables() use a hard-coded list of tables/tracks that are known to be compatible with makeTxDbFromUCSC() is that it makes the function very snappy. Having a smart supportedUCSCtables() that builds that list on the fly by querying the Genome Browser would probably make the function much slower. This function is typically called interactively by the user before calling makeTxDbFromUCSC() (and is called again inside makeTxDbFromUCSC()), so it should not take too long (e.g. < 20 sec.).
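
To make the speed trade-off concrete, here is a minimal base-R sketch of a lookup that pays the cost of a slow query only once per genome. It is not how GenomicFeatures works: make_cached_lookup() and slow_fetch() are hypothetical names, and slow_fetch() is a stand-in that returns placeholder table names instead of querying the Genome Browser.

```r
# Build a lookup function that calls `fetch` at most once per genome
# and serves later requests from an in-memory cache.
make_cached_lookup <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(genome) {
    if (!exists(genome, envir = cache, inherits = FALSE)) {
      assign(genome, fetch(genome), envir = cache)
    }
    get(genome, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a slow remote query; it counts how often it is called.
n_calls <- 0L
slow_fetch <- function(genome) {
  n_calls <<- n_calls + 1L
  c("refGene", "ncbiRefSeq")  # placeholder table names, not a real query result
}

lookup <- make_cached_lookup(slow_fetch)
first  <- lookup("hg38")  # triggers the (pretend) slow query
second <- lookup("hg38")  # served from the cache
```

A cache like this only helps repeated calls within one session, and its contents go stale if a table is moved to another track on the UCSC side.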

H.


Yeah, I guess there's no feasible way to automate it, even if cached inside the package. If a table moves to another track, we'd never find it without an exhaustive search.

