Question

GenomicFeatures supportedUCSCFeatureDbTracks never returns and has to be terminated with C-c

0

Entering edit mode

Malcolm Cook ★ 1.6k

@malcolm-cook-6293

Last seen 17 days ago

United States

In my hands, supportedUCSCFeatureDbTracks never returns and has to be terminated with C-c.

Any debugging help very welcome.

I've tried with hg19 in addition to dm6. Below I report my linux setup, but it also fails under windows, R 3.5.1, GenomicFeatures_1.34.1

> trks<-supportedUCSCFeatureDbTracks('dm6') C-c C-c > sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS: /n/apps/CentOS7/install/r-3.5.0/lib64/R/lib/libRblas.so LAPACK: /n/apps/CentOS7/install/r-3.5.0/lib64/R/lib/libRlapack.so

locale: [1] C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] GenomicFeatures_1.32.0 AnnotationDbi_1.42.1 Biobase_2.40.0 GenomicRanges_1.32.6 GenomeInfoDb_1.16.0 IRanges_2.14.12 S4Vectors_0.18.3 BiocGenerics_0.26.0 devtools_1.13.6

loaded via a namespace (and not attached): [1] Rcpp_0.12.19 compiler_3.5.0 XVector_0.20.0 prettyunits_1.0.2 bitops_1.0-6 tools_3.5.0 zlibbioc_1.26.0 progress_1.2.0 biomaRt_2.36.1 digest_0.6.18 bit_1.1-14 lattice_0.20-35 RSQLite_2.1.1 memoise_1.1.0 pkgconfig_2.0.2 rlang_0.3.0.1 Matrix_1.2-15 DelayedArray_0.6.6 DBI_1.0.0 GenomeInfoDbData_1.1.0 rtracklayer_1.40.3 [22] withr_2.1.2 stringr_1.3.1 httr_1.3.1 Biostrings_2.48.0 hms_0.4.2 grid_3.5.0 bit64_0.9-7 R6_2.3.0 BiocParallel_1.14.2 XML_3.98-1.12 blob_1.1.1 magrittr_1.5 matrixStats_0.54.0 GenomicAlignments_1.16.0 Rsamtools_1.32.2 SummarizedExperiment_1.10.1 assertthat_0.2.0 stringi_1.2.4 RCurl_1.95-4.11 crayon_1.3.4

genomicfeatures bug • 1.2k views

ADD COMMENT • link updated 6.2 years ago by James W. MacDonald 68k • written 6.2 years ago by Malcolm Cook ★ 1.6k

score 0 · Answer 1 · 2018-11-26

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

I think you are being too impatient. There are 35 tracks for dm6, and for each track you have to make a couple of queries to UCSC, which does take time. Even if I jam it out with a core for each query it takes a bit of time, mostly in transit (because the elapsed time is almost a minute, but the system time is not even two seconds):

> session <- browserSession()
> genome(session) <- "dm6"
> trx <- trackNames(ucscTableQuery(session))
> library(BiocParallel)
> system.time(bplapply(trx, GenomicFeatures:::isGoodTrack, session, BPPARAM = MulticoreParam(35)))
   user  system elapsed
  0.320   1.376  49.718

But waiting for that does look boring. Maybe it should be parallelized?

ADD COMMENT • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks, James, but I don't understand your response. I commented on the function supportedUCSCFeatureDbTracks, which by my sights you did not exercise in your code, so, how could it address my issue?

ADD REPLY • link 6.2 years ago Malcolm Cook ★ 1.6k

0

Entering edit mode

I ran the underlying code, which is normally run serially. I parallelized the slow step to show you that it does run. But since it's normally run serially, it will probably take about 35 minutes to run.

ADD REPLY • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

The code underlying supportedUCSCFeatureDbTracks is

> supportedUCSCFeatureDbTracks function (genome) { session <- browserSession() genome(session) <- genome query <- ucscTableQuery(session) trx <- trackNames(ucscTableQuery(session)) supported <- makeWhiteList(session, trx) trx[supported] }

Which looks in part like what you are telling me is the underlying code, but not completely.

When I run the above line by line (adding namespace prefix as needed), I find this line is where things stall

supported <- GenomicFeatures:::makeWhiteList(session, trx)

Which line is absent from what you are running.

... Ah, but, now I see, makeWhiteList in turn calls isGoodTrack.

Neither makeWhiteList nor isGoodTrack are exposed/documented so I don't know what to expect them to be doing.

It appears now to me that the line

trx <- trackNames(ucscTableQuery(session))

tells me what I want to know. Can you advise what further filtering for 'goodTracks' is supposed to provide me?

Thanks!

ADD REPLY • link 6.2 years ago Malcolm Cook ★ 1.6k

0

Entering edit mode

I think we may be getting off point here. The main point I was trying to make, originally, was that the underlying code does work, and the fact that you were killing the process early had more to do with impatience on your part than some sort of error in the code.

To show that the underlying code does work, I had to expose some of the code and I parallelized it so I didn't have to wait for some amount of time =< 35 minutes in order to show that it does work, for some definition of 'work'.

I'm not sure it's critical for you to understand what isGoodTrack does, but I would imagine that it's not really that hard to infer? You already know that we are getting back a bunch of tracks from UCSC, and then a function called isGoodTrack generates something that can be used to subset that set of tracks to just those that can be used to create a FeatureDb. And the function name is isGoodTrack?

So, the main point here is that

The code does work, and you are killing the process early because you think it's frozen when in fact it is diligently working.
The diligent work that is being done is repeated 35 times, and each time takes about a minute, +/-, which ends up being pretty inefficient, and it may be useful to parallelize if we expect people to be able to use supportedUCSDFeatureDbTracks without getting bored and killing the process. The downside of doing that would be the requirement that GenomicFeatures would have to depend on BiocParallel, for this one function that until now I didn't know existed. So the tradeoff would be to make the GenomicFeatures package more complex to speed up a function that maybe people don't even use?

ADD REPLY • link 6.2 years ago James W. MacDonald 68k