Usecase_AnnotationHub_GRanges.Rmd, genes = qhs[[1]] can't download UCSC 'refGene' - Windows 7, R 3.2.1, RStudio 0.99.484
J.C. SUN • 0
@jc-sun
Last seen 3.7 years ago
Korea, Republic Of

Hi,

While working through the Coursera course "Bioconductor for Genomic Data Science", I ran into the issue below, which looks like a bug.

I can't download UCSC 'refGene'.

1. The error message is below.

> ah <- AnnotationHub()
> ah <- subset(ah, species == "Homo sapiens")
> qhs <- query(ah, "RefSeq")
> qhs
AnnotationHub with 8 records
# snapshotDate(): 2015-08-26 
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype 
# retrieve records with, e.g., 'object[["AH5040"]]' 

           title       
  AH5040 | RefSeq Genes
  AH5041 | Other RefSeq
  AH5155 | RefSeq Genes
  AH5156 | Other RefSeq
  AH5306 | RefSeq Genes
  AH5307 | Other RefSeq
  AH5431 | RefSeq Genes
  AH5432 | Other RefSeq
> genes <- qhs[qhs$genome == "hg19" & qhs$title == "RefSeq Genes"]
> genes
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26 
# names(): AH5040
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: RefSeq Genes
# $description: GRanges object from UCSC track 'RefSeq Genes'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/re
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: refGene, UCSC, track, Gene, Transcript, Annotation 
# retrieve record with 'object[["AH5040"]]' 
> genes <- qhs[[1]]
Error in value[[3L]](cond) : 
  failed to load hub resource ‘RefSeq Genes’ of class GRanges; reason: bad
    restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘5040’ has magic number '<meta'
  Use of save versions prior to 2 is deprecated 

> genes = qhs[[2]]
retrieving 1 resources
  |==========================================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> genes
UCSC track 'xenoRefGene'
UCSCData object with 161800 ranges and 5 metadata columns:

2. The sessionInfo() output is below for reference.
 

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocInstaller_1.18.4 AnnotationHub_2.0.3  rtracklayer_1.28.10  GenomicRanges_1.20.6
[5] GenomeInfoDb_1.4.2   IRanges_2.2.7        S4Vectors_0.6.5      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0                  AnnotationDbi_1.30.1         XVector_0.8.0
 [4] magrittr_1.5                 zlibbioc_1.14.0              GenomicAlignments_1.4.1
 [7] BiocParallel_1.2.21          xtable_1.7-4                 R6_2.1.1
[10] stringr_1.0.0                httr_1.0.0                   tools_3.2.1
[13] Biobase_2.28.0               DBI_0.3.1                    lambda.r_1.1.7
[16] futile.logger_1.4.1          htmltools_0.2.6              digest_0.6.8
[19] interactiveDisplayBase_1.6.0 shiny_0.12.2                 futile.options_1.0.0
[22] bitops_1.0-6                 curl_0.9.3                   RCurl_1.95-4.7
[25] mime_0.4                     RSQLite_1.0.0                stringi_0.5-5
[28] Biostrings_2.36.4            Rsamtools_1.20.4             XML_3.98-1.3
[31] httpuv_1.3.3


annotationhub bioconductor for genomic data science • 2.8k views

Works for me with a virtually identical sessionInfo(). Maybe your download got corrupted somehow. Try closing your R session and removing your AnnotationHub cache directory (the directory pointed to by hubCache(ah)). Then try again.
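
The warning in the original post reports that file '5040' has magic number '<meta', which suggests the cached file holds an HTML page rather than serialized R data. Before deleting anything, one way to confirm this is to look at the first bytes of the cached file. The following is a minimal base-R sketch, not part of AnnotationHub: read_magic() and is_r_save_file() are hypothetical helper names, and the list of magic strings reflects my understanding of the formats written by save().

```r
# Read the first `n` bytes of a file. gzfile() transparently handles both
# compressed and uncompressed input, which is how load() sees the file too.
read_magic <- function(path, n = 5L) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  readBin(con, what = "raw", n = n)
}

# TRUE if the file starts with one of the magic strings used by save()
# (assumed here: XDR, ASCII and binary variants, format versions 2 and 3).
is_r_save_file <- function(path) {
  magic <- read_magic(path)
  valid <- lapply(
    c("RDX2\n", "RDX3\n", "RDA2\n", "RDA3\n", "RDB2\n", "RDB3\n"),
    charToRaw
  )
  any(vapply(valid, identical, logical(1), magic))
}
```

Given the path of a cached resource inside the directory reported by hubCache(ah), is_r_save_file(path) returning FALSE would point to a bad download rather than a problem in R itself.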

 


A subtler approach is to remove only the cached file:

cache(qhs[1]) <- NULL

See ?"cache<-". The database itself can be removed with file.remove(dbfile(qhs)).


Thanks Martin.

There seem to be a lot of download problems with AnnotationHub. All the reports seem to come from Windows users.

Best,
Kasper

 


Here is an informative session from a user. I'm not sure whether it is the SQLite database or the local cache that is corrupt:

Please see below the results of removing the database (I get the same behavior as before), followed by the sessionInfo().

 

> file.exists('C:/Users/C034614/Documents/AppData/.AnnotationHub/annotationhub.sqlite3')
[1] TRUE
> unlink('C:/Users/C034614/Documents/AppData/.AnnotationHub/annotationhub.sqlite3')
> file.exists('C:/Users/C034614/Documents/AppData/.AnnotationHub/annotationhub.sqlite3')
[1] FALSE
> library("AnnotationHub")
Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’
> ah = AnnotationHub()
retrieving 1 resources
Error: 'AnnotationHub' database corrupt; remove it and try again
  database: ‘C:/Users/C034614/Documents/AppData/.AnnotationHub/annotationhub.sqlite3’
  reason: missing tables
In addition: Warning messages:
1: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
2: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
3: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
4: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
5: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
6: In curl::curl_fetch_disk(url, x$path, handle = handle) :
  progress callback must return boolean
7: download failed
  hub path: ‘https://annotationhub.bioconductor.org/metadata/annotationhub.sqlite3’
  cache path: ‘C:/Users/C034614/Documents/AppData/.AnnotationHub/annotationhub.sqlite3’
  reason: Couldn't connect to server 
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.0.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0                  IRanges_2.2.7                digest_0.6.8                
 [4] mime_0.4                     GenomeInfoDb_1.4.2           R6_2.1.1                    
 [7] xtable_1.7-4                 DBI_0.3.1                    stats4_3.2.2                
[10] magrittr_1.5                 RSQLite_1.0.0                BiocInstaller_1.18.4        
[13] httr_1.0.0                   stringi_0.5-5                curl_0.9.3                  
[16] S4Vectors_0.6.5              tools_3.2.2                  stringr_1.0.0               
[19] Biobase_2.28.0               shiny_0.12.2                 httpuv_1.3.3                
[22] parallel_3.2.2               BiocGenerics_0.14.0          AnnotationDbi_1.30.1        
[25] htmltools_0.2.6              interactiveDisplayBase_1.6.0


OK, I think the problem here is that httr maintains a cache of connections, and the connection to AnnotationHub has become stale. I think the workaround is httr::handle_reset(paste0(hubUrl(), "/")); file.remove(dbfile(qhs)).
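
The workaround can be wrapped in a small helper so the steps run in a fixed order. This is only a sketch under assumptions: refresh_hub() is a hypothetical name, and it relies on hubUrl(), dbfile() and AnnotationHub() behaving as they are used in this thread (AnnotationHub 2.0.3); other versions may differ.

```r
# Reset httr's pooled connection to the hub, remove the local metadata
# database, and create a fresh hub object. Nothing here touches the cached
# resources themselves, only the sqlite metadata file.
refresh_hub <- function(hub) {
  stopifnot(
    requireNamespace("AnnotationHub", quietly = TRUE),
    requireNamespace("httr", quietly = TRUE)
  )
  # Drop the possibly stale connection handle for the hub URL.
  httr::handle_reset(paste0(AnnotationHub::hubUrl(), "/"))
  # Remove the metadata database so that it is downloaded again.
  db <- AnnotationHub::dbfile(hub)
  if (file.exists(db)) file.remove(db)
  # Re-create the hub; this triggers a new metadata download.
  AnnotationHub::AnnotationHub()
}
```

Usage would be ah <- refresh_hub(ah), followed by repeating the query. If the metadata download still fails with "Couldn't connect to server", the cause is more likely the network path than the local cache.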


Hi Dan,

Thank you for your help.

I retried after deleting the AnnotationHub cache directory that hubCache(ah) points to.

I got the error message below.


> ah <- AnnotationHub()
retrieving 1 resources
  |====================================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> ah <- subset(ah, species == "Homo sapiens")
> qhs <- query(ah, "RefSeq")
> qhs
AnnotationHub with 8 records
# snapshotDate(): 2015-08-26 
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH5040"]]' 

           title       
  AH5040 | RefSeq Genes
  AH5041 | Other RefSeq
  AH5155 | RefSeq Genes
  AH5156 | Other RefSeq
  AH5306 | RefSeq Genes
  AH5307 | Other RefSeq
  AH5431 | RefSeq Genes
  AH5432 | Other RefSeq
> refseq <- qhs[qhs$genome == "hg19" & qhs$title == "RefSeq Genes"]
> refseq
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26 
# names(): AH5040
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: RefSeq Genes
# $description: GRanges object from UCSC track 'RefSeq Genes'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene
# $sourcelastmodifieddate: NA
# $sourcesize: NA
# $tags: refGene, UCSC, track, Gene, Transcript, Annotation 
# retrieve record with 'object[["AH5040"]]' 
> refseq <- refseq[[1]]
retrieving 1 resources
Downloading: 73 B     
Error in value[[3L]](cond) : 
  failed to load hub resource ‘RefSeq Genes’ of class GRanges; reason: bad restore
    file magic number (file may be corrupted) -- no data loaded
In addition: There were 38 warnings (use warnings() to see them)

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] AnnotationHub_2.0.3  rtracklayer_1.28.10  GenomicRanges_1.20.6 GenomeInfoDb_1.4.2  
[5] IRanges_2.2.7        S4Vectors_0.6.5      BiocGenerics_0.14.0  BiocInstaller_1.18.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0                  AnnotationDbi_1.30.1         XVector_0.8.0               
 [4] magrittr_1.5                 zlibbioc_1.14.0              GenomicAlignments_1.4.1     
 [7] BiocParallel_1.2.21          xtable_1.7-4                 R6_2.1.1                    
[10] stringr_1.0.0                httr_1.0.0                   tools_3.2.1                 
[13] Biobase_2.28.0               DBI_0.3.1                    lambda.r_1.1.7              
[16] futile.logger_1.4.1          htmltools_0.2.6              digest_0.6.8                
[19] interactiveDisplayBase_1.6.0 shiny_0.12.2                 futile.options_1.0.0        
[22] bitops_1.0-6                 curl_0.9.3                   RCurl_1.95-4.7              
[25] mime_0.4                     RSQLite_1.0.0                stringi_0.5-5               
[28] Biostrings_2.36.4            Rsamtools_1.20.4             XML_3.98-1.3                
[31] httpuv_1.3.3      



We still have not been able to reproduce these problems. I've tried on a Windows 7 VM.

One thought is that your disk may be full. The problem seems to occur on multiple machines (all running Windows), so that is unlikely to be the explanation, but it should be ruled out. The file you are trying to download is larger than 73 B.

If that is not the case, then there is something weird with the download and we will continue to investigate.
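
Since a 73 B download is far too small for this resource, listing suspiciously small files in the cache directory is a quick way to spot truncated or placeholder downloads. A minimal base-R sketch follows; find_small_files() is a hypothetical helper, and the 1024-byte default is an arbitrary illustrative threshold, not an AnnotationHub setting.

```r
# Return a data frame of files under `dir` that are smaller than `min_bytes`.
find_small_files <- function(dir, min_bytes = 1024) {
  paths <- list.files(dir, full.names = TRUE, recursive = TRUE)
  sizes <- file.info(paths)$size
  keep <- !is.na(sizes) & sizes < min_bytes
  data.frame(path = paths[keep], bytes = sizes[keep], stringsAsFactors = FALSE)
}
```

For example, find_small_files(hubCache(ah)) would list candidate files to remove before retrying the download.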
