integer overflow in ChIPpeakAnno assignChromosomeRegion
3
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…

Hi there,

I'm just starting to explore the ChIPpeakAnno package - it looks really useful: thank you for creating it.  I'm following some of the vignettes and using assignChromosomeRegion to get a first glimpse of where our peaks tend to map: that works well.   What I'd love to do is to see how different the observed distribution is from what we might expect if we sampled the genome randomly.  It seems like I could try simply using the same function on the whole (human) genome, with the nucleotideLevel=TRUE setting. 

But the human genome is too big - I get an integer overflow warning, and NAs in the output. I've pasted my code below - I think you'll see what I mean. Would it be possible to use as.numeric within the function to avoid this issue?

I'm also trying to understand what the Jaccard index metric is telling me, as it seems like it could be useful - I see an older email thread on this site but am having trouble parsing through the conversation to understand. Would it be possible to provide a little more explanation of the jaccard index in the help page for "?assignChromosomeRegion"?  (and is it an error that the help page currently refers to the picard index?)

thanks very much,

Janet

------------------------------------------------------------------- 

Dr. Janet Young 

Malik lab
http://research.fhcrc.org/malik/en.html

Division of Basic Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., A2-025, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 4512
email: jayoung  ...at...  fredhutch.org

------------------------------------------------------------------- 

 

library(ChIPpeakAnno)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

wholeGenome_GR <- GRanges(seqnames=names(seqlengths(BSgenome.Hsapiens.UCSC.hg19)), ranges=IRanges(start=1, end=seqlengths(BSgenome.Hsapiens.UCSC.hg19)) )

temp_aCR_wholeGenome <- assignChromosomeRegion(wholeGenome_GR, nucleotideLevel=TRUE, 
                           precedence=c("Promoters", "immediateDownstream", 
                                         "fiveUTRs", "threeUTRs", 
                                         "Exons", "Introns"), 
                           TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene)

# Warning messages:
# 1: In sum(width(queryHits[!duplicated(queryHits)])) :
#   integer overflow - use sum(as.numeric(.))
# 2: In sum(width(.ele)) : integer overflow - use sum(as.numeric(.))

temp_aCR_wholeGenome
# $percentage
#           Promoters immediateDownstream            fiveUTRs           threeUTRs               Exons             Introns 
#                  NA                  NA                  NA                  NA                  NA                  NA 
#   Intergenic.Region 
#                  NA 
# $jaccard
#           Promoters immediateDownstream            fiveUTRs           threeUTRs               Exons             Introns 
#         0.124492691         0.106314952         0.116458243         0.063211753         0.534006369         0.508907222 
#   Intergenic.Region 
#         0.000134607 

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3                  AnnotationDbi_1.40.0                   
 [4] Biobase_2.38.0                          BSgenome.Hsapiens.UCSC.hg19_1.4.0       BSgenome_1.46.0                        
 [7] rtracklayer_1.38.3                      ChIPpeakAnno_3.12.5                     VennDiagram_1.6.18                     
[10] futile.logger_1.4.3                     GenomicRanges_1.30.1                    GenomeInfoDb_1.14.0                    
[13] Biostrings_2.46.0                       XVector_0.18.0                          IRanges_2.12.0                         
[16] S4Vectors_0.16.0                        BiocGenerics_0.24.0                    

loaded via a namespace (and not attached):
 [1] httr_1.3.1                    idr_1.2                       RMySQL_0.10.13                regioneR_1.10.0              
 [5] bit64_0.9-7                   AnnotationHub_2.10.1          splines_3.4.3                 shiny_1.0.5                  
 [9] assertthat_0.2.0              interactiveDisplayBase_1.16.0 RBGL_1.54.0                   blob_1.1.0                   
[13] GenomeInfoDbData_0.99.0       Rsamtools_1.30.0              yaml_2.1.16                   progress_1.1.2               
[17] pillar_1.1.0                  RSQLite_2.0                   lattice_0.20-35               limma_3.34.8                 
[21] digest_0.6.15                 htmltools_0.3.6               httpuv_1.3.5                  Matrix_1.2-12                
[25] XML_3.98-1.9                  pkgconfig_2.0.1               biomaRt_2.34.2                zlibbioc_1.24.0              
[29] xtable_1.8-2                  GO.db_3.4.1                   BiocParallel_1.12.0           tibble_1.4.2                 
[33] AnnotationFilter_1.2.0        SummarizedExperiment_1.8.1    lazyeval_0.2.1                survival_2.41-3              
[37] magrittr_1.5                  mime_0.5                      memoise_1.1.0                 MASS_7.3-48                  
[41] graph_1.56.0                  BiocInstaller_1.28.0          tools_3.4.3                   prettyunits_1.0.2            
[45] matrixStats_0.53.1            stringr_1.2.0                 DelayedArray_0.4.1            ensembldb_2.2.1              
[49] lambda.r_1.2                  ade4_1.7-10                   compiler_3.4.3                rlang_0.1.6                  
[53] RCurl_1.95-4.10               bitops_1.0-6                  multtest_2.34.0               DBI_0.7                      
[57] curl_3.1                      R6_2.2.2                      GenomicAlignments_1.14.1      seqinr_3.4-5                 
[61] bit_1.1-12                    ProtGenerics_1.10.0           futile.options_1.0.0          stringi_1.1.6                
[65] Rcpp_0.12.15                 
ChIPpeakAnno assignChromosomeRegion • 1.6k views
ADD COMMENT
0
Entering edit mode
Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 6 days ago
United States

Thank you for reporting the bug. I will fix it as soon as possible.

Jianhong.

ADD COMMENT
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…

Thanks, Jianhong - I appreciate it

ADD COMMENT
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 13 months ago
United States
Janet, Thanks for the feedback! The Jaccard index, also known as Intersection over Union. The Jaccard index is between 0 and 1. The higher the index, the more significant the overlap between the peak region and the genomic features in consideration. Best regards, Julie On Feb 14, 2018, at 5:07 PM, Janet Young [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Janet Young<https: support.bioconductor.org="" u="" 2360=""/> wrote Question: integer overflow in ChIPpeakAnno assignChromosomeRegion<https: support.bioconductor.org="" p="" 106018=""/>: Hi there, I'm just starting to explore the ChIPpeakAnno package - it looks really useful: thank you for creating it. I'm following some of the vignettes and using assignChromosomeRegion to get a first glimpse of where our peaks tend to map: that works well. What I'd love to do is to see how different the observed distribution is from what we might expect if we sampled the genome randomly. It seems like I could try simply using the same function on the whole (human) genome, with the nucleotideLevel=TRUE setting. But the human genome is too big - I get an integer overflow warning, and NAs in the output. I've pasted my code below - I think you'll see what I mean. Would it be possible to use as.numeric within the function to avoid this issue? I'm also trying to understand what the Jaccard index metric is telling me, as it seems like it could be useful - I see an older email thread on this site but am having trouble parsing through the conversation to understand. Would it be possible to provide a little more explanation of the jaccard index in the help page for "?assignChromosomeRegion"? (and is it an error that the help page currently refers to the picard index?) thanks very much, Janet ------------------------------------------------------------------- Dr. Janet Young Malik lab http://research.fhcrc.org/malik/en.html Division of Basic Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., A2-025, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 4512 email: jayoung ...at... fredhutch.org<http: fredhutch.org=""> ------------------------------------------------------------------- library(ChIPpeakAnno) library(BSgenome.Hsapiens.UCSC.hg19) library(TxDb.Hsapiens.UCSC.hg19.knownGene) wholeGenome_GR <- GRanges(seqnames=names(seqlengths(BSgenome.Hsapiens.UCSC.hg19)), ranges=IRanges(start=1, end=seqlengths(BSgenome.Hsapiens.UCSC.hg19)) ) temp_aCR_wholeGenome <- assignChromosomeRegion(wholeGenome_GR, nucleotideLevel=TRUE, precedence=c("Promoters", "immediateDownstream", "fiveUTRs", "threeUTRs", "Exons", "Introns"), TxDb=TxDb.Hsapiens.UCSC.hg19.knownGene) # Warning messages: # 1: In sum(width(queryHits[!duplicated(queryHits)])) : # integer overflow - use sum(as.numeric(.)) # 2: In sum(width(.ele)) : integer overflow - use sum(as.numeric(.)) temp_aCR_wholeGenome # $percentage # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # NA NA NA NA NA NA # Intergenic.Region # NA # $jaccard # Promoters immediateDownstream fiveUTRs threeUTRs Exons Introns # 0.124492691 0.106314952 0.116458243 0.063211753 0.534006369 0.508907222 # Intergenic.Region # 0.000134607 sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.3 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel grid stats graphics grDevices utils datasets methods base other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 [4] Biobase_2.38.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.46.0 [7] rtracklayer_1.38.3 ChIPpeakAnno_3.12.5 VennDiagram_1.6.18 [10] futile.logger_1.4.3 GenomicRanges_1.30.1 GenomeInfoDb_1.14.0 [13] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 [16] S4Vectors_0.16.0 BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] httr_1.3.1 idr_1.2 RMySQL_0.10.13 regioneR_1.10.0 [5] bit64_0.9-7 AnnotationHub_2.10.1 splines_3.4.3 shiny_1.0.5 [9] assertthat_0.2.0 interactiveDisplayBase_1.16.0 RBGL_1.54.0 blob_1.1.0 [13] GenomeInfoDbData_0.99.0 Rsamtools_1.30.0 yaml_2.1.16 progress_1.1.2 [17] pillar_1.1.0 RSQLite_2.0 lattice_0.20-35 limma_3.34.8 [21] digest_0.6.15 htmltools_0.3.6 httpuv_1.3.5 Matrix_1.2-12 [25] XML_3.98-1.9 pkgconfig_2.0.1 biomaRt_2.34.2 zlibbioc_1.24.0 [29] xtable_1.8-2 GO.db_3.4.1 BiocParallel_1.12.0 tibble_1.4.2 [33] AnnotationFilter_1.2.0 SummarizedExperiment_1.8.1 lazyeval_0.2.1 survival_2.41-3 [37] magrittr_1.5 mime_0.5 memoise_1.1.0 MASS_7.3-48 [41] graph_1.56.0 BiocInstaller_1.28.0 tools_3.4.3 prettyunits_1.0.2 [45] matrixStats_0.53.1 stringr_1.2.0 DelayedArray_0.4.1 ensembldb_2.2.1 [49] lambda.r_1.2 ade4_1.7-10 compiler_3.4.3 rlang_0.1.6 [53] RCurl_1.95-4.10 bitops_1.0-6 multtest_2.34.0 DBI_0.7 [57] curl_3.1 R6_2.2.2 GenomicAlignments_1.14.1 seqinr_3.4-5 [61] bit_1.1-12 ProtGenerics_1.10.0 futile.options_1.0.0 stringi_1.1.6 [65] Rcpp_0.12.15 ________________________________ Post tags: ChIPpeakAnno, assignChromosomeRegion You may reply via email or visit integer overflow in ChIPpeakAnno assignChromosomeRegion
ADD COMMENT

Login before adding your answer.

Traffic: 982 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6