Hi
I'm trying to understand why there's a length difference in the the total sum of width between these two code :
library(AnnotationHub)
require('rtracklayer')
library(BSgenome)
ah= AnnotationHub()
H3K27me3_r <- query(ah, c("H3K27me3","E003","fc.signal"))[["AH32033"]]
gr2 <- GRanges(seqnames ="chr2", ranges=IRanges(start=start(Hsapiens$chr2), end=end(Hsapiens$chr2)))
#first measure
H3K27me3_r_gr_byrle <- import(H3K27me3_r, which = gr2, as = "Rle")
sum(runLength(H3K27me3_r_gr_byrle$chr2))
#Second measure
H3K27me3_r_gr <- import(H3K27me3_r, which = gr2)
sum(width(H3K27me3_r_gr))
From my understanding : the first exemple; importing as RLE must generate a list of value vs length and will aggregate repeated values in the length tags. Thus, sum(length) must reflect the total of the score occurence (in relation with the IRanges).
The second example is not importing as RLE, thus, i will have a Grange with the IRanges and the respectives scores. In conclusion, I suppose that both measure must be the same, but it's not the case :
1st example ; sum(runLength(H3K27me3_r_gr_byrle$chr2)) = 243199373
2nd example : sum(width(H3K27me3_r_gr))= 243185735
Why this slight difference ?
Here's my sessionInfo()
R version 4.1.0 (2021-05-18) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C
LC_TIME=French_France.1252attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 BSgenome.Hsapiens.UCSC.hg19_1.4.3 GenomicFeatures_1.46.5
AnnotationDbi_1.56.2 [5] Biobase_2.54.0
Rsamtools_2.10.0 BSgenome_1.62.0
rtracklayer_1.54.0 [9] GenomicRanges_1.46.1
Biostrings_2.62.0 GenomeInfoDb_1.30.1
XVector_0.34.0 [13] IRanges_2.28.0
S4Vectors_0.32.4 AnnotationHub_3.2.2
BiocFileCache_2.2.1 [17] dbplyr_2.2.1
BiocGenerics_0.40.0loaded via a namespace (and not attached): [1] MatrixGenerics_1.6.0
httr_1.4.3 bit64_4.0.5
shiny_1.7.2 assertthat_0.2.1
interactiveDisplayBase_1.32.0 [7] BiocManager_1.30.18
blob_1.2.3 GenomeInfoDbData_1.2.7 yaml_2.3.5 progress_1.2.2 BiocVersion_3.14.0 [13] pillar_1.8.0 RSQLite_2.2.15
lattice_0.20-45 glue_1.6.2
digest_0.6.29 promises_1.2.0.1 [19] htmltools_0.5.3 httpuv_1.6.5
Matrix_1.4-1 XML_3.99-0.10
pkgconfig_2.0.3 biomaRt_2.50.3 [25] zlibbioc_1.40.0 purrr_0.3.4
xtable_1.8-4 later_1.3.0
BiocParallel_1.28.3 tibble_3.1.8 [31] KEGGREST_1.34.0 generics_0.1.3
ellipsis_0.3.2 cachem_1.0.6
SummarizedExperiment_1.24.0 cli_3.3.0 [37] magrittr_2.0.3 crayon_1.5.1 mime_0.12 memoise_2.0.1 fansi_1.0.3 xml2_1.3.3 [43] tools_4.1.0 prettyunits_1.1.1
hms_1.1.1 BiocIO_1.4.0
lifecycle_1.0.1 matrixStats_0.62.0 [49] stringr_1.4.0 DelayedArray_0.20.0
compiler_4.1.0 rlang_1.0.4 grid_4.1.0 RCurl_1.98-1.8 [55] rjson_0.2.21
rappdirs_0.3.3 bitops_1.0-7
restfulr_0.0.15 DBI_1.1.3 curl_4.3.2 [61] R6_2.5.1 GenomicAlignments_1.30.0
dplyr_1.0.9 fastmap_1.1.0 bit_4.0.4 utf8_1.2.2 [67] filelock_1.0.2
stringi_1.7.6 parallel_4.1.0 Rcpp_1.0.9 vctrs_0.4.1 png_0.1-7 [73] tidyselect_1.1.2
Thanks a lot. Lenny
Oh ! these are the 13638 that appears 0 times ! thanks a lot it helps me to understand better RLE :)