Hi there
I would like to know whether there is a straightforward way, using GenomicRanges and maybe plyranges, to group the peaks of one dataset (gr.unique) into the tiles of another (tiles).
Here is my code and my attempt so far:
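library(GenomicRanges)
library(plyranges)
# Toy data: three positions on chr22, each with a score on both strands
df_chr22_meth = data.frame(seqname=c('chr22'),
start=c(1,10,100,1,10,100),
end=c(1,10,100,1,10,100),
strand=c('+','+','+','-','-','-'),
score=c(30,33,32,90,95,98))
gr.unique = makeGRangesFromDataFrame(df_chr22_meth, keep.extra.columns = TRUE)
# range() gives one range per strand, so tile() returns 10 tiles for each strand
tiles = tile(range(gr.unique), width = 10)
# Group every tile with the peaks it overlaps, then average the scores per tile
grouping = unlist(tiles) %>% group_by_overlaps(gr.unique)
grouping %>% mutate(mean_o = mean(score))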
GRanges object with 12 ranges and 3 metadata columns:
Groups: query [4]
seqnames ranges strand | score query mean_o
<Rle> <IRanges> <Rle> | <numeric> <integer> <numeric>
[1] chr22 1-10 + | 30 1 62
[2] chr22 1-10 + | 90 1 62
[3] chr22 1-10 + | 33 1 62
[4] chr22 1-10 + | 95 1 62
[5] chr22 91-100 + | 32 10 65
... ... ... ... . ... ... ...
[8] chr22 1-10 - | 90 11 62
[9] chr22 1-10 - | 33 11 62
[10] chr22 1-10 - | 95 11 62
[11] chr22 91-100 - | 32 20 65
[12] chr22 91-100 - | 98 20 65
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
## It does combine them, but strand is ignored in the process (each mean pools the + and - scores), and I have not found a way to avoid that!
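For comparison, here is a minimal strand-aware sketch of the result I am after, using only GenomicRanges. It is not necessarily the most direct route. It assumes the tiles are built from range(gr.unique), which gives one range per strand, and the column name mean_score is made up for illustration. It relies on findOverlaps() matching strands when ignore.strand = FALSE. plyranges also provides directed joins such as join_overlap_inner_directed(), which may allow a more compact version.

```r
library(GenomicRanges)

# Same toy data as above: three positions, each scored on both strands
df_chr22_meth <- data.frame(
  seqname = "chr22",
  start   = c(1, 10, 100, 1, 10, 100),
  end     = c(1, 10, 100, 1, 10, 100),
  strand  = c("+", "+", "+", "-", "-", "-"),
  score   = c(30, 33, 32, 90, 95, 98)
)
gr.unique <- makeGRangesFromDataFrame(df_chr22_meth, keep.extra.columns = TRUE)

# range() keeps strands separate, so the tiles are strand-specific
tiles <- unlist(tile(range(gr.unique), width = 10))

# Strand-aware overlaps: a "+" tile only matches "+" peaks
hits <- findOverlaps(tiles, gr.unique, ignore.strand = FALSE)

# Mean score per tile, computed only from the same-strand peaks it overlaps
mean_by_tile <- tapply(gr.unique$score[subjectHits(hits)], queryHits(hits), mean)

# Attach the means to the tiles; tiles without any peak stay NA
mean_score <- rep(NA_real_, length(tiles))
mean_score[as.integer(names(mean_by_tile))] <- as.numeric(mean_by_tile)
tiles$mean_score <- mean_score

# Show only the tiles that contain at least one peak
tiles[!is.na(tiles$mean_score)]
```

With this approach the + and - scores are averaged separately instead of being pooled per tile.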
sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyranges_1.10.0 stringr_1.4.0 BSgenome.Hsapiens.UCSC.hg38_1.4.3
[4] BSgenome_1.58.0 rtracklayer_1.49.5 Biostrings_2.58.0
[7] XVector_0.30.0 dplyr_1.0.5 GenomicRanges_1.42.0
[10] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[13] BiocGenerics_0.36.1
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.20.0 tidyselect_1.1.1 purrr_0.3.4 lattice_0.20-41
[5] vctrs_0.3.7 generics_0.1.0 expm_0.999-6 utf8_1.2.1
[9] XML_3.99-0.6 rlang_0.4.11 e1071_1.7-6 pillar_1.6.0
[13] glue_1.4.2 DBI_1.1.1 BiocParallel_1.24.1 matrixStats_0.58.0
[17] GenomeInfoDbData_1.2.4 rootSolve_1.8.2.1 lifecycle_1.0.0 zlibbioc_1.36.0
[21] MatrixGenerics_1.2.1 mvtnorm_1.1-1 Biobase_2.50.0 lmom_2.8
[25] class_7.3-18 fansi_0.4.2 Rcpp_1.0.6 DelayedArray_0.16.3
[29] Rsamtools_2.6.0 gld_2.6.2 Exact_2.1 stringi_1.5.3
[33] grid_4.0.5 tools_4.0.5 bitops_1.0-7 magrittr_2.0.1
[37] DescTools_0.99.41 RCurl_1.98-1.3 proxy_0.4-25 tibble_3.1.1
[41] crayon_1.4.1 pkgconfig_2.0.3 MASS_7.3-53.1 ellipsis_0.3.1
[45] Matrix_1.3-2 data.table_1.14.0 assertthat_0.2.1 rstudioapi_0.13
[49] R6_2.5.0 boot_1.3-27 GenomicAlignments_1.26.0 compiler_4.0.5
I did find a way in this thread: "summarize scores of GRanges into bins". It just felt fairly convoluted for such a basic operation, and I wondered whether better ways might exist by now.
Thanks a lot