Question

Using genomicRanges / plyranges to calculate score of peaks within intervals

alorsonmethyle • 0

@alorsonmethyle-16837

Last seen 3.9 years ago

Norway

Hi there

I'd like to know whether there is a straightforward way, using GenomicRanges and maybe plyranges, to group the peaks of one dataset (gr.unique) into the tiles of another (tiles).

Here is my code and my attempt so far:

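library(GenomicRanges)
library(plyranges)

# Toy peaks: three positions on each strand of chr22
df_chr22_meth = data.frame(seqname = 'chr22',
                           start = c(1, 10, 100, 1, 10, 100),
                           end = c(1, 10, 100, 1, 10, 100),
                           strand = c('+', '+', '+', '-', '-', '-'),
                           score = c(30, 33, 32, 90, 95, 98))

gr.unique = makeGRangesFromDataFrame(df_chr22_meth, keep.extra.columns = TRUE)

# Tile the span of the peaks (one range per strand) into 10 bp bins
tiles = tile(range(gr.unique), width = 10)

# Group each tile with the peaks it overlaps
grouping = unlist(tiles) %>% group_by_overlaps(gr.unique)

grouping %>% mutate(mean_o = mean(score))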

GRanges object with 12 ranges and 3 metadata columns:
Groups: query [4]
       seqnames    ranges strand |     score     query    mean_o
          <Rle> <IRanges>  <Rle> | <numeric> <integer> <numeric>
   [1]    chr22      1-10      + |        30         1        62
   [2]    chr22      1-10      + |        90         1        62
   [3]    chr22      1-10      + |        33         1        62
   [4]    chr22      1-10      + |        95         1        62
   [5]    chr22    91-100      + |        32        10        65
   ...      ...       ...    ... .       ...       ...       ...
   [8]    chr22      1-10      - |        90        11        62
   [9]    chr22      1-10      - |        33        11        62
  [10]    chr22      1-10      - |        95        11        62
  [11]    chr22    91-100      - |        32        20        65
  [12]    chr22    91-100      - |        98        20        65
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

## It does combine them, but the strand is lost in the process (each tile's mean mixes + and - peaks), and I could not find a way to avoid it!

sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] plyranges_1.10.0                  stringr_1.4.0                     BSgenome.Hsapiens.UCSC.hg38_1.4.3
 [4] BSgenome_1.58.0                   rtracklayer_1.49.5                Biostrings_2.58.0                
 [7] XVector_0.30.0                    dplyr_1.0.5                       GenomicRanges_1.42.0             
[10] GenomeInfoDb_1.26.7               IRanges_2.24.1                    S4Vectors_0.28.1                 
[13] BiocGenerics_0.36.1              

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.20.0 tidyselect_1.1.1            purrr_0.3.4                 lattice_0.20-41            
 [5] vctrs_0.3.7                 generics_0.1.0              expm_0.999-6                utf8_1.2.1                 
 [9] XML_3.99-0.6                rlang_0.4.11                e1071_1.7-6                 pillar_1.6.0               
[13] glue_1.4.2                  DBI_1.1.1                   BiocParallel_1.24.1         matrixStats_0.58.0         
[17] GenomeInfoDbData_1.2.4      rootSolve_1.8.2.1           lifecycle_1.0.0             zlibbioc_1.36.0            
[21] MatrixGenerics_1.2.1        mvtnorm_1.1-1               Biobase_2.50.0              lmom_2.8                   
[25] class_7.3-18                fansi_0.4.2                 Rcpp_1.0.6                  DelayedArray_0.16.3        
[29] Rsamtools_2.6.0             gld_2.6.2                   Exact_2.1                   stringi_1.5.3              
[33] grid_4.0.5                  tools_4.0.5                 bitops_1.0-7                magrittr_2.0.1             
[37] DescTools_0.99.41           RCurl_1.98-1.3              proxy_0.4-25                tibble_3.1.1               
[41] crayon_1.4.1                pkgconfig_2.0.3             MASS_7.3-53.1               ellipsis_0.3.1             
[45] Matrix_1.3-2                data.table_1.14.0           assertthat_0.2.1            rstudioapi_0.13            
[49] R6_2.5.0                    boot_1.3-27                 GenomicAlignments_1.26.0    compiler_4.0.5

I did find a way in this thread: "summarize scores of GRanges into bins". I just felt that it was fairly convoluted for such a basic operation, and that better ways might exist by now?

Thanks a lot

plyranges GenomicRanges • 1.1k views

updated 10 weeks ago by Michael Love 43k • written 3.9 years ago by alorsonmethyle • 0

score 0 · Answer 1 · 2025-02-02

Michael Love 43k

@mikelove

Last seen 2 hours ago

United States

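This should be possible with plyranges using one of the join_overlap_* functions (for example join_overlap_inner, or join_overlap_inner_directed to respect strand) followed by group_by on the metadata from the tiles. E.g. create tile.id as a variable in mcols(tiles) and then use the tiles as the y in the join.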
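
A minimal sketch of that suggestion, using the toy data from the question. The names peaks, per_tile, mean_score and n_peaks are made up for illustration, and the strand-aware join_overlap_inner_directed is my assumption about what the question needs (the plain join_overlap_inner ignores strand):

```r
library(GenomicRanges)
library(plyranges)

# Toy peaks from the question: three positions on each strand of chr22.
peaks <- makeGRangesFromDataFrame(
  data.frame(
    seqnames = "chr22",
    start = c(1, 10, 100, 1, 10, 100),
    end = c(1, 10, 100, 1, 10, 100),
    strand = c("+", "+", "+", "-", "-", "-"),
    score = c(30, 33, 32, 90, 95, 98)
  ),
  keep.extra.columns = TRUE
)

# range() returns one range per strand, so tiling gives 10 tiles per strand.
tiles <- unlist(tile(range(peaks), width = 10))

# Tag every tile with an identifier stored in mcols(tiles).
tiles$tile.id <- seq_along(tiles)

# Strand-aware join with the tiles as y: each peak keeps its own range and
# score and gains the tile.id of the same-strand tile it falls in.
per_tile <- peaks %>%
  join_overlap_inner_directed(tiles) %>%
  group_by(tile.id) %>%
  summarise(mean_score = mean(score), n_peaks = length(score))

per_tile
```

Because the join is an inner join, tiles without any peak are dropped from per_tile. If empty tiles should be kept, one option would be to put the tiles on the x side and use join_overlap_left_directed, which leaves score as NA for tiles with no overlapping peak.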
