Reducing and aggregating GRanges with gaps using plyranges
@rapolicastro-24308

Hello all,

I am trying to use plyranges to do a strand-specific reduction of ranges, aggregating the score column, while allowing gaps between the merged ranges. This is conceptually similar to section 4.1 of the HelloRanges tutorial, except that gaps should be bridged the way the min.gapwidth argument of GenomicRanges::reduce does it. The closest function I can find in plyranges is reduce_ranges_directed, but it does not allow gaps.
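For reference, the gap argument of reduce() in IRanges/GenomicRanges is min.gapwidth, and it works the opposite way to a "maximum gap": ranges separated by a gap of at least min.gapwidth positions are not merged. The toy sketch below uses plain IRanges (the object names x, kept, merged and max_gap are illustrative only) to show that tolerating gaps of up to 10 bp corresponds to min.gapwidth = 11.

```r
library(IRanges)

# Two ranges separated by a gap of 4 positions (6, 7, 8 and 9)
x <- IRanges(start = c(1, 10), end = c(5, 15))

# Ranges separated by a gap of AT LEAST min.gapwidth are kept apart,
# so a 4 bp gap is not bridged when min.gapwidth = 4 ...
kept <- reduce(x, min.gapwidth = 4L)
kept    # 2 ranges: 1-5 and 10-15

# ... but it is bridged when min.gapwidth = 5
merged <- reduce(x, min.gapwidth = 5L)
merged  # 1 range: 1-15

# To tolerate gaps of up to max_gap positions, pass max_gap + 1
max_gap <- 10L
reduce(x, min.gapwidth = max_gap + 1L)
```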

I posted this question on Biostars yesterday as well, but no consensus was reached on the best method.

Example data.

library("plyranges")

df <- data.frame(
  seqnames="chrI", start=c(1, 10, 20), end=c(5, 15, 25), strand=c("+", "+", "-"),
  score=c(8, 3, 6)
)
gr <- as_granges(df)

> gr
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrI       1-5      + |         8
  [2]     chrI     10-15      + |         3
  [3]     chrI     20-25      - |         6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Desired output, with a maximum allowed gap width of 10 and the scores aggregated by summing:

desired_output <- data.frame(
  seqnames="chrI", start=c(1, 20), end=c(15, 25), strand=c("+", "-"),
  score=c(11, 6)
)
desired_output <- as_granges(desired_output)

> desired_output
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |     score
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chrI      1-15      + |        11
  [2]     chrI     20-25      - |         6
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
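One possible route, sketched here rather than offered as a definitive plyranges answer: do the gap-tolerant, strand-aware merge with GenomicRanges::reduce (min.gapwidth set to the maximum gap plus one, with.revmap = TRUE), then use the revmap list column, which records which input ranges went into each merged range, to sum their scores. This assumes GenomicRanges and IRanges are attached alongside plyranges (they are dependencies of it); max_gap, reduced and result are illustrative names, not part of any API.

```r
library(plyranges)  # assumption: also attaches GenomicRanges and IRanges

gr <- as_granges(data.frame(
  seqnames = "chrI", start = c(1, 10, 20), end = c(5, 15, 25),
  strand = c("+", "+", "-"), score = c(8, 3, 6)
))

max_gap <- 10L  # illustrative: largest gap (bp) allowed between merged ranges

# Strand-aware reduction (ignore.strand = FALSE is the default);
# revmap is an IntegerList of the input indices merged into each range
reduced <- GenomicRanges::reduce(gr, min.gapwidth = max_gap + 1L,
                                 with.revmap = TRUE)

# Pull the input scores per merged range, sum them, then drop revmap
result <- reduced %>%
  mutate(score = sum(extractList(gr$score, revmap))) %>%
  select(-revmap)
result
```

On the example data this should give 1-15 on + (score 11) and 20-25 on - (score 6), matching desired_output. Swapping sum for another summary that has a List method, such as max, changes the aggregation.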

Cheers!

GenomicRanges plyranges