Question

Aggregate bins in a large SummarizedExperiment efficiently

alessandro.pastore ▴ 20

I have a SummarizedExperiment (for this purpose it can be treated as a GRanges). I want to reduce the number of intervals by collapsing each run of adjacent rows with identical mcols(gr) into a single row, while keeping track of the new, extended interval.

If a state pair also occurs in non-adjacent intervals (e.g. the second 1,1 pair below), each occurrence has to be reported independently of the first.

Thanks a lot!

library(GenomicRanges)

# 12 adjacent bins on chr1 with two state columns
gr <- GRanges(
  seqnames = Rle(c("chr1"), c(12)),
  ranges = IRanges(1:12*10, end = 1:12*10+5),
  state1 = c(1,1,1,1,2,3,4,5,5,5,1,1),
  state2 = c(1,1,1,2,2,2,5,5,6,6,1,1))

must become:

gr2 <- GRanges(
  seqnames = Rle(c("chr1"), c(8)),
  ranges = IRanges(start = c(10,40,50,60,70,80,90,110), end = c(35,45,55,65,75,85,105,125)),
  state1 = c(1,1,2,3,4,5,5,1),
  state2 = c(1,2,2,2,5,5,6,1))

granges summarizedexperiment reduce • 1.5k views

updated 8.6 years ago by Michael Lawrence ★ 11k • written 8.6 years ago by alessandro.pastore ▴ 20

score 3 · Accepted Answer · 2016-09-09

Michael Lawrence ★ 11k

It can be done like this:

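# Runs of each state column as row-index ranges, split wherever either column changes
r <- disjoin(c(ranges(Rle(gr$state1)), ranges(Rle(gr$state2))))
# Group the rows of gr by those runs
grl <- relist(gr, PartitioningByEnd(r))
# One range spanning each group of rows
grr <- unlist(range(grl), use.names=FALSE)
# Carry over the metadata of the first row in each group
mcols(grr) <- mcols(unlist(phead(grl, 1L), use.names=FALSE))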

That is admittedly pretty ugly. Sorry about that.
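A more explicit variant of the same idea, as a minimal sketch rather than a definitive implementation: build one key per row from the state columns, run-length encode it, and take the first start and last end of each run. It assumes the rows are already sorted along the genome, that bins do not overlap, and that strand can be ignored; the names `key`, `runs`, `first`, `last` and `collapsed` are only illustrative.

```r
library(GenomicRanges)

gr <- GRanges(
  seqnames = Rle("chr1", 12),
  ranges = IRanges(1:12 * 10, end = 1:12 * 10 + 5),
  state1 = c(1, 1, 1, 1, 2, 3, 4, 5, 5, 5, 1, 1),
  state2 = c(1, 1, 1, 2, 2, 2, 5, 5, 6, 6, 1, 1))

# One key per row; seqnames is included so a run never crosses chromosomes
key <- paste(as.character(seqnames(gr)), gr$state1, gr$state2, sep = "_")

# Run-length encode the keys: each run is a stretch of adjacent identical rows
runs <- Rle(key)
last <- cumsum(runLength(runs))        # row index where each run ends
first <- last - runLength(runs) + 1L   # row index where each run starts

# Assumption: rows are sorted, so the run spans start(first row) to end(last row)
collapsed <- GRanges(
  seqnames = seqnames(gr)[first],
  ranges = IRanges(start = start(gr)[first], end = end(gr)[last]))

# Keep the metadata of the first row of each run
mcols(collapsed) <- mcols(gr)[first, , drop = FALSE]
```

Because the runs are defined by position, a state pair that reappears later (such as 1,1 in the example) ends up in its own separate range instead of being merged with the earlier one.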

ADD COMMENT • link 8.6 years ago Michael Lawrence ★ 11k

Thanks, this is nice, but it only lists the first and last occurrence of a state pair, not each occurrence when a state pair occurs more than once.

The pair 1,1 occurs at the beginning and at the end of the GRanges, but in your case it is listed as occurring in a single interval that spans the whole GRanges. I need both intervals, one for each occurrence of adjacent values.