Splitting a GRanges object into List of GRange objects by position.
1
1
Entering edit mode
@richardjacton-12268
Last seen 4.2 years ago

I have a GRanges object and I would like to get a list of GRange objects which are subsets of the original object, each of the the objects in this list should have the same position (same chr, start and end values).

 

library(GenomicRanges)
# input:
gr=GRanges(seqnames=c("chr1","chr1","chr2","chr2"),
           ranges=IRanges(start=c(50,50,200,200),end=c(100,100,300,300)),
           strand=c("+","+","-","-"),
           scores=c(0.5,0.5,0.5,0.5)
)
gr
>GRanges object with 4 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr1 [ 50, 100]      + |       0.5
>  [2]     chr1 [ 50, 100]      + |       0.5
>  [3]     chr2 [200, 300]      - |       0.5
>  [4]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 2 sequences from an unspecified genome; no seqlengths

# the desired output is an object of the form of ls
gr1=GRanges(seqnames=c("chr1","chr1"),
           ranges=IRanges(start=c(50,50),end=c(100,100)),
           strand=c("+","+")
)
gr1
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>            
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

gr2=GRanges(seqnames=c("chr2","chr2"),
           ranges=IRanges(start=c(200,200),end=c(300,300)),
           strand=c("-","-"),
           scores=c(0.5,0.5)
)
gr2
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

ls <- list(gr1,gr2)
ls
>[[1]]
>GRanges object with 2 ranges and 0 metadata columns:
>      seqnames    ranges strand
>            
>  [1]     chr1 [50, 100]      +
>  [2]     chr1 [50, 100]      +
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>[[2]]
>GRanges object with 2 ranges and 1 metadata column:
>      seqnames     ranges strand |    scores
>              | 
>  [1]     chr2 [200, 300]      - |       0.5
>  [2]     chr2 [200, 300]      - |       0.5
>  -------
>  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

For example in the excerpt from the GRanges object above the first 2 lines should be in a new GRanges object in a list of GRange objects and the last 2 lines in a second GRanges object in that list.

Any suggestions?

R granges subsett grangeslist group • 5.8k views
ADD COMMENT
3
Entering edit mode
Mike Smith ★ 6.6k
@mike-smith
Last seen 15 hours ago
EMBL Heidelberg

How about this:

split(gr, as.factor(gr))
>
GRangesList object of length 2:
$chr1:50-100:+ 
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |    scores
         <Rle> <IRanges>  <Rle> | <numeric>
  [1]     chr1 [50, 100]      + |       0.5
  [2]     chr1 [50, 100]      + |       0.5

$chr2:200-300:- 
GRanges object with 2 ranges and 1 metadata column:
      seqnames     ranges strand | scores
  [1]     chr2 [200, 300]      - |    0.5
  [2]     chr2 [200, 300]      - |    0.5

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

You can even do it without the as.factor() part and it seems to work, but I get a warning so maybe it's safer to use the first version.

ADD COMMENT
0
Entering edit mode

Yea I noticed the warning and I'm working on it. In theory, we could convert GRanges to a grouping more efficiently than the character coercion, using the 4-way integer hash.

ADD REPLY
0
Entering edit mode

Thanks Mike,

That seems to do exactly what I wanted.

I'm not sure I understand how though.

looking at the output of `as.factor(gr)` I can see why it works with as.factor but not why it would work without?

I had not expected that behaviour from as.factor as applied to a GRanges object, thought the presence of metadata might cause issues but it seems to be working on the IRanges bit, which makes some sense in retrospect. This is a very useful behaviour to be aware of.

ADD REPLY

Login before adding your answer.

Traffic: 358 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6