Merging Neighboring Genomic Segments
rahnaman (@user-24168)

Hello All,

I have a GRanges object that contains the coordinates of copy number variation regions in my data, and was wondering whether there is already a GRanges function that allows me to merge segments if they are close enough (for example, <= 1 kb apart).

Below is a simple example of such data. What I would like to have is a list that combines regions b-c on chr2 (because they are 1 unit apart) and similarly e-f-g on chr3.

Are there any existing GRanges functions for this?

Thanks for your time in advance.

> library(GenomicRanges)
> GRanges(seqnames = Rle(c("chr1", "chr2","chr3"), c(1, 3,3)), 
         ranges = IRanges(c(1,1,4,9,1,4,6), 
         end = c(2,3,7,12,3,5,10), names = head(letters, 7)))

GRanges object with 7 ranges and 0 metadata columns:
    seqnames    ranges strand
       <Rle> <IRanges>  <Rle>
  a     chr1       1-2      *
  b     chr2       1-3      *
  c     chr2       4-7      *
  d     chr2      9-12      *
  e     chr3       1-3      *
  f     chr3       4-5      *
  g     chr3      6-10      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

@james-w-macdonald-5106

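Not directly, so far as I know. But if you widen the intervals and then use reduce you get what you want. Note that the code below actually widens by two bases: z + 1 extends each range by one base at both ends, so width(z + 1) is width(z) + 2. The merged ranges keep that padding (chr1 comes out as 1-4 rather than 1-2).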

> z <- GRanges(seqnames = Rle(c("chr1", "chr2","chr3"), c(1, 3,3)), 
         ranges = IRanges(c(1,1,4,9,1,4,6), 
         end = c(2,3,7,12,3,5,10), names = head(letters, 7)))

> reduce(resize(z, width(z + 1), "start"))
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-4      *
  [2]     chr2      1-14      *
  [3]     chr3      1-12      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
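
For comparison, reduce() on its own (default min.gapwidth = 1L) already merges ranges that overlap or are directly adjacent. On the toy data from the question that combines b-c and e-f-g but leaves d separate, since one base (position 8) lies between c and d. A minimal sketch, reusing the question's object:

```r
library(GenomicRanges)

# The toy data from the question (unstranded, named a-g).
z <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr3"), c(1, 3, 3)),
             ranges = IRanges(c(1, 1, 4, 9, 1, 4, 6),
                              end = c(2, 3, 7, 12, 3, 5, 10),
                              names = head(letters, 7)))

# Default min.gapwidth = 1L: only overlapping or touching ranges merge,
# so b-c (chr2) and e-f-g (chr3) are combined while d stays on its own.
merged <- reduce(z)
merged
```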

This assumes you always add on one side. However, resize() with fix = "start" anchors minus-strand ranges at their end coordinate, so on the negative strand the padding goes toward lower coordinates and you get negative starts:

> z <- GRanges(seqnames = Rle(c("chr1", "chr2","chr3"), c(1, 3,3)), 
         ranges = IRanges(c(1,1,4,9,1,4,6), 
         end = c(2,3,7,12,3,5,10)), names = head(letters, 7), strand = rep("-", 7))
> z
GRanges object with 7 ranges and 1 metadata column:
      seqnames    ranges strand |       names
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1       1-2      - |           a
  [2]     chr2       1-3      - |           b
  [3]     chr2       4-7      - |           c
  [4]     chr2      9-12      - |           d
  [5]     chr3       1-3      - |           e
  [6]     chr3       4-5      - |           f
  [7]     chr3      6-10      - |           g
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> reduce(resize(z, width(z + 1), "start"))
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      -1-2      -
  [2]     chr2     -1-12      -
  [3]     chr3     -1-10      -
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

So there might be a more sophisticated way to do it.
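
A minimal sketch of a strand-safe version of the widen-then-reduce idea, assuming resize()'s ignore.strand argument so that the padding always goes on the right-hand (end) coordinate, and trimming the padding off again after reduce(). merge_by_padding() is a hypothetical helper, not a GenomicRanges function; it merges ranges on the same chromosome and strand whose gap is at most max_gap bases.

```r
library(GenomicRanges)

# Hypothetical helper: pad, reduce, then trim the padding back off.
merge_by_padding <- function(gr, max_gap) {
  # ignore.strand = TRUE treats every range as "+", so fix = "start"
  # always pads the end coordinate and minus-strand starts never go negative.
  padded <- resize(gr, width(gr) + max_gap, fix = "start", ignore.strand = TRUE)
  # reduce() is still strand-aware, so only same-strand ranges are merged.
  merged <- reduce(padded)
  # Every range was padded by the same amount, so each merged end sits
  # exactly max_gap bases past the last original segment.
  end(merged) <- end(merged) - max_gap
  merged
}

# Usage on the minus-strand example from above.
z2 <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr3"), c(1, 3, 3)),
              ranges = IRanges(c(1, 1, 4, 9, 1, 4, 6),
                               end = c(2, 3, 7, 12, 3, 5, 10)),
              strand = rep("-", 7))
res <- merge_by_padding(z2, 1L)
res
```

Padding by max_gap and merging touching ranges is equivalent to closing gaps narrower than max_gap + 1, so this should agree with reduce(z2, min.gapwidth = 2L) when max_gap = 1L.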


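Hah! I was right... reduce() has a min.gapwidth argument: ranges separated by a gap narrower than min.gapwidth are merged. The default of 1L merges only overlapping or adjacent ranges, so min.gapwidth = 2L also merges ranges that are one base apart, with no padding and no negative coordinates.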

> reduce(z, min.gapwidth = 2L)
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-2      -
  [2]     chr2      1-12      -
  [3]     chr3      1-10      -
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
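
For the 1 kb threshold in the question, a minimal sketch, assuming "<= 1 kb apart" means at most 1000 bases between the end of one segment and the start of the next: that corresponds to min.gapwidth = 1001L. with.revmap = TRUE adds a revmap metadata column recording which input segments went into each merged region. The segment coordinates below are made up for illustration.

```r
library(GenomicRanges)

# Made-up segments on one chromosome (illustration only).
# Gaps: s1-s2 = 1000 bases, s2-s3 = 1001 bases, s3-s4 = 499 bases.
cnv <- GRanges(seqnames = Rle("chr1", 4),
               ranges = IRanges(start = c(1000, 5001, 7002, 8500),
                                end   = c(4000, 6000, 8000, 9000),
                                names = c("s1", "s2", "s3", "s4")))

# Gaps narrower than min.gapwidth are closed, so 1001L merges gaps of
# up to 1000 bases; with.revmap keeps the input indices of each region.
merged <- reduce(cnv, min.gapwidth = 1001L, with.revmap = TRUE)
merged

# Label each merged region with the names of its input segments.
members <- sapply(as.list(mcols(merged)$revmap),
                  function(i) paste(names(cnv)[i], collapse = "-"))
members
```

The 1001-base gap between s2 and s3 sits just past the threshold, so the two pairs stay separate. If segments on different strands should be merged together, reduce() also accepts ignore.strand = TRUE.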

This was super useful and did the trick for my analysis. Thanks a lot James!
