about setdiff() and meta-information in GRanges
1
0
Entering edit mode
Bogdan ▴ 670
@bogdan-2367
Last seen 14 months ago
Palo Alto, CA, USA

Dear all,

please, could you advise, is there any way to keep the meta-information of a GRanges, during a setdiff() operation, with another GRanges. More precisely, shall we work with :

> cnv_list
GRanges object with 1081 ranges and 1 metadata column:
         seqnames                 ranges strand |        name
            <Rle>              <IRanges>  <Rle> | <character>
     [1]     chr1      [     2,  140000]      * |         CN4
     [2]     chr1      [140002,  180000]      * |       CN128
     [3]     chr1      [340002,  530000]      * |        CN32
     [4]     chr1      [580002,  770000]      * |         CN6
     [5]     chr1      [900002, 1800000]      * |         CN3
     ...      ...                    ...    ... .         ...
  [1077]     chrX [154150002, 154260000]      * |        CN32
  [1078]     chrX [154260002, 154560000]      * |         CN3
  [1079]     chrX [154560002, 154650000]      * |         CN7
  [1080]     chrX [155340002, 155380000]      * |       CN128
  [1081]     chrX [155460002, 155500000]      * |        CN64
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

and

> centromeres_1
GRanges object with 109 ranges and 1 metadata column:
        seqnames                 ranges strand |        name
           <Rle>              <IRanges>  <Rle> | <character>
    [1]     chr1 [122503248, 124785432]      * |  GJ212202.1
    [2]     chr1 [122026460, 122224535]      * |  GJ211836.1
    [3]     chr1 [122224636, 122503147]      * |  GJ211837.1
    [4]     chr1 [124849230, 124932724]      * |  GJ211857.1
    [5]     chr1 [124785533, 124849129]      * |  GJ211855.1
    ...      ...                    ...    ... .         ...
  [105]    chr22   [13254953, 13258197]      * |  GJ212168.2
  [106]    chr22   [13258298, 13280858]      * |  GJ212169.2
  [107]    chr22   [14419555, 14419894]      * |  GJ212208.1
  [108]    chr22   [14419995, 14420334]      * |  GJ212209.1
  [109]    chr22   [14420435, 14421632]      * |  GJ212186.2
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

I would like to keep the meta-information of "cnv_list", during the operation :

results <- GenomicRanges::setdiff(cnv_list, centromeres_1, ignore.strand = TRUE)

many thanks !

-- bogdan

granges genomicranges • 2.1k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 5 days ago
Seattle, WA, United States

Hi Bogdan,

It looks like cnv_list is made of successive ranges separated by non-empty gaps. Because of this property the ranges in setdiff(cnv_list, centromeres_1) are guaranteed to be mapped to a unique range in cnv_list. This mapping from output to input (sometimes called reverse mapping) is a many-to-one mapping that can be computed with:

revmap <- findOverlaps(results, cnv_list, select="arbitrary")

and can then be used to propagate the metadata from cnv_list to results with:

mcols(results) <- mcols(cnv_list)[revmap, , drop=FALSE]

Cheers,

H.

ADD COMMENT
0
Entering edit mode

Dear Herve, many thanks for your help on this question. It is working very well now ;)

ADD REPLY

Login before adding your answer.

Traffic: 399 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6