Create a matrix to show overlaps among multiple GRanges
1
1
Entering edit mode
@vinicius-henrique-da-silva-6713
Last seen 17 months ago
Brazil

I am trying to find a way to efficiently extract a matrix showing '0' or '1' when comparing different GRange objects. In my example:

gr.1 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
gr.2 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
gr.3 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)

I tried findOverlaps to evaluate the overlaps among these regions but apparently it can't deal with more than two GRanges:

> GenomicRanges::findOverlaps(gr.1, gr.2, gr.3)
> Error in IRanges:::NCList_find_overlaps_in_groups(ranges(query),
> q_space,  :    'maxgap' must be a single integer

Moreover, my required output would be something like this example data-frame:

out <- "gr.1 gr.2 gr.3
chr1-1 1  0  1
chr1-2 1  1  0
chr10-3 0 1  0
chr10-4 1 1  0"

out <- read.table(text=out, header=TRUE)

Any idea to wisely export it using mainly functions from bioconductor packages?

r GRanges GenomicRanges • 1.6k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States

Do note that you cannot really have a matrix showing the overlap between the three GRanges objects, because there will be different regions defined in the three GRanges. Instead, you could define a common set of genomic ranges and then make an incidence matrix showing if a given range in a particular GRanges object overlaps.

> gr.1 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
> gr.2 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
> gr.3 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)

> base <- reduce(c(gr.1, gr.2, gr.3))
> base
GRanges object with 2096 ranges and 0 metadata columns:
                seqnames          ranges strand
                   <Rle>       <IRanges>  <Rle>
     [1]            chr1  729836-1633109      *
     [2]            chr1 1946672-2419469      *
     [3]            chr1 2952076-3463275      *
     [4]            chr1 3836898-4273778      *
     [5]            chr1 4931353-6065223      *
     ...             ...             ...    ...
  [2092]   chr6_qbl_hap6 3481669-4494167      *
  [2093]  chr6_ssto_hap7  717078-1631256      *
  [2094]  chr6_ssto_hap7 3139239-3635135      *
  [2095] chr17_ctg5_hap1   154779-591672      *
  [2096] chr17_ctg5_hap1  978830-1633449      *
  -------
  seqinfo: 93 sequences from an unspecified genome; no seqlengths

> z <- matrix(0,length(base),3)
> z[subjectHits(findOverlaps(gr.1, base)),1] <- 1
> z[subjectHits(findOverlaps(gr.2, base)),2] <- 1
> z[subjectHits(findOverlaps(gr.3, base)),3] <- 1
> colnames(z) <- paste0("gr.", 1:3)
> mcols(base) <- z
> base
GRanges object with 2096 ranges and 3 metadata columns:
                seqnames          ranges strand |      gr.1      gr.2      gr.3
                   <Rle>       <IRanges>  <Rle> | <numeric> <numeric> <numeric>
     [1]            chr1  729836-1633109      * |         0         1         1
     [2]            chr1 1946672-2419469      * |         0         1         0
     [3]            chr1 2952076-3463275      * |         0         1         0
     [4]            chr1 3836898-4273778      * |         0         1         0
     [5]            chr1 4931353-6065223      * |         0         1         1
     ...             ...             ...    ... .       ...       ...       ...
  [2092]   chr6_qbl_hap6 3481669-4494167      * |         1         1         0
  [2093]  chr6_ssto_hap7  717078-1631256      * |         1         1         0
  [2094]  chr6_ssto_hap7 3139239-3635135      * |         0         0         1
  [2095] chr17_ctg5_hap1   154779-591672      * |         0         0         1
  [2096] chr17_ctg5_hap1  978830-1633449      * |         1         1         0
  -------
  seqinfo: 93 sequences from an unspecified genome; no seqlengths

Is, I believe, pretty close to what you want?

ADD COMMENT

Login before adding your answer.

Traffic: 642 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6