It is easy enough to extract genomic ranges and associated metadata from a GRanges object.
library(GenomicRanges)
gr1 <- GRanges(
  seqnames = Rle("chr1", 3),
  ranges = IRanges(c(1, 2, 8), end = c(2, 5, 9)),
  score = 1:3)
show(gr1)
## GRanges object with 3 ranges and 1 metadata column:
## seqnames ranges strand | score
## <Rle> <IRanges> <Rle> | <integer>
## [1] chr1 1-2 * | 1
## [2] chr1 2-5 * | 2
## [3] chr1 8-9 * | 3
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
ranges(gr1)
## IRanges object with 3 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## [1] 1 2 2
## [2] 2 5 4
## [3] 8 9 2
mcols(gr1)
## DataFrame with 3 rows and 1 column
## score
## <integer>
## 1 1
## 2 2
## 3 3
However, I would like to extract the metadata from a GRanges object for each genomic coordinate in the ranges and insert a 0 for uncovered coordinates. One can achieve this by converting the GRanges object to a data.frame and filling in the metadata values.
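df1 <- as.data.frame(gr1)
# One row per coordinate, with a default score of 0
df1.expanded <- data.frame(chr = "chr1", coord = min(df1$start):(max(df1$end) - 1), score = 0)
for (n in seq_len(nrow(df1.expanded))) {
  # Note: `end > coord` treats each range as half-open, i.e. [start, end)
  hit <- df1$start <= df1.expanded$coord[n] & df1$end > df1.expanded$coord[n]
  # With no matching range the replacement has length zero and errors; try() keeps the 0
  try(df1.expanded$score[n] <- df1$score[hit], silent = TRUE)
}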
df1.expanded
## chr coord score
## 1 chr1 1 1
## 2 chr1 2 2
## 3 chr1 3 2
## 4 chr1 4 2
## 5 chr1 5 0
## 6 chr1 6 0
## 7 chr1 7 0
## 8 chr1 8 3
Is there an easier and more efficient way to do this?
What I want to do in the end is to correlate the metadata of two GRanges objects in specific genome regions. For one of the GRanges objects, uncovered genomic ranges should get a metadata score of 0. Are there better ways to achieve this than transforming GRanges objects to data.frames and "expanding" them as shown above?
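For readers looking for a starting point, here is a minimal sketch of one possible approach using weighted coverage from GenomicRanges. It is not taken from any answer in this thread, and the names `cov1`, `score1` and `df1.cov` are made up for illustration. It assumes a single chromosome and closed intervals, so it will not reproduce the half-open table above exactly.

```r
library(GenomicRanges)

gr1 <- GRanges(
  seqnames = Rle("chr1", 3),
  ranges = IRanges(c(1, 2, 8), end = c(2, 5, 9)),
  score = 1:3)

# Weighted coverage: at each position, the sum of `score` over all ranges
# covering it; uncovered positions get 0. Ranges are closed intervals here,
# so position 2 (covered by ranges 1 and 2) gets 1 + 2 = 3.
cov1 <- coverage(gr1, weight = "score")

# Decode the run-length encoded vector for chr1 into one value per position.
score1 <- as.vector(cov1[["chr1"]])

# One row per coordinate (illustrative name, not from the original post).
df1.cov <- data.frame(chr = "chr1", coord = seq_along(score1), score = score1)
df1.cov
```

Note that the coverage vector always starts at position 1 of the sequence and, without seqlengths, ends at the largest `end` in the object. Where ranges overlap, the scores are summed rather than one of them being picked.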
Btw, the plyranges package skips the scary RleList intermediate:
Thanks, Michael. That's very helpful to fill the uncovered regions. However, I'm still not sure how to get a GRanges object with all ranges having width 1. I would like to achieve this, as I have to plot and correlate several scores from different files at the single-nucleotide level.
Thanks, Michael. The GPos containers seem very interesting. However, it seems like with GPos(<my GRanges object>) my metadata columns are lost. Do you know a workaround?
I sent an email to Herve about this 2 years ago but he never replied. It would be nice to do this:
Instead, probably need this:
Just to note that in this comment, https://support.bioconductor.org/p/122571/#122587, Robert treats the intervals as half-open, so coverage and other GRanges functionality will not work with them; these really need to be closed intervals.
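With closed intervals, one way to get width-1 ranges that keep a score, and then correlate two tracks over a region, might look like the sketch below. This is only an illustration under assumptions: `per_base()` is a hypothetical helper, `gr2` is an invented second track, and everything is restricted to a single chromosome. It builds a plain GRanges of width-1 ranges rather than a GPos object.

```r
library(GenomicRanges)

gr1 <- GRanges("chr1", IRanges(c(1, 2, 8), end = c(2, 5, 9)), score = 1:3)
# Hypothetical second track, made up for this example.
gr2 <- GRanges("chr1", IRanges(c(1, 6), end = c(4, 9)), score = c(5L, 1L))

# Hypothetical helper: per-position scores for `region` on one chromosome,
# returned as a GRanges in which every range has width 1.
per_base <- function(gr, chr, region) {
  # Weighted coverage sums `score` where ranges overlap and gives 0 elsewhere.
  v <- as.vector(coverage(gr, weight = "score")[[chr]])
  # Pad with zeros if the region extends past the last covered position.
  if (length(v) < max(region)) {
    v <- c(v, rep(0, max(region) - length(v)))
  }
  GRanges(chr, IRanges(region, width = 1), score = v[region])
}

region <- 1:9
pb1 <- per_base(gr1, "chr1", region)
pb2 <- per_base(gr2, "chr1", region)

# Both objects now have one range per position, so the scores line up.
cor(pb1$score, pb2$score)
```

Because both tracks are expanded over the same `region`, the score vectors have equal length and can be passed straight to `cor()` or to a plotting function.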