I have one GRanges object with thousands of score columns (tomap), and another with regions of interest and no metadata (roi). I am trying to map the max score from each column in tomap to the corresponding interval in roi.
I also want to retain the names of the score columns (in my real data these are meaningful names and not genralizable like score1, score2 etc...). I can do it for specific columns but am struggling to generalize it to every column.
Here is what I've got so far:
library(GenomicRanges)
tomap <- GRanges(
seqnames = Rle(c("chr1"), c(10)),
ranges = IRanges(1:10*10, end = 1:10*10+5),
score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))
roi <- GRanges(
seqnames = Rle(c("chr1"), c(5)),
ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))
hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)
ans<-roi
mcols(ans) <- aggregate(tomap, hits, score1=max(score1), score2= max(score2))
ans
#GRanges object with 5 ranges and 3 metadata columns:
# seqnames ranges strand | grouping score1 score2
# <Rle> <IRanges> <Rle> | <ManyToManyGrouping> <numeric> <numeric>
# [1] chr1 22-31 * | 2,3 0.326366489753127 0.925836584065109
# [2] chr1 42-51 * | 4,5 0.92806151532568 0.897841389290988
# [3] chr1 62-71 * | 6,7 0.980487102875486 0.940743445185944
# [4] chr1 83-92 * | 8,9 0.798293181695044 0.381754550151527
# [5] chr1 101-110 * | 10 0.872806148370728 0.953412540955469
As you can see, this works when I specify each score column individually, but how do I do this for thousands of columns?
Thanks Michael, but how do you store a matrix as a single column in the GRanges? If I do:
scores<-cbind(score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))
and then:
tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5), score=scores)
I still get 5 metadata columns, and an error when running your code:
Error in FUN(X[[i]], ...) : Argument 'x' must be a matrix or a vector.
Add it as a column after construction.
Doing this still ends up with tomap having 5 metadata columns:
tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5)) mcols(tomap)<-scores
Then running:
aggregate(tomap, hits, max_scores=lapply(scores, colMaxs))
gives this error:Error in FUN(X[[i]], ...) : Argument 'dim' must be an integer vector of length two.
mcols(x)$scores <- mat
thanks, that does work to associate the matrix but then if I have more score columns than rows in
roi
then I get the errorError in DataFrame(by, stats) : different row counts implied by arguments
. It seems, the result is transposed with the result for interval 1 placed in column 1 vertically instead of horizontally. And it fails completely if there are not the same number of intervals and score columns.I edited by answer to include the call to
bindROWS()
but I'm not sure if it will work. Please provide a fully reproducible example.