hi Janet,
in october i asked to the list for a similar question and among other
things posted in
https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html
i got the following hint from Michael Lawrence that may be relevant to
your goal:
"If you want to query the RleList for single positions (in a GRanges)
and
get the result as an atomic vector, it's most efficient to use
VariantTools::extractCoverageForPositions. That function should
probably
live somewhere more general but anyway, it's a fast path for single
position RleList=>vector extraction."
that particular function, extractCoverageForPositions, was not
completely solving my needs and i wrote yet another one which also
takes
an RleList and a GRanges object as input. i paste it here and maybe
merging somehow the two of them you can satify what you need (watch
out
line-wrapping by email software):
rleGetValues <- function(rlelst, gr, summaryFun="mean",
coercionFun="as.numeric") {
summaryFun <- match.fun(summaryFun)
coercionFun <- match.fun(coercionFun)
seqlevels(gr) <- names(rlelst)
ord <- order(seqnames(gr))
ans <- numeric(length(gr))
startbyseq <- split(start(gr), seqnames(gr), drop=TRUE)
ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun),
use.names=FALSE)
whregions <- which(width(gr) > 1)
if (length(whregions) > 0) { ## regions comprising more than one
position are summarized by summaryFun()
startbyseq <- split(start(gr)[whregions],
seqnames(gr)[whregions],
drop=TRUE)
widthbyseq <- split(width(gr)[whregions],
seqnames(gr)[whregions],
drop=TRUE)
ans[whregions] <- unlist(mapply(function(r, p, w)
sapply(seq_along(p),
function(i, r, p, w)
summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])),
r, p, w),
rlelst[names(startbyseq)],
startbyseq, widthbyseq, SIMPLIFY=FALSE),
use.names=FALSE)
}
ans
}
with your example it works as follows:
library(GenomicRanges)
x <- Rle(10:1, 1:10)
y <- Rle(10:1, 1:10)
z <- RleList( chr1=x, chr2=y)
## an example GRanges object
myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) )
## subsetting an Rle using an IRanges object works, as expected:
z[["chr1"]] [ ranges(myRange) ]
integer-Rle of length 6 with 2 runs
Lengths: 1 5
Values : 7 6
mean(z[["chr1"]] [ ranges(myRange) ])
[1] 6.166667
## now using the function 'rleGetValues()'
rleGetValues(z, myRange)
[1] 6.166667
cheers,
robert.
On 12/4/13 1:53 AM, Janet Young wrote:
> Hi there,
>
> I'm playing around with coverage data generated outside of R,
planning to plot RNA-seq coverage for some genes we're interested in.
>
> I have a request - it'd be nice from my point of view if it were
possible to look at a small region an RleList (or CompressedRleList)
using a GRanges object to focus on that smaller region. Looks like I
can subset a single Rle object with an IRanges object, but I wonder if
this nice feature could be extended to GRanges objects?
>
> Some code is below to show what I'm trying to do.
>
> thanks veruy much,
>
> Janet
>
>
>
> library(GenomicRanges)
>
> ## an example RleList
> x <- Rle(10:1, 1:10)
> y <- Rle(10:1, 1:10)
> z <- RleList( chr1=x, chr2=y)
>
> ## an example GRanges object
> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15)
)
>
> ## subsetting an Rle using an IRanges object works, as expected:
> z[["chr1"]] [ ranges(myRange) ]
>
> ## but subsetting an RleList by GRanges object doesn't work
> z [ myRange ]
> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript
type
>
> sessionInfo()
>
> R Under development (unstable) (2013-11-06 r64163)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13
> [4] BiocGenerics_0.9.1
>
> loaded via a namespace (and not attached):
> [1] stats4_3.1.0
>
>
>
>
> -------------------------------------------------------------------
>
> Dr. Janet Young
>
> Malik lab
>
http://research.fhcrc.org/malik/en.html
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., A2-025,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 4512
> email: jayoung ...at... fhcrc.org
>
> -------------------------------------------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]