I loaded the IRanges library
library(IRanges)
Here's a coverage vector of length 10, and a desired output length, in this case a 'stretch'
x <- c(2, 2, 0, 0 , 2, 2, 0, 0, 2, 2)
length.out <- 13
I replicated each element of x 13 times
x13 <- rep(x, each=length.out)
then calculated the mean of the resulting vector in windows of size 10, using a little trick involving how R represents a matrix
length.in <- length(x)
colMeans(matrix(x13, length.in))
or as a one-liner
> colMeans(matrix(rep(x, each=length.out), length.in))
[1] 2.0 2.0 1.2 0.0 0.0 1.6 2.0 1.6 0.0 0.0 1.2 2.0 2.0
This is our new coverage vector, which could be compressed as an Rle.
> Rle(colMeans(matrix(rep(x, each=length.out), length.in)))
numeric-Rle of length 13 with 9 runs
Lengths: 2 1 2 1 1 1 2 1 2
Values : 2 1.2 0 1.6 2 1.6 0 1.2 2
This approach works for 'stretch' as well as 'shrink'. It could be made more efficient by replicating / averaging using the greatest common denominator of the in and out lengths. It generates large intermediate vectors, and this could be prohibitive. The Rle approach to replication is to multiply the run lengths by the stretch factor
r <- Rle(x)
runLength(r) <- length.out * runLength(r)
Calculation of new means can be done using views
mean(Views(r, breakInChunks(length(r), length.in)))
This seems to be more memory efficient than the matrix approach. So in the end we end up with
stretch <- function(r, length.out=length(x))
{
stopifnot(length.out > 0)
r <- as(r, "Rle")
length.in <- length(r)
runLength(r) <- length.out * runLength(r)
Rle(mean(Views(r, breakInChunks(length(r), length.in))))
}
I'm not sure that is what you were looking for, or whether there is a more efficient way to do this.
I can't really understand what you want to do. Maybe present a simple example with a plain-old-vector?
Basically I want to shrink or expand a coverage range and its reads along the x axis. In the ascii example plots I showed above the first range is compressed into 1/2 its size. In the second example plot it is expanded to twice its size.
The motivation is that I have read sets for ranges of several different lengths and I want to normalize them to the same length so that I can better compare their coverages together (ie make a mean coverage graph). For example
Read Set 1 are reads overlapping range 1-1000
Read Set 2 are reads overlapping range 2000-2500
Goal: Resize Read Set 2 to length 1000 by expanding all reads along the range by x 2
I haven't been able to find a function that will allow me to do this for an integer rle.
Alternatively I could try to work with the GrangesList I use to generate the integer rle. There is a resize() function for GRanges but there are two issues. The first is that I haven't seen how to resize by a factor rather than a constant number. Other than maybe going through the reads one by one with
resize(readsRange1[x],resizeFactor*width(readsRange1[x]))
And when resizing the reads are anchored at the start or the end and so expand or compress in only one direction when resizing. Which I don't think is how they should behave if you were truly symmetrically resizing the range? If I could solve the first problem at least that would be dandy.
Hi,
Not sure what you mean by resizing the reads "one by one". You seem to imply the need to use a loop. However this is not necessary because the
width
argument can be a vector of the length ofx
. Alsoresize()
supportsfix="center"
for symmetrical resizing. Here is an example with an IRanges object:H.