I have a set of `GenomicRanges` and all I need is to expand each range by one distance at its start and by another distance at its end (taking the strand into account). For example, from this set:
|   | strand | start | end |
|---|--------|-------|-----|
| 1 | +      | 100   | 110 |
| 2 | -      | 200   | 220 |
| 3 | +      | 300   | 330 |
expanding by 20 nt at start and by 30 nt at the end, and considering chromosome length as 350, I want to get this:
|   | strand | start | end |
|---|--------|-------|-----|
| 1 | +      | 80    | 140 |
| 2 | -      | 170   | 240 |
| 3 | +      | 280   | 350 |
Note that 1) for range #2 the shift values are swapped because it is on the negative strand, and 2) for range #3 the resulting end position is 350 because a range cannot extend past the chromosome length.
Here is the code to generate this sample data:
    mygr <- GRanges(seqnames = Rle(rep('chr1', 3)),
                    ranges = IRanges(c(100, 200, 300), c(110, 220, 330)),
                    strand = c('+', '-', '+'),
                    seqlengths = c(chr1 = 350))
So far I wrote the following function:
    grexpand <- function(inputGR, befLen = 0, aftLen = 0) {
      # Grow upstream: keep the (strand-aware) end fixed
      inputGR <- resize(inputGR, width = width(inputGR) + befLen, fix = 'end')
      # Grow downstream: keep the (strand-aware) start fixed
      inputGR <- resize(inputGR, width = width(inputGR) + aftLen, fix = 'start')
      # Clip anything that runs past the chromosome boundaries
      return(trim(inputGR))
    }

It does the job, but I'm wondering whether this functionality is already included in the package and I just don't know the right way to call it...
I want to do the same thing: include promoter regions (2 kb upstream) with my genes. I wonder why the GenomicRanges package has not had this feature until now. I was also trying to use the `resize` function for this purpose. Luckily I found this thread.
Do note the `flank()` and `promoters()` functions. This case is very specific to wanting a different expansion on either side of the range. Most of the complication comes down to not having a strand-specific `shift()`. It would also be simpler if we did not have to worry about a negative upstream length. A typical use case of combining the promoter with the gene range would just be `punion(promoters(mygr), mygr)`. One could use `resize()` there to expand on the other side. I could see a good argument for `extend()` being added for these special cases. It would be generalized to ordinary Ranges objects and use "start" and "end" as argument names instead of "upstream" and "downstream". Then, I don't think there's a reason to keep the warning about how "*" strand ranges are treated.