A little late after the fact, I just noticed the "new" tileGenome function in GenomicRanges - very nice. Thank you! I'll start using this now instead of a much slower function I'd written myself.
I sometimes find myself looking at overlapping sliding windows with my slow function, rather than the non-overlapping tiles that tileGenome produces. Would it be possible to add that option to the function? I'd like to specify window size and slide amount. In a slightly ridiculous toy example, I might want 100bp windows with a slide of 20bp on a 150bp chromosome, so the windows would be at these positions:
1-100
21-120
41-140
61-150
What do you think - would this be easy for you guys to do?
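For illustration, the requested behaviour can be sketched in a few lines of base R. The helper name `sliding_windows` and its arguments are hypothetical (nothing here is part of GenomicRanges), and the sketch assumes the final window is clipped at the chromosome end, as in the toy example above.

```r
# Hypothetical helper, not part of GenomicRanges: overlapping windows on a
# single chromosome, with the last window clipped at the chromosome end.
sliding_windows <- function(chrom_length, width, step = width) {
  stopifnot(chrom_length >= 1, width >= 1, step >= 1)
  start <- seq(1, chrom_length, by = step)
  end <- start + width - 1
  # A window is redundant if the window before it already reached the end
  redundant <- c(FALSE, head(end, -1) >= chrom_length)
  data.frame(start = start[!redundant],
             end = pmin(end[!redundant], chrom_length))
}

# The toy example: 100 bp windows sliding by 20 bp on a 150 bp chromosome
toy_windows <- sliding_windows(chrom_length = 150, width = 100, step = 20)
print(toy_windows)
```

With these inputs the sketch should give the four windows listed above; if GenomicRanges is installed, the start and end columns could then be wrapped in a GRanges.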
I think there would definitely be some value in enhancing tileGenome() to allow overlap or spacing between the tiles. This could be achieved via a spacing arg that would be 0 by default. When spacing is positive, say 2, and with tilewidth=20 one would get the following ranges:
1-20
23-42
45-64
etc...
When spacing is negative, say -2, one would get overlapping ranges:
1-20
19-38
37-56
etc...
So to get the tiles she wants, Janet would need to specify tilewidth=100 and spacing=-80.
Even though I anticipate that most of the time people will use a negative spacing, I prefer this to having the extra argument be called overlap that would be interpreted as the opposite of spacing (i.e. overlap=N means spacing=-N). Does this sound reasonable?
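To make the proposed semantics concrete, here is a minimal base-R sketch. It assumes that `spacing` is simply added to `tilewidth` to get the distance between consecutive tile starts; the function `tiles_with_spacing` is hypothetical, and `spacing` is only a proposal in this thread, not an existing tileGenome() argument.

```r
# Hypothetical sketch of the proposed `spacing` semantics on one chromosome.
tiles_with_spacing <- function(chrom_length, tilewidth, spacing = 0) {
  step <- tilewidth + spacing            # distance between consecutive starts
  stopifnot(chrom_length >= 1, tilewidth >= 1, step >= 1)
  start <- seq(1, chrom_length, by = step)
  end <- start + tilewidth - 1
  # Drop tiles that start after an earlier tile already reached the end
  redundant <- c(FALSE, head(end, -1) >= chrom_length)
  data.frame(start = start[!redundant],
             end = pmin(end[!redundant], chrom_length))
}

gapped      <- tiles_with_spacing(64, tilewidth = 20, spacing = 2)     # gaps of 2 bp
overlapping <- tiles_with_spacing(56, tilewidth = 20, spacing = -2)    # overlaps of 2 bp
janet       <- tiles_with_spacing(150, tilewidth = 100, spacing = -80) # Janet's example
```

The chromosome lengths 64 and 56 are chosen only so that the output stops at the third tile shown in each list above.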
+1
On Sat, Feb 7, 2015 at 8:37 PM, Hervé Pagès [bioc] <noreply@bioconductor.org> wrote:
> [...]
I think I'd actually prefer the original suggestion of options for tilewidth and step, where step defaults to tilewidth. (Mathematically, step = spacing + tilewidth.) Or maybe support providing either spacing or step, in a similar manner to how Ranges support any two of start, end, and width? This is based on my experience that usually I say something like "I want windows of width X tiled every Y bp across the genome" and not "I want windows of width X with an overlap of Z tiled across the genome".
Hi Ryan, interesting. Given that tileGenome() allows the user to specify the number of tiles (ntile arg) instead of the tile width (tilewidth arg), s/he might also want to say "I want N windows with an overlap of Z tiled across the genome". In that case it can be hard for him/her to figure out what step to use. And vice-versa: if someone wants to say "I want N windows tiled every Y bp across the genome" it can be hard to express this in terms of spacing.
So I think we should probably have both spacing and step, as 2 exclusive args.
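One way the two exclusive arguments could be reconciled is sketched below in base R. The function `resolve_step` is hypothetical and only illustrates the relation step = spacing + tilewidth from this thread; it says nothing about how tileGenome() is or will be implemented.

```r
# Hypothetical argument handling: `spacing` and `step` are mutually
# exclusive, and with neither supplied the tiles are adjacent.
resolve_step <- function(tilewidth, spacing = NULL, step = NULL) {
  if (!is.null(spacing) && !is.null(step))
    stop("supply either 'spacing' or 'step', not both")
  if (is.null(step)) {
    if (is.null(spacing)) spacing <- 0
    step <- tilewidth + spacing
  }
  if (step < 1)
    stop("'step' must be at least 1 (spacing must be greater than -tilewidth)")
  step
}

step_direct   <- resolve_step(100, step = 20)      # "tiled every 20 bp"
step_spacing  <- resolve_step(100, spacing = -80)  # "overlap of 80 bp"
step_default  <- resolve_step(20)                  # adjacent tiles
```

Both ways of describing Janet's windows resolve to the same step of 20, so a user could pick whichever phrasing is more natural.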
Thanks very much, all - it's nice to have a way to do it with existing code, and also great to see that it could be a built-in option for the function at some point soon. The built-in option will be much more intuitive for us naive biologists than the clever resizing method, which isn't immediately obvious.
Herve: yes, that idea does sound very reasonable. Perhaps it'd help people search for and intuitively understand the new option if the help page includes the phrases "sliding window" and "overlap", even if the option is called spacing - I think those might be the more commonly used names in publications, etc.
Just want to continue to illustrate the approach @Martin showed. The 6th range, which is at the end of the following snippet, may not be desirable: it lies completely within the second-to-last range.
For a simple GRanges like that from
I guess
keeps the ends within chromosome bounds (avoiding the warning message from resize()), and for the more complicated GRangesList the 'unlist / relist' trick does the same.
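As a rough base-R sketch of the general idea discussed in these replies (this is not the posters' original code): tile at the step size, widen each tile to the window width while keeping the ends in bounds, then optionally drop windows that lie entirely inside the previous one. Plain numeric vectors stand in for a GRanges, and the function names `widen_and_clip` and `drop_nested` are hypothetical.

```r
# Hypothetical base-R illustration; a data.frame stands in for a GRanges.
widen_and_clip <- function(chrom_length, width, step) {
  stopifnot(chrom_length >= 1, width >= 1, step >= 1)
  start <- seq(1, chrom_length, by = step)       # one window start per step
  end <- pmin(start + width - 1, chrom_length)   # widen, keep ends in bounds
  data.frame(start = start, end = end)
}

# Drop any window that lies completely inside the window before it.
# Starts are increasing, so it is enough to compare consecutive ends.
drop_nested <- function(windows) {
  nested <- c(FALSE, windows$end[-1] <= head(windows$end, -1))
  windows[!nested, , drop = FALSE]
}

all_windows  <- widen_and_clip(chrom_length = 150, width = 100, step = 20)
kept_windows <- drop_nested(all_windows)
print(kept_windows)
```

On the 150 bp toy chromosome, clipping alone leaves several trailing windows that all end at position 150; filtering out the nested ones is one possible way to avoid the redundant final ranges mentioned above.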