finding neigbouring intervals for 2 GRange objects
1
0
Entering edit mode
@hermann-norpois-5726
Last seen 9.6 years ago
Germany
Hello, I have to grange-objects - gr1 and gr2. And I wish to know: 1) What are the neigbouring intervals of gr1 in gr2 - upstream and downstream? 2) What is the distance in bp between the neighbouring intervals? Could anybody give me a hint how this is done? Thanks Hermann > dput (gr1) new("GRanges" , seqnames = new("Rle" , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , ranges = new("IRanges" , start = c(540823L, 752809L, 771015L, 773361L) , width = c(221L, 230L, 520L, 247L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , elementMetadata = new("DataFrame" , rownames = NULL , nrows = 4L , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak = c(111L, 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", "epeak", "over")) , elementType = "ANY" , elementMetadata = NULL , metadata = list() ) , seqinfo = new("Seqinfo" , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY") , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_) , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) , genome = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ ) ) , metadata = list() ) > dput (gr1) new("GRanges" , seqnames = new("Rle" , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , ranges = new("IRanges" , start = c(540823L, 752809L, 771015L, 773361L) , width = c(221L, 230L, 520L, 247L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") , lengths = 4L , elementMetadata = NULL , metadata = list() ) , elementMetadata = new("DataFrame" , rownames = NULL , nrows = 4L , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak = c(111L, 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", "epeak", "over")) , elementType = "ANY" , elementMetadata = NULL , metadata = list() ) , seqinfo = new("Seqinfo" , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY") , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_) , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) , genome = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ ) ) , metadata = list() ) > gr1 GRanges with 4 ranges and 3 metadata columns: seqnames ranges strand | edensity epeak over <rle> <iranges> <rle> | <integer> <integer> <integer> [1] chr1 [540823, 541043] * | 589 111 0 [2] chr1 [752809, 753038] * | 574 89 1 [3] chr1 [771015, 771534] * | 578 234 1 [4] chr1 [773361, 773607] * | 571 136 1 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA > gr2 GRanges with 3 ranges and 3 metadata columns: seqnames ranges strand | edensity epeak over <rle> <iranges> <rle> | <integer> <integer> <integer> [1] chr1 [ 53141, 53685] * | 601 212 0 [2] chr1 [521536, 521622] * | 568 37 1 [3] chr1 [805002, 805650] * | 1000 326 1 --- seqlengths: chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX chrY NA NA NA NA NA NA ... NA NA NA NA NA NA [[alternative HTML version deleted]]
• 1.5k views
ADD COMMENT
0
Entering edit mode
@harris-a-jaffee-3972
Last seen 10.1 years ago
United States
You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*": > x=IRanges(2, 2) > x IRanges of length 1 start end width [1] 2 2 1 > y=IRanges(c(1,3), c(1,3)) > y IRanges of length 2 start end width [1] 1 1 1 [2] 3 3 1 > precede(x, y) [1] 2 > follow(x, y) [1] 1 > X = GRanges(ranges=x, seqnames="chr1") > Y = GRanges(ranges=y, seqnames="chr1") > X GRanges with 1 range and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [2, 2] * --- seqlengths: chr1 NA > Y GRanges with 2 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1, 1] * [2] chr1 [3, 3] * --- seqlengths: chr1 NA > precede(X, Y) [1] 1 > follow(X, Y) [1] 1 > precede(X, Y, ignore.strand=TRUE) [1] 2 > follow(X, Y, ignore.strand=TRUE) [1] 1 On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: > Hello, > > I have to grange-objects - gr1 and gr2. And I wish to know: > 1) What are the neigbouring intervals of gr1 in gr2 - upstream and > downstream? > 2) What is the distance in bp between the neighbouring intervals? > > Could anybody give me a hint how this is done? > Thanks > Hermann > >> dput (gr1) > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(540823L, 752809L, 771015L, 773361L) > , width = c(221L, 230L, 520L, 247L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 4L > , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak > = c(111L, > 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", > "epeak", "over")) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ > ) > ) > , metadata = list() > ) >> dput (gr1) > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(540823L, 752809L, 771015L, 773361L) > , width = c(221L, 230L, 520L, 247L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 4L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 4L > , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak > = c(111L, > 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", > "epeak", "over")) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ > ) > ) > , metadata = list() > ) > >> gr1 > GRanges with 4 ranges and 3 metadata columns: > seqnames ranges strand | edensity epeak over > <rle> <iranges> <rle> | <integer> <integer> <integer> > [1] chr1 [540823, 541043] * | 589 111 0 > [2] chr1 [752809, 753038] * | 574 89 1 > [3] chr1 [771015, 771534] * | 578 234 1 > [4] chr1 [773361, 773607] * | 571 136 1 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA ... NA NA NA NA NA > NA >> gr2 > GRanges with 3 ranges and 3 metadata columns: > seqnames ranges strand | edensity epeak over > <rle> <iranges> <rle> | <integer> <integer> <integer> > [1] chr1 [ 53141, 53685] * | 601 212 0 > [2] chr1 [521536, 521622] * | 568 37 1 > [3] chr1 [805002, 805650] * | 1000 326 1 > --- > seqlengths: > chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX > chrY > NA NA NA NA NA NA ... NA NA NA NA NA > NA > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Hi Hermann, Harris, I wanted to give some background on the behavior of these functions for GRanges vs IRanges. > gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) > gr GRanges with 3 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [1, 1] * [2] chr1 [2, 2] * [3] chr1 [3, 3] * ----------------------------- "+" strand behavior: ----------------------------- IRanges objects have no strand so you can expect them to behave as "+" strand GRanges do. If this is ever not the case please let me know. That would be a bug. > strand(gr) <- "+" > precede(gr) [1] 2 3 NA > precede(ranges(gr)) [1] 2 3 NA ----------------------------- "-" strand behavior: ----------------------------- I think it makes intuitive sense that precede() and follow() on "-" strand GRanges will give the opposite results from precede() and follow() on "+" strand GRanges. > strand(gr) <- "-" > precede(gr) [1] NA 1 2 ----------------------------- "*" strand behavior: ----------------------------- For "*" strand GRanges maybe we tried to be too clever. When ignore.strand=TRUE strand is treated as "+": > strand(gr) <- "*" > precede(gr, ignore.strand=TRUE) [1] 2 3 NA When ignore.strand=FALSE for a "*" strand range, results are computed for both "+' and "-". If one result is NA, the other is taken. This is how we end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and position 1 was NA for "-"). If there is a tie, the first in order is taken. This is how we get the '1' in position 2 (tie was between '1' and '3' and '1' comes before '3' in order). > precede(gr, ignore.strand=FALSE) [1] 2 1 2 It was a challenge to decide how "*" strand with ignore.strand=FALSE should behave. If others have opinions on this I'd be interested in hearing them. Valerie On 03/03/13 16:38, Harris A. Jaffee wrote: > You want precede() and follow(), and then distance(), but tread somewhat carefully with strands of "*": > >> x=IRanges(2, 2) >> x > IRanges of length 1 > start end width > [1] 2 2 1 > >> y=IRanges(c(1,3), c(1,3)) >> y > IRanges of length 2 > start end width > [1] 1 1 1 > [2] 3 3 1 > >> precede(x, y) > [1] 2 >> follow(x, y) > [1] 1 > >> X = GRanges(ranges=x, seqnames="chr1") >> Y = GRanges(ranges=y, seqnames="chr1") >> X > GRanges with 1 range and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [2, 2] * > --- > seqlengths: > chr1 > NA >> Y > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1, 1] * > [2] chr1 [3, 3] * > --- > seqlengths: > chr1 > NA > >> precede(X, Y) > [1] 1 >> follow(X, Y) > [1] 1 > >> precede(X, Y, ignore.strand=TRUE) > [1] 2 >> follow(X, Y, ignore.strand=TRUE) > [1] 1 > > > On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: > >> Hello, >> >> I have to grange-objects - gr1 and gr2. And I wish to know: >> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >> downstream? >> 2) What is the distance in bp between the neighbouring intervals? >> >> Could anybody give me a hint how this is done? >> Thanks >> Hermann >> >>> dput (gr1) >> new("GRanges" >> , seqnames = new("Rle" >> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", >> "chr13", >> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >> "chr8", "chr9", "chrX", "chrY"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , ranges = new("IRanges" >> , start = c(540823L, 752809L, 771015L, 773361L) >> , width = c(221L, 230L, 520L, 247L) >> , NAMES = NULL >> , elementType = "integer" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , strand = new("Rle" >> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , elementMetadata = new("DataFrame" >> , rownames = NULL >> , nrows = 4L >> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak >> = c(111L, >> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >> "epeak", "over")) >> , elementType = "ANY" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , seqinfo = new("Seqinfo" >> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >> "chr15", >> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >> "chrX", "chrY") >> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >> NA, NA, >> NA, NA, NA, NA, NA, NA, NA, NA, NA) >> , genome = c(NA_character_, NA_character_, NA_character_, >> NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >> ) >> ) >> , metadata = list() >> ) >>> dput (gr1) >> new("GRanges" >> , seqnames = new("Rle" >> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", >> "chr13", >> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >> "chr8", "chr9", "chrX", "chrY"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , ranges = new("IRanges" >> , start = c(540823L, 752809L, 771015L, 773361L) >> , width = c(221L, 230L, 520L, 247L) >> , NAMES = NULL >> , elementType = "integer" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , strand = new("Rle" >> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >> , lengths = 4L >> , elementMetadata = NULL >> , metadata = list() >> ) >> , elementMetadata = new("DataFrame" >> , rownames = NULL >> , nrows = 4L >> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), epeak >> = c(111L, >> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >> "epeak", "over")) >> , elementType = "ANY" >> , elementMetadata = NULL >> , metadata = list() >> ) >> , seqinfo = new("Seqinfo" >> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >> "chr15", >> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >> "chrX", "chrY") >> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >> NA, NA, >> NA, NA, NA, NA, NA, NA, NA, NA, NA) >> , genome = c(NA_character_, NA_character_, NA_character_, >> NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, >> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >> ) >> ) >> , metadata = list() >> ) >> >>> gr1 >> GRanges with 4 ranges and 3 metadata columns: >> seqnames ranges strand | edensity epeak over >> <rle> <iranges> <rle> | <integer> <integer> <integer> >> [1] chr1 [540823, 541043] * | 589 111 0 >> [2] chr1 [752809, 753038] * | 574 89 1 >> [3] chr1 [771015, 771534] * | 578 234 1 >> [4] chr1 [773361, 773607] * | 571 136 1 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >>> gr2 >> GRanges with 3 ranges and 3 metadata columns: >> seqnames ranges strand | edensity epeak over >> <rle> <iranges> <rle> | <integer> <integer> <integer> >> [1] chr1 [ 53141, 53685] * | 601 212 0 >> [2] chr1 [521536, 521622] * | 568 37 1 >> [3] chr1 [805002, 805650] * | 1000 326 1 >> --- >> seqlengths: >> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >> chrY >> NA NA NA NA NA NA ... NA NA NA NA NA >> NA >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives:http://news.gmane.org/gmane.science.biology.inf ormatics.conductor > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives:http://news.gmane.org/gmane.science.biology.info rmatics.conductor
ADD REPLY
0
Entering edit mode
On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain <vobencha@fhcrc.org>wrote: > Hi Hermann, Harris, > > I wanted to give some background on the behavior of these functions for > GRanges vs IRanges. > > > gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) > > gr > > GRanges with 3 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [1, 1] * > [2] chr1 [2, 2] * > [3] chr1 [3, 3] * > > ----------------------------- > "+" strand behavior: > ----------------------------- > IRanges objects have no strand so you can expect them to behave as "+" > strand GRanges do. If this is ever not the case please let me know. That > would be a bug. > > > strand(gr) <- "+" > > precede(gr) > [1] 2 3 NA > > precede(ranges(gr)) > [1] 2 3 NA > > ----------------------------- > "-" strand behavior: > ----------------------------- > I think it makes intuitive sense that precede() and follow() on "-" strand > GRanges will give the opposite results from precede() and follow() on "+" > strand GRanges. > > > strand(gr) <- "-" > > precede(gr) > [1] NA 1 2 > > ----------------------------- > "*" strand behavior: > ----------------------------- > For "*" strand GRanges maybe we tried to be too clever. When > ignore.strand=TRUE strand is treated as "+": > > strand(gr) <- "*" > > precede(gr, ignore.strand=TRUE) > [1] 2 3 NA > > When ignore.strand=FALSE for a "*" strand range, results are computed for > both "+' and "-". If one result is NA, the other is taken. This is how we > end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and > position 1 was NA for "-"). If there is a tie, the first in order is taken. > This is how we get the '1' in position 2 (tie was between '1' and '3' and > '1' comes before '3' in order). > > precede(gr, ignore.strand=FALSE) > [1] 2 1 2 > > It was a challenge to decide how "*" strand with ignore.strand=FALSE > should behave. If others have opinions on this I'd be interested in hearing > them. > > So it seems that "*" in GenomicRanges can mean at least two things: "either" or "missing". I think at one point these were represented by different values, but now the meanings have been merged for simplicity. Instead, we need to use an extra parameter, in this case ignore.strand, to distinguish the two meanings during an operation. It's often convenient to be able to change the meaning on a per-operation basis, but there is also a desire to keep the meaning in the data structure. Anyway, all this has been decided long ago and it probably does not make sense to change anything now. There is some concern though about the usability of the API. Usually, a "*" is used to select the natural/IRanges behavior, but the default of ignore.strand is FALSE, so the "either" meaning is assumed. This is going to surprise users. The same is true of ignore.strand in findOverlaps: many users are surprised to find that by default 'strand' separates the ranges into two coordinate systems, instead of just indicating direction. Michael > Valerie > > > > > > > On 03/03/13 16:38, Harris A. Jaffee wrote: > >> You want precede() and follow(), and then distance(), but tread somewhat >> carefully with strands of "*": >> >> x=IRanges(2, 2) >>> x >>> >> IRanges of length 1 >> start end width >> [1] 2 2 1 >> >> y=IRanges(c(1,3), c(1,3)) >>> y >>> >> IRanges of length 2 >> start end width >> [1] 1 1 1 >> [2] 3 3 1 >> >> precede(x, y) >>> >> [1] 2 >> >>> follow(x, y) >>> >> [1] 1 >> >> X = GRanges(ranges=x, seqnames="chr1") >>> Y = GRanges(ranges=y, seqnames="chr1") >>> X >>> >> GRanges with 1 range and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [2, 2] * >> --- >> seqlengths: >> chr1 >> NA >> >>> Y >>> >> GRanges with 2 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [1, 1] * >> [2] chr1 [3, 3] * >> --- >> seqlengths: >> chr1 >> NA >> >> precede(X, Y) >>> >> [1] 1 >> >>> follow(X, Y) >>> >> [1] 1 >> >> precede(X, Y, ignore.strand=TRUE) >>> >> [1] 2 >> >>> follow(X, Y, ignore.strand=TRUE) >>> >> [1] 1 >> >> >> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >> >> Hello, >>> >>> I have to grange-objects - gr1 and gr2. And I wish to know: >>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>> downstream? >>> 2) What is the distance in bp between the neighbouring intervals? >>> >>> Could anybody give me a hint how this is done? >>> Thanks >>> Hermann >>> >>> dput (gr1) >>>> >>> new("GRanges" >>> , seqnames = new("Rle" >>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>> "chr12", >>> "chr13", >>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , ranges = new("IRanges" >>> , start = c(540823L, 752809L, 771015L, 773361L) >>> , width = c(221L, 230L, 520L, 247L) >>> , NAMES = NULL >>> , elementType = "integer" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , strand = new("Rle" >>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , elementMetadata = new("DataFrame" >>> , rownames = NULL >>> , nrows = 4L >>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>> epeak >>> = c(111L, >>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>> "epeak", "over")) >>> , elementType = "ANY" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , seqinfo = new("Seqinfo" >>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>> "chr15", >>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>> "chrX", "chrY") >>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> NA, NA, >>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>> , genome = c(NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>> ) >>> ) >>> , metadata = list() >>> ) >>> >>>> dput (gr1) >>>> >>> new("GRanges" >>> , seqnames = new("Rle" >>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>> "chr12", >>> "chr13", >>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , ranges = new("IRanges" >>> , start = c(540823L, 752809L, 771015L, 773361L) >>> , width = c(221L, 230L, 520L, 247L) >>> , NAMES = NULL >>> , elementType = "integer" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , strand = new("Rle" >>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>> , lengths = 4L >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , elementMetadata = new("DataFrame" >>> , rownames = NULL >>> , nrows = 4L >>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>> epeak >>> = c(111L, >>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>> "epeak", "over")) >>> , elementType = "ANY" >>> , elementMetadata = NULL >>> , metadata = list() >>> ) >>> , seqinfo = new("Seqinfo" >>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>> "chr15", >>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>> "chrX", "chrY") >>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> NA, NA, >>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>> , genome = c(NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, >>> NA_character_, >>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>> ) >>> ) >>> , metadata = list() >>> ) >>> >>> gr1 >>>> >>> GRanges with 4 ranges and 3 metadata columns: >>> seqnames ranges strand | edensity epeak over >>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>> [1] chr1 [540823, 541043] * | 589 111 0 >>> [2] chr1 [752809, 753038] * | 574 89 1 >>> [3] chr1 [771015, 771534] * | 578 234 1 >>> [4] chr1 [773361, 773607] * | 571 136 1 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA ... NA NA NA NA NA >>> NA >>> >>>> gr2 >>>> >>> GRanges with 3 ranges and 3 metadata columns: >>> seqnames ranges strand | edensity epeak over >>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>> [2] chr1 [521536, 521622] * | 568 37 1 >>> [3] chr1 [805002, 805650] * | 1000 326 1 >>> --- >>> seqlengths: >>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>> chrY >>> NA NA NA NA NA NA ... NA NA NA NA NA >>> NA >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>> informatics.conductor<http: news.gmane.org="" gmane.science.biology.="" informatics.conductor=""> >>> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives:http://news.gmane.**org/gmane.science.biology.** >> informatics.conductor<http: news.gmane.org="" gmane.science.biology.i="" nformatics.conductor=""> >> > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi, FWIW I also find it confusing that (d) gives something different: subject <- GRanges("chr1", IRanges(1:3, width=1)) x <- subject[2] (a): + / + strand(x) <- strand(subject) <- "+" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (b): + / * strand(x) <- "+" strand(subject) <- "*" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (c): * / + strand(x) <- "*" strand(subject) <- "+" > precede(x, subject) [1] 3 > follow(x, subject) [1] 1 (d): * / * strand(x) <- strand(subject) <- "*" > precede(x, subject) [1] 1 > follow(x, subject) [1] 1 What could be the motivation behind this? Wouldn't things be more consistent and less confusing if (a), (b), (c), and (d) all behaved consistently with (e): > precede(ranges(x), ranges(subject)) [1] 3 > follow(ranges(x), ranges(subject)) [1] 1 Thanks, H. On 03/04/2013 09:57 AM, Michael Lawrence wrote: > On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain <vobencha at="" fhcrc.org="">wrote: > >> Hi Hermann, Harris, >> >> I wanted to give some background on the behavior of these functions for >> GRanges vs IRanges. >> >>> gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) >>> gr >> >> GRanges with 3 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [1, 1] * >> [2] chr1 [2, 2] * >> [3] chr1 [3, 3] * >> >> ----------------------------- >> "+" strand behavior: >> ----------------------------- >> IRanges objects have no strand so you can expect them to behave as "+" >> strand GRanges do. If this is ever not the case please let me know. That >> would be a bug. >> >>> strand(gr) <- "+" >>> precede(gr) >> [1] 2 3 NA >>> precede(ranges(gr)) >> [1] 2 3 NA >> >> ----------------------------- >> "-" strand behavior: >> ----------------------------- >> I think it makes intuitive sense that precede() and follow() on "-" strand >> GRanges will give the opposite results from precede() and follow() on "+" >> strand GRanges. >> >>> strand(gr) <- "-" >>> precede(gr) >> [1] NA 1 2 >> >> ----------------------------- >> "*" strand behavior: >> ----------------------------- >> For "*" strand GRanges maybe we tried to be too clever. When >> ignore.strand=TRUE strand is treated as "+": >>> strand(gr) <- "*" >>> precede(gr, ignore.strand=TRUE) >> [1] 2 3 NA >> >> When ignore.strand=FALSE for a "*" strand range, results are computed for >> both "+' and "-". If one result is NA, the other is taken. This is how we >> end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and >> position 1 was NA for "-"). If there is a tie, the first in order is taken. >> This is how we get the '1' in position 2 (tie was between '1' and '3' and >> '1' comes before '3' in order). >>> precede(gr, ignore.strand=FALSE) >> [1] 2 1 2 >> >> It was a challenge to decide how "*" strand with ignore.strand=FALSE >> should behave. If others have opinions on this I'd be interested in hearing >> them. >> >> > > So it seems that "*" in GenomicRanges can mean at least two things: > "either" or "missing". I think at one point these were represented by > different values, but now the meanings have been merged for simplicity. > Instead, we need to use an extra parameter, in this case ignore.strand, to > distinguish the two meanings during an operation. It's often convenient to > be able to change the meaning on a per-operation basis, but there is also a > desire to keep the meaning in the data structure. Anyway, all this has been > decided long ago and it probably does not make sense to change anything now. > > There is some concern though about the usability of the API. Usually, a "*" > is used to select the natural/IRanges behavior, but the default of > ignore.strand is FALSE, so the "either" meaning is assumed. This is going > to surprise users. The same is true of ignore.strand in findOverlaps: many > users are surprised to find that by default 'strand' separates the ranges > into two coordinate systems, instead of just indicating direction. > > Michael > > >> Valerie >> >> >> >> >> >> >> On 03/03/13 16:38, Harris A. Jaffee wrote: >> >>> You want precede() and follow(), and then distance(), but tread somewhat >>> carefully with strands of "*": >>> >>> x=IRanges(2, 2) >>>> x >>>> >>> IRanges of length 1 >>> start end width >>> [1] 2 2 1 >>> >>> y=IRanges(c(1,3), c(1,3)) >>>> y >>>> >>> IRanges of length 2 >>> start end width >>> [1] 1 1 1 >>> [2] 3 3 1 >>> >>> precede(x, y) >>>> >>> [1] 2 >>> >>>> follow(x, y) >>>> >>> [1] 1 >>> >>> X = GRanges(ranges=x, seqnames="chr1") >>>> Y = GRanges(ranges=y, seqnames="chr1") >>>> X >>>> >>> GRanges with 1 range and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [2, 2] * >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>>> Y >>>> >>> GRanges with 2 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [1, 1] * >>> [2] chr1 [3, 3] * >>> --- >>> seqlengths: >>> chr1 >>> NA >>> >>> precede(X, Y) >>>> >>> [1] 1 >>> >>>> follow(X, Y) >>>> >>> [1] 1 >>> >>> precede(X, Y, ignore.strand=TRUE) >>>> >>> [1] 2 >>> >>>> follow(X, Y, ignore.strand=TRUE) >>>> >>> [1] 1 >>> >>> >>> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >>> >>> Hello, >>>> >>>> I have to grange-objects - gr1 and gr2. And I wish to know: >>>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>>> downstream? >>>> 2) What is the distance in bp between the neighbouring intervals? >>>> >>>> Could anybody give me a hint how this is done? >>>> Thanks >>>> Hermann >>>> >>>> dput (gr1) >>>>> >>>> new("GRanges" >>>> , seqnames = new("Rle" >>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>> "chr12", >>>> "chr13", >>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , ranges = new("IRanges" >>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>> , width = c(221L, 230L, 520L, 247L) >>>> , NAMES = NULL >>>> , elementType = "integer" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , strand = new("Rle" >>>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , elementMetadata = new("DataFrame" >>>> , rownames = NULL >>>> , nrows = 4L >>>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>>> epeak >>>> = c(111L, >>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>> "epeak", "over")) >>>> , elementType = "ANY" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , seqinfo = new("Seqinfo" >>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>>> "chr15", >>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>> "chrX", "chrY") >>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>> NA, NA, >>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>>> ) >>>> ) >>>> , metadata = list() >>>> ) >>>> >>>>> dput (gr1) >>>>> >>>> new("GRanges" >>>> , seqnames = new("Rle" >>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>> "chr12", >>>> "chr13", >>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , ranges = new("IRanges" >>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>> , width = c(221L, 230L, 520L, 247L) >>>> , NAMES = NULL >>>> , elementType = "integer" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , strand = new("Rle" >>>> , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") >>>> , lengths = 4L >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , elementMetadata = new("DataFrame" >>>> , rownames = NULL >>>> , nrows = 4L >>>> , listData = structure(list(edensity = c(589L, 574L, 578L, 571L), >>>> epeak >>>> = c(111L, >>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>> "epeak", "over")) >>>> , elementType = "ANY" >>>> , elementMetadata = NULL >>>> , metadata = list() >>>> ) >>>> , seqinfo = new("Seqinfo" >>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", >>>> "chr15", >>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>> "chrX", "chrY") >>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>> NA, NA, >>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>> NA_character_, >>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ >>>> ) >>>> ) >>>> , metadata = list() >>>> ) >>>> >>>> gr1 >>>>> >>>> GRanges with 4 ranges and 3 metadata columns: >>>> seqnames ranges strand | edensity epeak over >>>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>>> [1] chr1 [540823, 541043] * | 589 111 0 >>>> [2] chr1 [752809, 753038] * | 574 89 1 >>>> [3] chr1 [771015, 771534] * | 578 234 1 >>>> [4] chr1 [773361, 773607] * | 571 136 1 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA ... NA NA NA NA NA >>>> NA >>>> >>>>> gr2 >>>>> >>>> GRanges with 3 ranges and 3 metadata columns: >>>> seqnames ranges strand | edensity epeak over >>>> <rle> <iranges> <rle> | <integer> <integer> <integer> >>>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>>> [2] chr1 [521536, 521622] * | 568 37 1 >>>> [3] chr1 [805002, 805650] * | 1000 326 1 >>>> --- >>>> seqlengths: >>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 chr9 chrX >>>> chrY >>>> NA NA NA NA NA NA ... NA NA NA NA NA >>>> NA >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>> informatics.conductor<http: news.gmane.org="" gmane.science.biology="" .informatics.conductor=""> >>>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>> informatics.conductor<http: news.gmane.org="" gmane.science.biology.="" informatics.conductor=""> >>> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi Herve, On 03/04/13 15:50, Hervé Pagès wrote: > Hi, > > FWIW I also find it confusing that (d) gives something different: > > subject <- GRanges("chr1", IRanges(1:3, width=1)) > x <- subject[2] > > (a): + / + > > strand(x) <- strand(subject) <- "+" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > > (b): + / * > > strand(x) <- "+" > strand(subject) <- "*" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > (c): * / + > > strand(x) <- "*" > strand(subject) <- "+" > > > precede(x, subject) > [1] 3 > > follow(x, subject) > [1] 1 > > (d): * / * > > strand(x) <- strand(subject) <- "*" > > > precede(x, subject) > [1] 1 > > follow(x, subject) > [1] 1 > > What could be the motivation behind this? Wouldn't things be more > consistent and less confusing if (a), (b), (c), and (d) all behaved > consistently with (e): This is the "maybe we tried to be too clever" part. (d) is consistent with (e) when ignore.strand=TRUE. By default ignore.strand=FALSE and these methods look for solutions on both/either strand. Val > > > precede(ranges(x), ranges(subject)) > [1] 3 > > follow(ranges(x), ranges(subject)) > [1] 1 > > Thanks, > H. > > > On 03/04/2013 09:57 AM, Michael Lawrence wrote: >> On Mon, Mar 4, 2013 at 9:17 AM, Valerie Obenchain >> <vobencha at="" fhcrc.org="">wrote: >> >>> Hi Hermann, Harris, >>> >>> I wanted to give some background on the behavior of these functions for >>> GRanges vs IRanges. >>> >>>> gr <- GRanges("chr1", IRanges(c(1,2,3), width=1)) >>>> gr >>> >>> GRanges with 3 ranges and 0 metadata columns: >>> seqnames ranges strand >>> <rle> <iranges> <rle> >>> [1] chr1 [1, 1] * >>> [2] chr1 [2, 2] * >>> [3] chr1 [3, 3] * >>> >>> ----------------------------- >>> "+" strand behavior: >>> ----------------------------- >>> IRanges objects have no strand so you can expect them to behave as "+" >>> strand GRanges do. If this is ever not the case please let me know. >>> That >>> would be a bug. >>> >>>> strand(gr) <- "+" >>>> precede(gr) >>> [1] 2 3 NA >>>> precede(ranges(gr)) >>> [1] 2 3 NA >>> >>> ----------------------------- >>> "-" strand behavior: >>> ----------------------------- >>> I think it makes intuitive sense that precede() and follow() on "-" >>> strand >>> GRanges will give the opposite results from precede() and follow() >>> on "+" >>> strand GRanges. >>> >>>> strand(gr) <- "-" >>>> precede(gr) >>> [1] NA 1 2 >>> >>> ----------------------------- >>> "*" strand behavior: >>> ----------------------------- >>> For "*" strand GRanges maybe we tried to be too clever. When >>> ignore.strand=TRUE strand is treated as "+": >>>> strand(gr) <- "*" >>>> precede(gr, ignore.strand=TRUE) >>> [1] 2 3 NA >>> >>> When ignore.strand=FALSE for a "*" strand range, results are >>> computed for >>> both "+' and "-". If one result is NA, the other is taken. This is >>> how we >>> end up with a '2' in positions 1 and 3 (position 3 was NA for "+" and >>> position 1 was NA for "-"). If there is a tie, the first in order is >>> taken. >>> This is how we get the '1' in position 2 (tie was between '1' and >>> '3' and >>> '1' comes before '3' in order). >>>> precede(gr, ignore.strand=FALSE) >>> [1] 2 1 2 >>> >>> It was a challenge to decide how "*" strand with ignore.strand=FALSE >>> should behave. If others have opinions on this I'd be interested in >>> hearing >>> them. >>> >>> >> >> So it seems that "*" in GenomicRanges can mean at least two things: >> "either" or "missing". I think at one point these were represented by >> different values, but now the meanings have been merged for simplicity. >> Instead, we need to use an extra parameter, in this case >> ignore.strand, to >> distinguish the two meanings during an operation. It's often >> convenient to >> be able to change the meaning on a per-operation basis, but there is >> also a >> desire to keep the meaning in the data structure. Anyway, all this >> has been >> decided long ago and it probably does not make sense to change >> anything now. >> >> There is some concern though about the usability of the API. Usually, >> a "*" >> is used to select the natural/IRanges behavior, but the default of >> ignore.strand is FALSE, so the "either" meaning is assumed. This is >> going >> to surprise users. The same is true of ignore.strand in findOverlaps: >> many >> users are surprised to find that by default 'strand' separates the >> ranges >> into two coordinate systems, instead of just indicating direction. >> >> Michael >> >> >>> Valerie >>> >>> >>> >>> >>> >>> >>> On 03/03/13 16:38, Harris A. Jaffee wrote: >>> >>>> You want precede() and follow(), and then distance(), but tread >>>> somewhat >>>> carefully with strands of "*": >>>> >>>> x=IRanges(2, 2) >>>>> x >>>>> >>>> IRanges of length 1 >>>> start end width >>>> [1] 2 2 1 >>>> >>>> y=IRanges(c(1,3), c(1,3)) >>>>> y >>>>> >>>> IRanges of length 2 >>>> start end width >>>> [1] 1 1 1 >>>> [2] 3 3 1 >>>> >>>> precede(x, y) >>>>> >>>> [1] 2 >>>> >>>>> follow(x, y) >>>>> >>>> [1] 1 >>>> >>>> X = GRanges(ranges=x, seqnames="chr1") >>>>> Y = GRanges(ranges=y, seqnames="chr1") >>>>> X >>>>> >>>> GRanges with 1 range and 0 metadata columns: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [2, 2] * >>>> --- >>>> seqlengths: >>>> chr1 >>>> NA >>>> >>>>> Y >>>>> >>>> GRanges with 2 ranges and 0 metadata columns: >>>> seqnames ranges strand >>>> <rle> <iranges> <rle> >>>> [1] chr1 [1, 1] * >>>> [2] chr1 [3, 3] * >>>> --- >>>> seqlengths: >>>> chr1 >>>> NA >>>> >>>> precede(X, Y) >>>>> >>>> [1] 1 >>>> >>>>> follow(X, Y) >>>>> >>>> [1] 1 >>>> >>>> precede(X, Y, ignore.strand=TRUE) >>>>> >>>> [1] 2 >>>> >>>>> follow(X, Y, ignore.strand=TRUE) >>>>> >>>> [1] 1 >>>> >>>> >>>> On Mar 3, 2013, at 4:10 PM, Hermann Norpois wrote: >>>> >>>> Hello, >>>>> >>>>> I have to grange-objects - gr1 and gr2. And I wish to know: >>>>> 1) What are the neigbouring intervals of gr1 in gr2 - upstream and >>>>> downstream? >>>>> 2) What is the distance in bp between the neighbouring intervals? >>>>> >>>>> Could anybody give me a hint how this is done? >>>>> Thanks >>>>> Hermann >>>>> >>>>> dput (gr1) >>>>>> >>>>> new("GRanges" >>>>> , seqnames = new("Rle" >>>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>>> "chr12", >>>>> "chr13", >>>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , ranges = new("IRanges" >>>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>>> , width = c(221L, 230L, 520L, 247L) >>>>> , NAMES = NULL >>>>> , elementType = "integer" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , strand = new("Rle" >>>>> , values = structure(3L, .Label = c("+", "-", "*"), class = >>>>> "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , elementMetadata = new("DataFrame" >>>>> , rownames = NULL >>>>> , nrows = 4L >>>>> , listData = structure(list(edensity = c(589L, 574L, 578L, >>>>> 571L), >>>>> epeak >>>>> = c(111L, >>>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>>> "epeak", "over")) >>>>> , elementType = "ANY" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , seqinfo = new("Seqinfo" >>>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", >>>>> "chr14", >>>>> "chr15", >>>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>>> "chrX", "chrY") >>>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>>> NA, NA, >>>>> NA, NA, >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_ >>>>> ) >>>>> ) >>>>> , metadata = list() >>>>> ) >>>>> >>>>>> dput (gr1) >>>>>> >>>>> new("GRanges" >>>>> , seqnames = new("Rle" >>>>> , values = structure(1L, .Label = c("chr1", "chr10", "chr11", >>>>> "chr12", >>>>> "chr13", >>>>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", >>>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", >>>>> "chr8", "chr9", "chrX", "chrY"), class = "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , ranges = new("IRanges" >>>>> , start = c(540823L, 752809L, 771015L, 773361L) >>>>> , width = c(221L, 230L, 520L, 247L) >>>>> , NAMES = NULL >>>>> , elementType = "integer" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , strand = new("Rle" >>>>> , values = structure(3L, .Label = c("+", "-", "*"), class = >>>>> "factor") >>>>> , lengths = 4L >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , elementMetadata = new("DataFrame" >>>>> , rownames = NULL >>>>> , nrows = 4L >>>>> , listData = structure(list(edensity = c(589L, 574L, 578L, >>>>> 571L), >>>>> epeak >>>>> = c(111L, >>>>> 89L, 234L, 136L), over = c(0L, 1L, 1L, 1L)), .Names = c("edensity", >>>>> "epeak", "over")) >>>>> , elementType = "ANY" >>>>> , elementMetadata = NULL >>>>> , metadata = list() >>>>> ) >>>>> , seqinfo = new("Seqinfo" >>>>> , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", >>>>> "chr14", >>>>> "chr15", >>>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", >>>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", >>>>> "chrX", "chrY") >>>>> , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, >>>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_) >>>>> , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>>>> NA, NA, >>>>> NA, NA, >>>>> NA, NA, NA, NA, NA, NA, NA, NA, NA) >>>>> , genome = c(NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_, >>>>> NA_character_, NA_character_, NA_character_, NA_character_, >>>>> NA_character_ >>>>> ) >>>>> ) >>>>> , metadata = list() >>>>> ) >>>>> >>>>> gr1 >>>>>> >>>>> GRanges with 4 ranges and 3 metadata columns: >>>>> seqnames ranges strand | edensity epeak over >>>>> <rle> <iranges> <rle> | <integer> <integer> >>>>> <integer> >>>>> [1] chr1 [540823, 541043] * | 589 111 0 >>>>> [2] chr1 [752809, 753038] * | 574 89 1 >>>>> [3] chr1 [771015, 771534] * | 578 234 1 >>>>> [4] chr1 [773361, 773607] * | 571 136 1 >>>>> --- >>>>> seqlengths: >>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 >>>>> chr9 chrX >>>>> chrY >>>>> NA NA NA NA NA NA ... NA NA NA >>>>> NA NA >>>>> NA >>>>> >>>>>> gr2 >>>>>> >>>>> GRanges with 3 ranges and 3 metadata columns: >>>>> seqnames ranges strand | edensity epeak over >>>>> <rle> <iranges> <rle> | <integer> <integer> >>>>> <integer> >>>>> [1] chr1 [ 53141, 53685] * | 601 212 0 >>>>> [2] chr1 [521536, 521622] * | 568 37 1 >>>>> [3] chr1 [805002, 805650] * | 1000 326 1 >>>>> --- >>>>> seqlengths: >>>>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr6 chr7 chr8 >>>>> chr9 chrX >>>>> chrY >>>>> NA NA NA NA NA NA ... NA NA NA >>>>> NA NA >>>>> NA >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> >>>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>>> informatics.conductor<http: news.gmane.org="" gmane.science.biolog="" y.informatics.conductor=""> >>>>> >>>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> >>>> Search the archives:http://news.gmane.**org/gmane.science.biology.** >>>> informatics.conductor<http: news.gmane.org="" gmane.science.biology="" .informatics.conductor=""> >>>> >>>> >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
ADD REPLY

Login before adding your answer.

Traffic: 560 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6