seqselect and window in GRanges


arne.mueller@novartis.com ▴ 200

@arnemuellernovartiscom-2205

Last seen 9.2 years ago

Switzerland

Dear All,

may I ask a basic question about GRanges (GenomicRanges package)? It seems that the functions seqselect and window treat start/end as indexes into the GRanges object rather than as the actual start/end positions. Is there a method with which I can extract a sub-range from a GRanges object based on genomic coordinates rather than indexes?

    > gr = GRanges(seqnames="A", ranges=IRanges(start=c(10, 100), end=c(20, 200)))
    > gr
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    > window(gr, start=12, end=98)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    > window(gr, start=1, end=2)
    GRanges with 2 ranges and 0 elementMetadata values
        seqnames     ranges strand |
           <rle>  <iranges>  <rle> |
    [1]        A [ 10,  20]      * |
    [2]        A [100, 200]      * |

    seqlengths
      A
     NA
    > window(gr, start=9, end=40)
    Error in solveWindowSEW(length(x), start, end, width) :
      Invalid sequence coordinates.
      Please make sure the supplied 'start', 'end' and 'width' arguments
      are defining a region that is within the limits of the sequence.
    ...

    > sessionInfo()
    R version 2.13.0 Under development (unstable) (2010-10-31 r53501)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices datasets utils methods base

    other attached packages:
    [1] GenomicRanges_1.1.38 IRanges_1.9.3

    loaded via a namespace (and not attached):
    [1] tools_2.13.0

thanks a lot for your help,

Arne

• 2.2k views

updated 14.1 years ago by Martin Morgan 25k • written 14.1 years ago by arne.mueller@novartis.com ▴ 200


Martin Morgan 25k

@martin-morgan-1513

Last seen 5 months ago

United States

On 11/30/2010 05:38 AM, arne.mueller at novartis.com wrote:
> Dear All,
>
> may I ask a basic question about the GRanges package. It seems that the
> functions seqselect and window treat start/end as indexes in the GRanges
> object rather than the actual start/end positions. Is there a method with
> which I can extract a sub-range from a GRanges object based on genomic
> coordinates rather than indexes?

Hi Arne --

it sounds a bit like you want to 1) find overlapping ranges between gr and genomic location(s) and then 2) restrict (narrow might be appropriate if looking for, say, 5' regions) the ranges to those locations, along the lines of

    > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]
    > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
    > gr1
    GRanges with 1 range and 0 elementMetadata values
        seqnames   ranges strand |
           <rle> <iranges>  <rle> |
    [1]        A [12, 18]      * |

    seqlengths
      A
     NA

gr %in% GRanges(<...>) is sugar for match(), which is sugar for findOverlaps.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
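For readers who want to try the two steps described above (find the overlapping ranges, then clip them to the region), here is a minimal sketch. It assumes a reasonably recent GenomicRanges installation; the `region` object is a hypothetical name introduced only for illustration. As far as I know, in later GenomicRanges releases `%in%` on GRanges tests for exact matches rather than overlaps, so the sketch calls findOverlaps() directly instead of relying on the sugar.

```r
library(GenomicRanges)

# Same toy object as in the question
gr <- GRanges(seqnames = "A",
              ranges = IRanges(start = c(10, 100), end = c(20, 200)))

# Hypothetical region of interest, given in genomic coordinates
region <- GRanges("A", IRanges(12, 18))

# Step 1: keep only the ranges in gr that overlap the region
hits <- findOverlaps(gr, region)
gr1 <- gr[unique(queryHits(hits))]

# Step 2: clip the surviving ranges to the region boundaries
gr1 <- restrict(gr1, start = start(region), end = end(region))
gr1
```

Note that start and end here are coordinates on the sequence, not positions in the object, which is the distinction the original question was about.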

Martin Morgan 25k • 14.1 years ago


On Tue, Nov 30, 2010 at 6:10 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:

> Hi Arne --
>
> it sounds a bit like you want to 1) find overlapping ranges between gr
> and genomic location(s) and then 2) restrict (narrow might be
> appropriate if looking for, say, 5' regions) the ranges to those
> locations, along the lines of
>
> > gr1 <- gr[gr %in% GRanges("A", IRanges(12, 18))]

See subsetByOverlaps() for the above; maybe a little cleaner?

> > ranges(gr1) <- restrict(ranges(gr1), 12, 18)
> > gr1
> GRanges with 1 range and 0 elementMetadata values
>     seqnames   ranges strand |
>        <rle> <iranges>  <rle> |
> [1]        A [12, 18]      * |
>
> seqlengths
>   A
>  NA

To find the common regions from an overlap operation, this is the most general way:

    overlaps <- findOverlaps(ranges(gr1), subject)
    ranges(overlaps, ranges(gr1), subject)

Not sure if that's what Arne wants though.

> gr %in% GRanges(<...>) is sugar for match(), which is sugar for
> findOverlaps.
>
> Martin
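A rough sketch of both suggestions in this reply, assuming a reasonably recent GenomicRanges installation. The `subject` ranges are made up for illustration. The `ranges(overlaps, ...)` call quoted above reflects the API at the time of the thread and may not be available in current releases, so the common regions are computed here with pintersect() on the matched pairs, which is a swapped-in technique rather than what the reply itself shows.

```r
library(GenomicRanges)

# Same toy object as in the question
gr <- GRanges("A", IRanges(start = c(10, 100), end = c(20, 200)))

# Hypothetical regions of interest, in genomic coordinates
subject <- GRanges("A", IRanges(start = c(12, 150), end = c(98, 160)))

# Ranges in gr that overlap at least one range in subject
sub <- subsetByOverlaps(gr, subject)

# One row per overlapping (gr, subject) pair
hits <- findOverlaps(gr, subject)

# Common region of each pair: the parallel intersection of the matched ranges
common <- pintersect(gr[queryHits(hits)], subject[subjectHits(hits)])
common
```

subsetByOverlaps() returns the original ranges untouched, while the pintersect() step returns only the shared part of each overlapping pair, so which one fits depends on whether whole features or clipped features are wanted.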

Michael Lawrence ★ 11k • 14.1 years ago


Thanks a lot for the replies on finding a subset of features in a GRanges object. findOverlaps seems to be the way to go (as I'm not very much into sugar ;-)

regards,

arne

arne.mueller@novartis.com ▴ 200 • 14.1 years ago
