selecting score(GRanges_object) based on coordinates
1
0
Entering edit mode
@philip-lijnzaad-2499
Last seen 2.4 years ago
European Union
Dear all, I can't seem to select a run of scores from GRanges object, using the coordinates from another GRanges object. An example: coverage <- GRanges(seqnames="chrI", ranges=IRanges(start=c(1,11,21), width=10), score=c(0, 42, 0) ) feature <- GRanges(seqnames="chrI", ranges=IRanges(start=5, end=25)) In ascii this looks like: coverage: ...__________----------__________... feature: .......xxxxxxxxxxxxxxxxxxxxx........ wanted: _____----------______ I tried using score(subsetByOverlaps(coverage, feature)) which I hope would give me back 0 0 0 0 0 42 42 42 42 42 42 42 42 42 42 0 0 0 0 0 or maybe numeric-Rle of length 20 with 3 runs Lengths: 5 10 5 Values : 0 42 0 But instead I get back 0 42 0, i.e. the score of the full original coverage object. However, the widths are wrong. What function(s) should I use to achieve this? I would expect this to be a trivial operation, but could not find any example of this (e.g. it would be really nice if I could just write coverage[feature]). The problem seems to be that the 'score' metadata does not 'know' the width of the IRange they are located on (this was read with rtracklayer::import.bw). Any help appreciated. Philip -- Philip Lijnzaad, PhD Molecular Cancer Research University Medical Center (UMC), Utrecht Stratenum room 2.211 IM: plijnzaad at jabber.org , philip.lijnzaad at gmail.com P.O. Box 85060, 3508 AB Utrecht (Universiteitsweg 100, 3584 CG Utrecht) The Netherlands tel: +31 (0)8875 68348 fax: +31 (0)8875 68479 ---------------------------------------------------------------------- -------- De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender direct te informeren door het bericht te retourneren. Het Universitair Medisch Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de W.H.W. (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd bij de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197. Denk s.v.p aan het milieu voor u deze e-mail afdrukt. ---------------------------------------------------------------------- -------- This message may contain confidential information and is...{{dropped:8}}
Cancer Cancer • 1.1k views
ADD COMMENT
0
Entering edit mode
@philip-lijnzaad-2499
Last seen 2.4 years ago
European Union
On 2014-02-03 13:07, Philip Lijnzaad wrote: > Dear all, > > I can't seem to select a run of scores from GRanges object, using the > coordinates from another GRanges object. > > An example: > > coverage <- GRanges(seqnames="chrI", > > ranges=IRanges(start=c(1,11,21), width=10), > > score=c(0, 42, 0) > > ) > > feature <- GRanges(seqnames="chrI", ranges=IRanges(start=5, end=25)) > > In ascii this looks like: > > coverage: ...__________----------__________... > > feature: .......xxxxxxxxxxxxxxxxxxxxx........ > > wanted: _____----------______ > > I tried using > > score(subsetByOverlaps(coverage, feature)) > > which I hope would give me back > > 0 0 0 0 0 42 42 42 42 42 42 42 42 42 42 0 0 0 0 0 > > or maybe > > numeric-Rle of length 20 with 3 runs > > Lengths: 5 10 5 > > Values : 0 42 0 Mmm ... seems I stumbled across a solution myself. If I do scores <- Rle(score(coverage), width(ranges(coverage))) I can do scores[ ranges(feature) ] which seems to do exactly what I need. Is this the right (and fastest) way? If so, might be nice to have this in one of the vignettes. Cheers, Ph ---------------------------------------------------------------------- -------- De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender direct te informeren door het bericht te retourneren. Het Universitair Medisch Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de W.H.W. (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd bij de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197. Denk s.v.p aan het milieu voor u deze e-mail afdrukt. ---------------------------------------------------------------------- -------- This message may contain confidential information and is...{{dropped:8}}
ADD COMMENT
0
Entering edit mode
On Mon, Feb 3, 2014 at 4:38 AM, Philip Lijnzaad <p.lijnzaad@umcutrecht.nl>wrote: > On 2014-02-03 13:07, Philip Lijnzaad wrote: > >> Dear all, >> >> I can't seem to select a run of scores from GRanges object, using the >> coordinates from another GRanges object. >> >> An example: >> >> coverage <- GRanges(seqnames="chrI", >> >> ranges=IRanges(start=c(1,11,21), width=10), >> >> score=c(0, 42, 0) >> >> ) >> >> feature <- GRanges(seqnames="chrI", ranges=IRanges(start=5, end=25)) >> >> In ascii this looks like: >> >> coverage: ...__________----------__________... >> >> feature: .......xxxxxxxxxxxxxxxxxxxxx........ >> >> wanted: _____----------______ >> >> I tried using >> >> score(subsetByOverlaps(coverage, feature)) >> >> which I hope would give me back >> >> 0 0 0 0 0 42 42 42 42 42 42 42 42 42 42 0 0 0 0 0 >> >> or maybe >> >> numeric-Rle of length 20 with 3 runs >> >> Lengths: 5 10 5 >> >> Values : 0 42 0 >> > > Mmm ... seems I stumbled across a solution myself. If I do > > scores <- Rle(score(coverage), width(ranges(coverage))) > > > I can do > > scores[ ranges(feature) ] > > > which seems to do exactly what I need. Is this the right (and fastest) > way? If so, might be nice to have this > in one of the vignettes. > > Note that this solution assumes that the data are on one chromosome. In general, GRanges maps to RleList, so instead of: scores <- Rle(score(coverage), width(coverage)) scores[ ranges(feature) ] You want: scores <- coverage(coverage, weight="score") scores[as(feature, "RangesList")] Make sure that the seqlevels are the same between "coverage" and "feature". However, in this case fastest way is to have rtracklayer load the coverage directly as an Rle. In release, this is just scores <- import("my.bw", asRle=TRUE) scores[ as(feature, "RangesList") ] In devel, the first line needs to be scores <- import("my.bw", as="RleList") and there is also support for as="NumericList" thanks to Val. If you are *only* interested in the data over "feature", then you could tell rtracklayer to only load that data (the rest of the RleList will be zeroed out). scores <- import("my.bw", asRle=TRUE, which=feature) feature.scores <- scores[ as(feature, "RangesList") ] If "feature" is contained within one chromosome, then you might want to simply: unlist(feature.scores, use.names=FALSE) With all that explained, it would be nice to know your use case. Often these types of problems are solvable with RleViews[List]. Michael Cheers, > > Ph > > > ------------------------------------------------------------ > ------------------ > > De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is > uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht > ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender > direct > te informeren door het bericht te retourneren. Het Universitair Medisch > Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de > W.H.W. > (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd > bij > de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197. > > Denk s.v.p aan het milieu voor u deze e-mail afdrukt. > > ------------------------------------------------------------ > ------------------ > > This message may contain confidential information and ...{{dropped:13}}
ADD REPLY
0
Entering edit mode
Hi Michael, thanks for the elaborate anwer. Note that this solution assumes that the data are on one chromosome. hmm, it seems to work with the coverage being for all chromomes; I iterate over the features (each of them contiguous and on one chromosome only). You want: scores <- coverage(coverage, weight="score") OK, but the coverage was read from a bigwig file, so in this case it would not work I think. scores[as(feature, "RangesList")] Make sure that the seqlevels are the same between "coverage" and "feature". OK, good point. However, in this case fastest way is to have rtracklayer load the coverage directly as an Rle. In release, this is just scores <- import("[1]my.bw", asRle=TRUE) scores[ as(feature, "RangesList") ] I was actually been looking for something that, but this is not available my version (R 3.0.1, Bioconductor 2.12, rtracklayer 1.20.4) If you are *only* interested in the data over "feature", then you could tell rtracklayer to only load that data (the rest of the RleList will be zeroed out). scores <- import("[2]my.bw", asRle=TRUE, which=feature) feature.scores <- scores[ as(feature, "RangesList") ] If "feature" is contained within one chromosome, then you might want to simply: unlist(feature.scores, use.names=FALSE) OK With all that explained, it would be nice to know your use case. Often these types of problems are solvable with RleViews[List]. Well, I was interested in the behaviour of protein-DNA binding data (for which I have external bigwig files) over the promoters of a certain set of genes. I would have hoped there to be a clear-cut example on the interwebs. I guess there is one now (i.e. this thread :-), but still it would be nice to be in one of the vignettes. Thanks again, Philip _________________________________________________________________ De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender direct te informeren door het bericht te retourneren. Het Universitair Medisch Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin van de W.H.W. (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat geregistreerd bij de Kamer van Koophandel voor Midden-Nederland onder nr. 30244197. Denk s.v.p aan het milieu voor u deze e-mail afdrukt. _________________________________________________________________ This message may contain confidential information and is...{{dropped:14}} References 1. http://my.bw/ 2. http://my.bw/
ADD REPLY
0
Entering edit mode
On Mon, Feb 3, 2014 at 7:06 AM, Philip Lijnzaad <p.lijnzaad@umcutrecht.nl>wrote: > Hi Michael, > > thanks for the elaborate anwer. > > Note that this solution assumes that the data are on one chromosome. > > > hmm, it seems to work with the coverage being for all chromomes; I > iterate over the features (each of them contiguous and on one chromosome > only). > > > You want: > > scores <- coverage(coverage, weight="score") > > > OK, but the coverage was read from a bigwig file, so in this case it would > not work I think. > > > If you read it as a GRanges, then it should work. > scores[as(feature, "RangesList")] > > Make sure that the seqlevels are the same between "coverage" and > "feature". > > > OK, good point. > > > However, in this case fastest way is to have rtracklayer load the > coverage directly as an Rle. In release, this is just > > scores <- import("my.bw", asRle=TRUE) > scores[ as(feature, "RangesList") ] > > > I was actually been looking for something that, but this is not available > my version (R 3.0.1, Bioconductor 2.12, rtracklayer 1.20.4) > > > Time to update ;) > > If you are *only* interested in the data over "feature", then you could > tell rtracklayer to only load that data (the rest of the RleList will be > zeroed out). > > scores <- import("my.bw", asRle=TRUE, which=feature) > feature.scores <- scores[ as(feature, "RangesList") ] > > If "feature" is contained within one chromosome, then you might want to > simply: > unlist(feature.scores, use.names=FALSE) > > > OK > > > With all that explained, it would be nice to know your use case. Often > these types of problems are solvable with RleViews[List]. > > > Well, I was interested in the behaviour of protein-DNA binding data (for > which I have external bigwig files) over the promoters of a certain set of > genes. I would have hoped there to be a clear-cut example on the interwebs. > I guess there is one now (i.e. this thread :-), but still it would be nice > to be in one of the vignettes. > > To clarify, I'm interested in what you're doing downstream of getting the numbers. Are you summarizing it, e.g., by taking the average coverage? It so, RleViews is what you want. > Thanks again, > > Philip > > ------------------------------ > > * De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is > uitsluitend bestemd voor de geadresseerde. Indien u dit bericht onterecht > ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender > direct te informeren door het bericht te retourneren. Het Universitair > Medisch Centrum Utrecht is een publiekrechtelijke rechtspersoon in de zin > van de W.H.W. (Wet Hoger Onderwijs en Wetenschappelijk Onderzoek) en staat > geregistreerd bij de Kamer van Koophandel voor Midden-Nederland onder nr. > 30244197. * > > * Denk s.v.p aan het milieu voor u deze e-mail afdrukt. * > ------------------------------ > > * This message may contain confidential information and is intended > exclusively for the addressee. If you receive this message unintentionally, > please do not use the contents but notify the sender immediately by return > e-mail. University Medical Center Utrecht is a legal person by public law > and is registered at the Chamber of Commerce for Midden-Nederland under no. > 30244197. * > > * Please consider the environment before printing this e-mail. * > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 1049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6