import.bw to Rle?
1
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…
Hi, Is there an easy way to read in a bigWig file as Rle values appropriately placed along the chromosomes? Specifically, I'm using rtracklayer's import.bw to read in a mapability track I downloaded from UCSC and I get a RangedData object, as expected. Mapability scores are calculated as per base-pair basis, so it seems obvious to me to covert the scores to Rle. ### library(rtracklayer) ## built-in example test.bw.file <- system.file("tests", "test.bw", package = "rtracklayer") bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19", IRanges(1, 6e7))) bw RangedData with 9 rows and 1 value column across 1 space space ranges | score <factor> <iranges> | <numeric> 1 chr19 [59302001, 59302300] | -1.00 2 chr19 [59302301, 59302600] | -0.75 3 chr19 [59302601, 59302900] | -0.50 4 chr19 [59302901, 59303200] | -0.25 5 chr19 [59303201, 59303500] | 0.00 6 chr19 [59303501, 59303800] | 0.25 7 chr19 [59303801, 59304100] | 0.50 8 chr19 [59304101, 59304400] | 0.75 9 chr19 [59304401, 59304700] | 1.00 ##my example using a whole-genome mapability file I got from # http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMap ability/ ### for now just look at one small region, but eventually I'll read in data for the whole genome testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963, 57797963)) mapabilityTrack_small <- import.bw("/home/jayoung/traskdata/public_da tabases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw", selection = BigWigSelection(testRegion) ) mapabilityTrack_small RangedData with 92 rows and 1 value column across 49 spaces space ranges | score <factor> <iranges> | <numeric> 1 chr12 [57795963, 57796961] | 1.0000000 2 chr12 [57796962, 57796963] | 0.5000000 3 chr12 [57796964, 57796969] | 1.0000000 4 chr12 [57796970, 57796971] | 0.5000000 5 chr12 [57796972, 57796984] | 1.0000000 ### etc head(width(mapabilityTrack_small),20) # [1] 999 2 6 2 13 1 1 1 19 3 1 1 1 1 2 1 1 1 1 # [20] 4 #### A RangedData object with scores makes sense for some use cases but for many others it seems like RleList would make more sense (thinking about the types of data that will be encoded in bigWig files, like mapability, where you get a per-base pair score). I think I will be able to figure out a way to convert the RangedData scores - it'll be something like this, but maybe also padding with NA at the end to go the full length of each chromosome, and applying the function over all chromosomes. Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) ) # 'numeric' Rle of length 59304700 with 10 runs # Lengths: 59302000 300 300 300 300 300 300 300 300 300 # Values : NA -1 -0.75 -0.5 -0.25 0 0.25 0.5 0.75 1 Then I'll be able to use seqselect on the Rle to get scores in regions I'm interested in, average the scores per region, average scores at each position in equivalent regions, etc, etc. But I did wonder whether I'm missing a more obvious solution to convert to Rle - is there already a function that'll do this for me? If not, could I suggest that as an enhancement for import.bw? (am I just being lazy?) thanks, Janet ------------------------------------------------------------------- Dr. Janet Young Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., C3-168, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 1471 fax: (206) 667 6524 email: jayoung ...at... fhcrc.org -------------------------------------------------------------------
GO Cancer convert GO Cancer convert • 1.8k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
On Fri, Aug 12, 2011 at 6:13 PM, Janet Young <jayoung@fhcrc.org> wrote: > Hi, > > Is there an easy way to read in a bigWig file as Rle values appropriately > placed along the chromosomes? Specifically, I'm using rtracklayer's > import.bw to read in a mapability track I downloaded from UCSC and I get a > RangedData object, as expected. Mapability scores are calculated as per > base-pair basis, so it seems obvious to me to covert the scores to Rle. > > ### > library(rtracklayer) > > ## built-in example > test.bw.file <- system.file("tests", "test.bw", package = "rtracklayer") > bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19", > IRanges(1, 6e7))) > bw > RangedData with 9 rows and 1 value column across 1 space > space ranges | score > <factor> <iranges> | <numeric> > 1 chr19 [59302001, 59302300] | -1.00 > 2 chr19 [59302301, 59302600] | -0.75 > 3 chr19 [59302601, 59302900] | -0.50 > 4 chr19 [59302901, 59303200] | -0.25 > 5 chr19 [59303201, 59303500] | 0.00 > 6 chr19 [59303501, 59303800] | 0.25 > 7 chr19 [59303801, 59304100] | 0.50 > 8 chr19 [59304101, 59304400] | 0.75 > 9 chr19 [59304401, 59304700] | 1.00 > > ##my example using a whole-genome mapability file I got from > # > http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMap ability/ > > ### for now just look at one small region, but eventually I'll read in data > for the whole genome > testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963, > 57797963)) > mapabilityTrack_small <- import.bw("/home/jayoung/traskdata/public_ databases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw ", > selection = BigWigSelection(testRegion) ) > > mapabilityTrack_small > RangedData with 92 rows and 1 value column across 49 spaces > space ranges | score > <factor> <iranges> | <numeric> > 1 chr12 [57795963, 57796961] | 1.0000000 > 2 chr12 [57796962, 57796963] | 0.5000000 > 3 chr12 [57796964, 57796969] | 1.0000000 > 4 chr12 [57796970, 57796971] | 0.5000000 > 5 chr12 [57796972, 57796984] | 1.0000000 > ### etc > > head(width(mapabilityTrack_small),20) > # [1] 999 2 6 2 13 1 1 1 19 3 1 1 1 1 2 1 1 > 1 1 > # [20] 4 > > #### > > A RangedData object with scores makes sense for some use cases but for many > others it seems like RleList would make more sense (thinking about the types > of data that will be encoded in bigWig files, like mapability, where you get > a per-base pair score). > > I think I will be able to figure out a way to convert the RangedData scores > - it'll be something like this, but maybe also padding with NA at the end to > go the full length of each chromosome, and applying the function over all > chromosomes. > > Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) ) > # 'numeric' Rle of length 59304700 with 10 runs > # Lengths: 59302000 300 300 300 300 300 300 > 300 300 300 > # Values : NA -1 -0.75 -0.5 -0.25 0 0.25 > 0.5 0.75 1 > > Then I'll be able to use seqselect on the Rle to get scores in regions I'm > interested in, average the scores per region, average scores at each > position in equivalent regions, etc, etc. > > But I did wonder whether I'm missing a more obvious solution to convert to > Rle - is there already a function that'll do this for me? If not, could I > suggest that as an enhancement for import.bw? (am I just being lazy?) > > The unobvious solution is to call coverage(rd, weight = "score"). Michael > thanks, > > Janet > > ------------------------------------------------------------------- > > Dr. Janet Young > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., C3-168, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 1471 fax: (206) 667 6524 > email: jayoung ...at... fhcrc.org > > > ------------------------------------------------------------------- > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
thanks very much, Michael - that'll do it. Janet On Aug 14, 2011, at 6:09 PM, Michael Lawrence wrote: > > > On Fri, Aug 12, 2011 at 6:13 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: > Hi, > > Is there an easy way to read in a bigWig file as Rle values appropriately placed along the chromosomes? Specifically, I'm using rtracklayer's import.bw to read in a mapability track I downloaded from UCSC and I get a RangedData object, as expected. Mapability scores are calculated as per base-pair basis, so it seems obvious to me to covert the scores to Rle. > > ### > library(rtracklayer) > > ## built-in example > test.bw.file <- system.file("tests", "test.bw", package = "rtracklayer") > bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19", IRanges(1, 6e7))) > bw > RangedData with 9 rows and 1 value column across 1 space > space ranges | score > <factor> <iranges> | <numeric> > 1 chr19 [59302001, 59302300] | -1.00 > 2 chr19 [59302301, 59302600] | -0.75 > 3 chr19 [59302601, 59302900] | -0.50 > 4 chr19 [59302901, 59303200] | -0.25 > 5 chr19 [59303201, 59303500] | 0.00 > 6 chr19 [59303501, 59303800] | 0.25 > 7 chr19 [59303801, 59304100] | 0.50 > 8 chr19 [59304101, 59304400] | 0.75 > 9 chr19 [59304401, 59304700] | 1.00 > > ##my example using a whole-genome mapability file I got from > # http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeM apability/ > > ### for now just look at one small region, but eventually I'll read in data for the whole genome > testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963, 57797963)) > mapabilityTrack_small <- import.bw("/home/jayoung/traskdata/public_ databases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw ", selection = BigWigSelection(testRegion) ) > > mapabilityTrack_small > RangedData with 92 rows and 1 value column across 49 spaces > space ranges | score > <factor> <iranges> | <numeric> > 1 chr12 [57795963, 57796961] | 1.0000000 > 2 chr12 [57796962, 57796963] | 0.5000000 > 3 chr12 [57796964, 57796969] | 1.0000000 > 4 chr12 [57796970, 57796971] | 0.5000000 > 5 chr12 [57796972, 57796984] | 1.0000000 > ### etc > > head(width(mapabilityTrack_small),20) > # [1] 999 2 6 2 13 1 1 1 19 3 1 1 1 1 2 1 1 1 1 > # [20] 4 > > #### > > A RangedData object with scores makes sense for some use cases but for many others it seems like RleList would make more sense (thinking about the types of data that will be encoded in bigWig files, like mapability, where you get a per-base pair score). > > I think I will be able to figure out a way to convert the RangedData scores - it'll be something like this, but maybe also padding with NA at the end to go the full length of each chromosome, and applying the function over all chromosomes. > > Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) ) > # 'numeric' Rle of length 59304700 with 10 runs > # Lengths: 59302000 300 300 300 300 300 300 300 300 300 > # Values : NA -1 -0.75 -0.5 -0.25 0 0.25 0.5 0.75 1 > > Then I'll be able to use seqselect on the Rle to get scores in regions I'm interested in, average the scores per region, average scores at each position in equivalent regions, etc, etc. > > But I did wonder whether I'm missing a more obvious solution to convert to Rle - is there already a function that'll do this for me? If not, could I suggest that as an enhancement for import.bw? (am I just being lazy?) > > > > The unobvious solution is to call coverage(rd, weight = "score"). > > Michael > > > thanks, > > Janet > > ------------------------------------------------------------------- > > Dr. Janet Young > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., C3-168, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 1471 fax: (206) 667 6524 > email: jayoung ...at... fhcrc.org > > > ------------------------------------------------------------------- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY

Login before adding your answer.

Traffic: 742 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6