Entering edit mode
Janet Young
▴
740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…
Hi,
Is there an easy way to read in a bigWig file as Rle values
appropriately placed along the chromosomes? Specifically, I'm using
rtracklayer's import.bw to read in a mapability track I downloaded
from UCSC and I get a RangedData object, as expected. Mapability
scores are calculated as per base-pair basis, so it seems obvious to
me to covert the scores to Rle.
###
library(rtracklayer)
## built-in example
test.bw.file <- system.file("tests", "test.bw", package =
"rtracklayer")
bw <- import(test.bw.file, ranges = GenomicRanges::GRanges("chr19",
IRanges(1, 6e7)))
bw
RangedData with 9 rows and 1 value column across 1 space
space ranges | score
<factor> <iranges> | <numeric>
1 chr19 [59302001, 59302300] | -1.00
2 chr19 [59302301, 59302600] | -0.75
3 chr19 [59302601, 59302900] | -0.50
4 chr19 [59302901, 59303200] | -0.25
5 chr19 [59303201, 59303500] | 0.00
6 chr19 [59303501, 59303800] | 0.25
7 chr19 [59303801, 59304100] | 0.50
8 chr19 [59304101, 59304400] | 0.75
9 chr19 [59304401, 59304700] | 1.00
##my example using a whole-genome mapability file I got from
# http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMap
ability/
### for now just look at one small region, but eventually I'll read in
data for the whole genome
testRegion <- GRangesForUCSCGenome("hg18", "chr12", IRanges(57795963,
57797963))
mapabilityTrack_small <- import.bw("/home/jayoung/traskdata/public_da
tabases/UCSC/Mar2006/other_tracks/wgEncodeCrgMapabilityAlign40mer.bw",
selection = BigWigSelection(testRegion) )
mapabilityTrack_small
RangedData with 92 rows and 1 value column across 49 spaces
space ranges | score
<factor> <iranges> | <numeric>
1 chr12 [57795963, 57796961] | 1.0000000
2 chr12 [57796962, 57796963] | 0.5000000
3 chr12 [57796964, 57796969] | 1.0000000
4 chr12 [57796970, 57796971] | 0.5000000
5 chr12 [57796972, 57796984] | 1.0000000
### etc
head(width(mapabilityTrack_small),20)
# [1] 999 2 6 2 13 1 1 1 19 3 1 1 1 1 2 1
1 1 1
# [20] 4
####
A RangedData object with scores makes sense for some use cases but for
many others it seems like RleList would make more sense (thinking
about the types of data that will be encoded in bigWig files, like
mapability, where you get a per-base pair score).
I think I will be able to figure out a way to convert the RangedData
scores - it'll be something like this, but maybe also padding with NA
at the end to go the full length of each chromosome, and applying the
function over all chromosomes.
Rle( values=c(NA,bw$score), lengths=c( (start(bw)[1]-1), width(bw) ) )
# 'numeric' Rle of length 59304700 with 10 runs
# Lengths: 59302000 300 300 300 300 300
300 300 300 300
# Values : NA -1 -0.75 -0.5 -0.25 0
0.25 0.5 0.75 1
Then I'll be able to use seqselect on the Rle to get scores in regions
I'm interested in, average the scores per region, average scores at
each position in equivalent regions, etc, etc.
But I did wonder whether I'm missing a more obvious solution to
convert to Rle - is there already a function that'll do this for me?
If not, could I suggest that as an enhancement for import.bw? (am I
just being lazy?)
thanks,
Janet
-------------------------------------------------------------------
Dr. Janet Young
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
-------------------------------------------------------------------