Hi
I was trying to write out some tracks of averaged methylation data (represented as GenomicRanges objects) as bigWig files using export.bw, but for my simple data set of 1 Mbp windows (2,700 ranges), it produced a 2 GB file!
The same thing happens with the not-run example in the documentation. The example object, with only 9 ranges, produces a 268 KB bigWig file:
    library(rtracklayer)

    test_path <- system.file("tests", package = "rtracklayer")
    test_bw <- file.path(test_path, "test.bw")

    ## GRanges
    ## Returns ranges with non-zero scores.
    gr <- import(test_bw)
    gr

    which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
    import(test_bw, which = which)

    ## RleList
    ## Scores returned as an RleList is equivalent to the coverage.
    ## Best option when 'which' or 'selection' contain many small ranges.
    mini <- narrow(unlist(tile(which, 50)), 2)
    rle <- import(test_bw, which = mini, as = "RleList")
    rle

    ## NumericList
    ## The 'which' is stored as metadata:
    track <- import(test_bw, which = which, as = "NumericList")
    metadata(track)

    ## Not run:
    test_bw_out <- file.path(tempdir(), "test_out.bw")
    export(gr, test_bw_out)  # Note that I had to modify this to use gr since test doesn't exist
I understand that there should be some overhead for indexing, but this seems excessive. When I export the same object as .bedGraph, the file is only 168 bytes. When I convert that file to bigWig using bedGraphToBigWig from the Kent tools, the result is around 19 KB.
Similarly, for my 2,700-range object, the bedGraph file is only 108 KB and the bigWig from bedGraphToBigWig is 79 KB, not 2 GB.
Surely this is not working as intended?
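For anyone who wants to reproduce the size comparison, here is a minimal sketch. It assumes rtracklayer is installed and reuses the test.bw file from the documentation example above; the output file names and the `sizes` variable are just placeholders I picked. The sizes you see will depend on your rtracklayer version.

```r
library(rtracklayer)

# Load the small test track that ships with rtracklayer
# (the same file used in the documentation example).
test_bw <- system.file("tests", "test.bw", package = "rtracklayer")
gr <- import(test_bw)

# Export the same ranges in both formats to temporary files.
# The file names here are arbitrary placeholders.
out_bw <- file.path(tempdir(), "test_out.bw")
out_bg <- file.path(tempdir(), "test_out.bedGraph")
export(gr, out_bw, format = "bigWig")
export(gr, out_bg, format = "bedGraph")

# Report the on-disk size of each file in bytes.
sizes <- file.size(c(out_bw, out_bg))
names(sizes) <- c("bigWig", "bedGraph")
print(sizes)
```

The bedGraph produced this way can then be run through bedGraphToBigWig to get the third number in the comparison.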
Ok, I tweaked things, see my answer.