I have a wig file that I've read into a GRanges-like object using a function I wrote that calls the `rtracklayer` package:

    read_wig <- function(x, format='wig', genome='mm9') {
      suppressMessages(library(rtracklayer))
      merged_wig <- import.wig(x, format=format, genome=genome)
      merged_wig <- keepSeqlevels(merged_wig, paste0('chr', c(seq(1,19), 'X', 'Y')), pruning.mode="coarse")
      return(merged_wig)
    }

    wig <- read_wig('~/path/to/wig')

The code above returns:

    > wig
    UCSC track 'MEFES_K27AC.downsampled.sorted'
    UCSCData object with 13274466 ranges and 1 metadata column:
                 seqnames               ranges strand |     score
                    <Rle>            <IRanges>  <Rle> | <numeric>
             [1]     chr1          [  1,  200]      * |         0
             [2]     chr1          [201,  400]      * |         0
             [3]     chr1          [401,  600]      * |         0
             [4]     chr1          [601,  800]      * |         0
             [5]     chr1          [801, 1000]      * |         0
             ...      ...                  ...    ... .       ...
      [13274462]     chrY [15901401, 15901600]      * |         0
      [13274463]     chrY [15901601, 15901800]      * |         0
      [13274464]     chrY [15901801, 15902000]      * |         0
      [13274465]     chrY [15902001, 15902200]      * |         0
      [13274466]     chrY [15902201, 15902400]      * |         0
      -------
      seqinfo: 21 sequences from mm9 genome

Now with this object **I'd like to compute the sum of scores within a window around each range, for each row in the object**. For example, for the first row I'd like the sum of scores over positions 1-10000 (say 123 in this example), added as a new column next to score. I'd like to do this for every row.

    > expected_output
    UCSC track 'MEFES_K27AC.downsampled.sorted'
    UCSCData object with 13274466 ranges and 2 metadata columns:
                 seqnames               ranges strand |     score score_10000
                    <Rle>            <IRanges>  <Rle> | <numeric>   <numeric>
             [1]     chr1          [  1,  200]      * |         0         123
             [2]     chr1          [201,  400]      * |         0         ...
             [3]     chr1          [401,  600]      * |         0         ...
             [4]     chr1          [601,  800]      * |         0         ...
             [5]     chr1          [801, 1000]      * |         0         ...
             ...      ...                  ...    ... .       ...         ...
      [13274462]     chrY [15901401, 15901600]      * |         0         ...
      [13274463]     chrY [15901601, 15901800]      * |         0         ...
      [13274464]     chrY [15901801, 15902000]      * |         0         ...
      [13274465]     chrY [15902001, 15902200]      * |         0         ...
      [13274466]     chrY [15902201, 15902400]      * |         0         ...
      -------
      seqinfo: 21 sequences from mm9 genome

Ideally I'd like to add one column per window size, summing scores over 1-10000, 1-20000, 1-30000, and so on up to 100000.
Any help would be much appreciated!
EDIT: I've added a wig file here that can be used to run the code above.
Needs some clarification. What sort of window do you want around each range? 10kb, centered on the midpoint of the range, or what? Btw, `keepStandardChromosomes()` might be helpful.

Hi and thank you for your reply. I would like a window to begin at the start of the range and go 10000 upstream. So for the first row the range is 1-200 and I want to compute the sum of scores between 1-10000 and add this number to the metadata columns. The second row range is 201-400 so I want to compute the sum of scores between 201-10201. Ideally I'd like to add multiple columns that sum the scores for each range from 1-10000, 1-20000, 1-30000, up to 100000. Does that help out?
Hi Michael, I've been thinking more about this and I think it might be best to center the window on the midpoint of the range for each row and then sum the scores on both sides of the midpoint. For example, for a sum of scores within a 10000 window, we look 5000 upstream and 5000 downstream. Does that make sense? Any help would be greatly appreciated!
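To make the centered-window idea concrete, here is a minimal sketch on toy data. It is not a tested solution for the real file. It assumes a plain `GRanges` with a `score` column stands in for the imported wig; `window_sum` is a hypothetical helper name, and the toy window widths (600 and 1000) are chosen only so the result is easy to check by eye. Each range is resized around its midpoint with `resize(..., fix = "center")`, and the scores of all bins overlapping that window are summed.

```r
library(GenomicRanges)

# Toy stand-in for the imported wig: ten contiguous 200 bp bins on chr1.
gr <- GRanges("chr1",
              IRanges(start = seq(1, 1801, by = 200), width = 200),
              score = 1:10)

# Hypothetical helper: for each range, sum the scores of every bin that
# overlaps a window of `width` bp centred on that range's midpoint.
window_sum <- function(gr, width) {
  win  <- resize(gr, width = width, fix = "center")
  hits <- findOverlaps(win, gr, ignore.strand = TRUE)
  # Sum the scores of the overlapped bins, grouped by window index.
  per_window <- tapply(gr$score[subjectHits(hits)], queryHits(hits), sum)
  sums <- numeric(length(gr))
  sums[as.integer(names(per_window))] <- per_window
  sums
}

# One new metadata column per window width.
for (w in c(600, 1000)) {
  mcols(gr)[[paste0("score_", w)]] <- window_sum(gr, w)
}
gr
```

For the real data the widths would be `seq(10000, 100000, by = 10000)`. Two caveats: windows near chromosome ends extend past the sequence bounds (`trim()` can clip them when seqlengths are known), and with about 13 million bins a 100 kb window overlaps roughly 500 bins per row, so the overlap table becomes very large. Running this one chromosome at a time, or replacing the overlap step with cumulative sums over the sorted bins, would probably be needed at that scale.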
I've just added an example wig file that can be used to run the code in the original post.