Often I read in coverage from a file and the lengths of the coverage object are less than the seqlengths of the chromosomes. If I try to subset the coverage towards the end of the chromosome (say, with a GRanges), I get "subscript contains out-of-bounds ranges" errors.
Is there a function to extend the ends of the coverage with zeroes to the full length of the chromosome according to the seqlengths? This would be really useful. I think it's a fair assumption that the coverage should be zero if there was no coverage of that region in the file/object the coverage object was created from (this is after all the assumption used for the *start* of the chromosome). Padding the end of the chromosome with zeroes could be an option for the coverage function when seqlengths are known. Edit: this already exists in the 'width' argument to coverage, see Aaron's answer below!
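For reference, here is a minimal sketch of the `width` route. It assumes (hypothetical data) a two-chromosome GRanges and known chromosome lengths, and passes `width` as one value per seqlevel, in `seqlevels()` order:

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical reads that stop well short of the chromosome ends
gr <- GRanges(c("chr1", "chr2"), IRanges(c(1, 5), width = 10))

# One width per seqlevel; positions past the last read are filled with 0
cov_full <- coverage(gr, width = c(chr1 = 100L, chr2 = 50L))
lengths(cov_full)
```

If `seqlengths(gr)` is already set, calling `coverage(gr)` with the default `width = NULL` uses those lengths instead.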
I've written my own simple function to pad the ends of the coverage Rle for each chromosome, but I'm sure it can be improved.
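```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# cov:     an RleList of per-chromosome coverage
# seqlens: a named vector of full chromosome lengths, e.g. seqlengths(gr)
pad_coverage <- function(cov, seqlens) {
    # Number of zero positions needed to reach each chromosome's full length
    pad <- seqlens[names(cov)] - lengths(cov)
    newcov <- mapply(function(s, p) {
        # Append a run of p zeroes to the end of this chromosome's Rle
        c(s, Rle(0L, p))
    }, as.list(cov), pad, SIMPLIFY = FALSE)
    RleList(newcov, compress = FALSE)
}
```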
I had forgotten about the 'width' argument to coverage, thanks!
However, I still think it'd be useful to be able to extend / pad the coverage of an existing coverage object.
In the case where you have existing coverage, you can coerce it back to a GRanges and then call coverage again with the desired width argument (or set the seqlengths of the GRanges first):
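A minimal sketch of that round trip, on hypothetical toy data. Coercing an RleList to GRanges gives one range per run, with the run value stored in a `score` metadata column; `coverage()` then rebuilds the Rles, weighted by that score, out to the new seqlengths:

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical coverage that stops short of the chromosome ends
gr <- GRanges(c("chr1", "chr2"), IRanges(c(1, 5), width = 10))
cov <- coverage(gr)
seqlens <- c(chr1 = 100L, chr2 = 50L)  # assumed full chromosome lengths

# One range per run, with the run value in mcols(cov_gr)$score
cov_gr <- as(cov, "GRanges")
seqlengths(cov_gr) <- seqlens[seqlevels(cov_gr)]

# Recompute coverage weighted by score; it now extends to the seqlengths
padded <- coverage(cov_gr, weight = "score")
lengths(padded)
```

Setting the seqlengths is only needed when they are not already known; otherwise pass the lengths through `width` in the same `coverage()` call.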