Entering edit mode
Janet Young
▴
740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…
Hi again,
I have another question, in the "I think I need a convoluted
workaround, but maybe I'm missing a simple solution" genre (seems like
most of my questions are like that).
I have an RleList object representing mapability for the whole human
genome. I'd like to:
(a) calculate viewMeans for various regions of interest that I've been
representing as GRanges, and
(b) I'd like the underlying code to be smart and match chromosome
names up in the RleList and the GRanges object (not rely on
chromosomes being ordered the same in the two objects), and
(c) I'd like to return the viewMeans results in the same order as the
objects in my original GRanges
I don't think this is currently implemented without doing several
coercions (that introduce their own complications) but I'm not sure.
Some code is below that shows what I'm trying to do. It seems like
it might be a common kind of way to use viewMeans - I imagine people
are gradually switching to use GRanges more and more? but really I
don't know.
My real world question is that I've read in mapability from a UCSC
bigWig file, and made that into an RleList. I have a bunch of other
regions I've read in (again from UCSC) using rtracklayer as GRanges,
and have annotated those with various things I'm interested in (e.g.
number of RNAseq reads in the region). I want to add the average
mapability for each region, so that I can later look at how mapability
affects those other things I'm annotating.
Should I be being more strict with myself about how my GRanges are
ordered and making sure they all have the same seqlevels and
seqlengths? Perhaps that would help the coercion workarounds go more
smoothly.
thanks,
Janet
library(GenomicRanges)
### make an RleList. Just for this example, I starting with a GRanges
object and used coverage to get an RleList example. To get my real
data I read in a UCSC bigWig file.
fakeRegions <- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"),
ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) )
seqlengths(fakeRegions) <- c(100,100,100)
myRleList <- coverage(fakeRegions)
rm(fakeRegions)
myRleList
# SimpleRleList of length 3
# $chrA
# 'integer' Rle of length 100 with 4 runs
# Lengths: 50 10 30 10
# Values : 1 2 1 0
#
# $chrB
# 'integer' Rle of length 100 with 2 runs
# Lengths: 20 80
# Values : 1 0
#
# $chrC
# 'integer' Rle of length 100 with 2 runs
# Lengths: 10 90
# Values : 1 0
### make some regions of interest
myRegions <- GRanges(seqnames=c("chrB", "chrC", "chrB"),
ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ),
geneNames=c("g1","g2","g3") )
myRegions
# GRanges with 3 ranges and 1 elementMetadata value
# seqnames ranges strand | geneNames
# <rle> <iranges> <rle> | <character>
# [1] chrB [1, 20] * | g1
# [2] chrC [1, 20] * | g2
# [3] chrB [5, 10] * | g3
#
# seqlengths
# chrB chrC
# NA NA
## can't use GRanges directly
Views( myRleList, myRegions)
# Error in RleViewsList(rleList = subject, rangesList = start) :
# 'rangesList' must be a RangesList object
## can't use a simple coercion
Views( myRleList, as(myRegions,"RangesList") )
# Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY =
SIMPLIFY, :
# all object lengths must be multiple of longest object length
### although I can use that coercion if I first give the GRanges
object the same levels as the RleList to force the lists to have the
same names as each other:
seqlevels(myRegions) <- names(myRleList)
viewMeans(Views( myRleList, as(myRegions,"RangesList") ) )
# SimpleNumericList of length 3
# [["chrA"]] numeric(0)
# [["chrB"]] 1 1
# [["chrC"]] 0.5
### getting close to a solution - but I'd have liked to have been able
to unlist this and directly add to my GRanges object e.g.
values(myRegions)$meanCov <- unlist(viewMeans(Views( myRleList,
as(myRegions,"RangesList") ) ), use.names=FALSE)
### but the regions are in a different order to how I started, so the
above command would associate the wrong scores with the wrong regions.
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.4.6 IRanges_1.10.5