Entering edit mode
On Wed, Oct 26, 2011 at 3:07 PM, Martin Morgan <mtmorgan@fhcrc.org>
wrote:
> On 10/25/2011 05:35 PM, Robert Castelo wrote:
>
>> dear list,
>>
>> the following three lines allow one to count overlaps of aligned
>> short-reads with annotations:
>>
>> aln <- readGappedAlignments("**somebamfile.bam")
>> txdb <- makeTranscriptFromUCSC(genome=**"hg19",
tablename="ensGene")
>> ensGenes <- exonsBy(txdb, by="gene")
>> ov <- countGenomicOverlaps(aln, ensGenes)
>>
>> then i want to get read-counts per gene and the first thing that
comes
>> to my head is doing:
>>
>> counts <- sapply(ov, function(x) sum(values(x)[["hits"]]))
>>
>> which goes through every gene and adds up the "hits" of its exons.
>> however, this latter step of "just adding" takes longer than the
actual
>> calculation of the hits with countGenomicOverlaps() and i guess
that
>> there are more efficient ways to approach this, probably something
>> around "reducing the hits value column". i've been looking at
rdapply()
>> and reduce() and googled too, but couldn't find anything, so i look
>> forward to your suggestions.
>>
>
> Hi Robert -- A strategy here is along the lines (untested!) of
>
> hits <- values(unlist(ov)))[["hits"]]
>
At least in devel, we can now do this:
counts <- sum(relist(hits, ov))
> genes <- rep(names(ov), elementLengths(ov))
> counts <- sapply(split(hits, genes), sum)
>
>
but you'll want to make sure that this is a conceptually sensible way
of
> counting hits per gene.
>
> The countGenomicOverlaps function has been heavily updated to
> summarizeOverlaps in 'devel'; you might want to check that out, or
at least
> the vignette for summarizeOverlaps at
>
> http://bioconductor.org/**packages/devel/bioc/html/**GenomicRanges.h
tml<http: bioconductor.org="" packages="" devel="" bioc="" html="" genomicranges.htm="" l="">
>
> Martin
>
>
>
>> thanks!!
>> robert.
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives:
>> http://news.gmane.org/gmane.**science.biology.informatics.**conduct
or<http: news.gmane.org="" gmane.science.biology.informatics.conductor="">
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
>
> ______________________________**_________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor="">
> Search the archives: http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor="">
>
[[alternative HTML version deleted]]