Entering edit mode
Fong Chun Chan
▴
320
@fong-chun-chan-5706
Last seen 10.2 years ago
Hi Valarie,
Thanks for the reply. I was testing this and it doesn't seem to work
with
the summarizeOverlaps() function. When I try it, I get this error:
countsForIntrons <- summarizeOverlaps(intronsByGene, reads);
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function summarizeOverlaps
for
signature "GRangesList", "GRanges"
Is it supposed to be countOverlaps() instead of summarizeOverlap()?
Thanks,
Fong
On Wed, Jan 30, 2013 at 12:23 PM, Valerie Obenchain
<vobencha@fhcrc.org>wrote:
> Hi Fong,
>
> We don't have a single function call to get introns by gene but you
can
> put one together in a few steps.
>
> library(TxDb.Hsapiens.UCSC.**hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.**knownGene
>
> Specifying use.names = TRUE puts the transcript names on the
GRangesList.
>
> introns <- intronsByTranscript(txdb, use.names=TRUE)
>
> Unlist and remove duplicate introns.
>
> ulst <- unlist(introns)
> intronsNoDups <- ulst[!duplicated(ulst)]
>
> The select() function can map txid to geneid (and others). See
?select.
>
> txnames <- names(intronsNoDups)
> map <- select(txdb, keys=txnames, keytype='TXNAME',
cols='GENEID')
>
> Not all transcripts are associated with a gene id and some are
associated
> with more than one.
>
> > head(map, 10)
> TXNAME GENEID
> 1 uc001aaa.3 <na>
> 2 uc001aaa.3 <na>
> 3 uc010nxq.1 <na>
> 4 uc010nxq.1 <na>
> 5 uc010nxr.1 <na>
> 6 uc010nxr.1 <na>
> 7 uc009vjk.2 100133331
> 8 uc009vjk.2 100133331
> 9 uc001aau.3 100132287
> 10 uc021oeh.1 100133331
>
>
> To get a GRangesList of introns by gene, split the introns on the
> associated gene id.
>
> idx <- map$GENEID[!is.na(map$GENEID)]
> intronsByGene <- split(intronsNoDups[!is.na(**map$GENEID)], idx)
> names(intronsByGene) <- unique(idx)
>
>
> The list names are geneid and the element rownames are txid.
>
> > intronsByGene
> GRangesList of length 19784:
> $100133331
> GRanges with 13 ranges and 0 metadata columns:
> seqnames ranges strand
> <rle> <iranges> <rle>
> uc002qsd.4 chr19 [58858396, 58858718] -
> uc002qsd.4 chr19 [58859007, 58861735] -
> uc002qsd.4 chr19 [58862018, 58862756] -
> ...
>
> $100132287
> GRanges with 1 range and 0 metadata columns:
> seqnames ranges strand
> uc003wyw.1 chr8 [18248856, 18257507] +
>
>
> To count reads against this annotation you can use
summarizeOverlaps().
> The 'reads' can be Bam files, GappedAlignments or
GappedAlignmentPairs. The
> 'features' argument will be the GRangesList of introns by gene. The
result
> will be counts per gene.
>
> To count just introns with no grouping you can still use
> summarizeOverlaps() but use the 'intronsNoDups' GRanges object as
> 'features'.
>
>
> Valerie
>
>
>
> On 01/29/2013 11:38 AM, Fong Chun Chan wrote:
>
>> Hi,
>>
>> I am using the R package, Genomic Features, to get Intron
Expression and
>> the function that I am using is the intronsByTranscript(). While
this
>> function is useful, it returns the number of raw reads that align
to each
>> intron grouped at the transcript level. Is there an easy way to get
it to
>> group it by a gene such that I am grabbing all the introns that
fall in
>> all
>> the transcripts belong to a gene (and without double counting the
intronic
>> space).
>>
>> Similar, is there a way to easily get the raw read count at an
individual
>> intron level rather than having it grouped? The introns() function
seems
>> to
>> be defunct now as it recommends that we use the
intronsByTranscript()
>> function.
>>
>> Thanks,
>>
>> Fong
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor="">
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor="">
>>
>
>
[[alternative HTML version deleted]]
Hi Valerie,
I am trying to do the same analysis as Fong i.e. count introns. I made a GRangesList of introns by gene as you described and also made a GRangesList of exons by gene using "exonsByGene <- exonsBy(txdb, by = c("gene"))." I wanted to test out counting using summarizeOverlaps and so I tested it out by counting exonsByGene and comparing the results I got with the count results obtained from HTSeq-count. My results are different despite using the same .gtf annotation file. The command I used for summarizeOverlaps is:
exon_counts <- summarizeOverlaps(exonsByGene, reads = pathtoFiles, singleEnd = FALSE, ignore.strand = FALSE, fragments=FALSE)
I also tried using the option ignore.strand=TRUE.
and my HTSeq-counts command is:
htseq-count -m union -s reverse -i gene_id -f bam -r pos pathtobamfile annotation.gtf > outputfile
How can I setup summarizeOverlaps so that the read counts result is the same as the HTSeq-count result?
I apologize for latching onto this thread, I didn't know if proper etiquette requires to ask a new question or just follow up on a pre-existing one.