Question

TranscriptDb and genomicranges questions

0

Entering edit mode

Patrick Aboyoun ★ 1.6k

@patrick-aboyoun-6734

Last seen 10.4 years ago

United States

Vincent, I am cc'ing the Bioconductor mailing list with the response to your e-mail so others can access it. The gaps,GRanges-method is strand specific, so when you pass a GRanges object containing the transcript bounds into gaps, you will get the gaps on the positive strand, the negative strand, and the both strand category. This is why you found a gap between transcripts on the positive strand, while there was a transcript at some of those same positions on the negative strand. If you want a strandless representation of the transcripts, I recommend using the IRanges and IRangesList classes. Here is code that achieves what I think you are looking for: > library(GenomicFeatures) > txdb <- makeTranscriptDbFromUCSC(genome = "hg18", tablename = "ensGene") > tx <- transcripts(txdb) > unstrndTx <- split(ranges(tx), seqnames(tx)) > unstrndGaps <- unlist(gaps(unstrndTx)) > intAlt <- GRanges(seqnames = factor(names(unstrndGaps), names(seqlengths(tx))), + ranges = unname(unstrndGaps), + seqlengths = seqlengths(tx)) > intAlt[1:3] GRanges with 3 ranges and 0 elementMetadata values seqnames ranges strand | <rle> <iranges> <rle> | [1] chr1 [ 3534, 4273] * | [2] chr1 [19670, 20228] * | [3] chr1 [20367, 24416] * | seqlengths chr1 chr1_random chr10 ... chrX_random chrY 247249719 1663265 135374737 ... 1719168 57772954 > sessionInfo() R version 2.12.0 Under development (unstable) (2010-07-01 r52425) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.1.6 GenomicRanges_1.1.16 IRanges_1.7.11 loaded via a namespace (and not attached): [1] Biobase_2.9.0 biomaRt_2.5.1 Biostrings_2.17.24 BSgenome_1.17.5 [5] DBI_0.2-5 RCurl_1.4-2 RSQLite_0.9-1 rtracklayer_1.9.3 [9] tools_2.12.0 XML_3.1-0 On 7/16/10 6:23 AM, Vincent Detours wrote: > Dear Patrick, > > I am learning your packages genomicranges and transcriptdb, which > I find impressively efficients. Here are two questions and perhaps a > request. > > In transcriptdb, I get intergenic regions with: > ----- > > txdb <- makeTranscriptDbFromUCSC(genome = "hg18", tablename = "ensGene") > > tr <- transcripts(txdb) > > int <- gaps(tr) > > int[1:3] > GRanges with 3 ranges and 0 elementMetadata values > seqnames ranges strand | > <rle> <iranges> <rle> | > [1] chr1 [ 1, 1872] + | > [2] chr1 [ 3534, 20228] + | > [3] chr1 [20367, 42911] + | > ... > ----- > Now I type chr1:3,534-20,228 in the UCSC browser, hg18, and I see > that there is a transcript within this interval: > http://genome.ucsc.edu/cgi- bin/hgc?hgsid=165451654&o=4273&t=19669&g=ensGene&i=ENST00000326632 > <http: genome.ucsc.edu="" cgi-="" bin="" hgc?hgsid="165451654&o=4273&t=19669&g=ensGene&i=ENST00000326632"> > This transcript is indeed in the tr object > ------- > > which(elementMetadata(tr)[,"tx_name"]=="ENST00000326632") > [1] 3880 > > tr[3880,] > GRanges with 1 range and 2 elementMetadata values > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > [1] chr1 [4274, 19669] - | 1731 ENST00000326632 > > > ------- > Is there a bug in 'gaps', or am I missing something about how it works? > > I am using the development version of the packages. > > Thank you for your help and great software. > > Vincent Detours, Ph. D. > http://homepages.ulb.ac.be/~vdetours/ > <http: homepages.ulb.ac.be="" %7evdetours=""/> > > IRIBHM - Université Libre de Bruxelles > Bldg C, room C.4.116 > ULB, Campus Erasme, CP602 > 808 route de Lennik > B-1070 Brussels > Belgium > > Phone: +32-2-555 4220 > Fax: +32-2-555 4655 > Skype: vdetours > E-mail: vdetours at ulb.ac.be <http: ulb.ac.be=""> > > > [[alternative HTML version deleted]]

TranscriptDb IRanges GenomicRanges TranscriptDb IRanges GenomicRanges • 1.1k views

ADD COMMENT • link 14.6 years ago Patrick Aboyoun ★ 1.6k