Entering edit mode
Vincent,
I am cc'ing the Bioconductor mailing list with the response to your
e-mail so others can access it.
The gaps,GRanges-method is strand specific, so when you pass a GRanges
object containing the transcript bounds into gaps, you will get the
gaps
on the positive strand, the negative strand, and the both strand
category. This is why you found a gap between transcripts on the
positive strand, while there was a transcript at some of those same
positions on the negative strand. If you want a strandless
representation of the transcripts, I recommend using the IRanges and
IRangesList classes. Here is code that achieves what I think you are
looking for:
> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromUCSC(genome = "hg18", tablename =
"ensGene")
> tx <- transcripts(txdb)
> unstrndTx <- split(ranges(tx), seqnames(tx))
> unstrndGaps <- unlist(gaps(unstrndTx))
> intAlt <- GRanges(seqnames = factor(names(unstrndGaps),
names(seqlengths(tx))),
+ ranges = unname(unstrndGaps),
+ seqlengths = seqlengths(tx))
> intAlt[1:3]
GRanges with 3 ranges and 0 elementMetadata values
seqnames ranges strand |
<rle> <iranges> <rle> |
[1] chr1 [ 3534, 4273] * |
[2] chr1 [19670, 20228] * |
[3] chr1 [20367, 24416] * |
seqlengths
chr1 chr1_random chr10 ... chrX_random
chrY
247249719 1663265 135374737 ... 1719168
57772954
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-07-01 r52425)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.1.6 GenomicRanges_1.1.16 IRanges_1.7.11
loaded via a namespace (and not attached):
[1] Biobase_2.9.0 biomaRt_2.5.1 Biostrings_2.17.24
BSgenome_1.17.5
[5] DBI_0.2-5 RCurl_1.4-2 RSQLite_0.9-1
rtracklayer_1.9.3
[9] tools_2.12.0 XML_3.1-0
On 7/16/10 6:23 AM, Vincent Detours wrote:
> Dear Patrick,
>
> I am learning your packages genomicranges and transcriptdb, which
> I find impressively efficients. Here are two questions and perhaps a
> request.
>
> In transcriptdb, I get intergenic regions with:
> -----
> > txdb <- makeTranscriptDbFromUCSC(genome = "hg18", tablename =
"ensGene")
> > tr <- transcripts(txdb)
> > int <- gaps(tr)
> > int[1:3]
> GRanges with 3 ranges and 0 elementMetadata values
> seqnames ranges strand |
> <rle> <iranges> <rle> |
> [1] chr1 [ 1, 1872] + |
> [2] chr1 [ 3534, 20228] + |
> [3] chr1 [20367, 42911] + |
> ...
> -----
> Now I type chr1:3,534-20,228 in the UCSC browser, hg18, and I see
> that there is a transcript within this interval:
> http://genome.ucsc.edu/cgi-
bin/hgc?hgsid=165451654&o=4273&t=19669&g=ensGene&i=ENST00000326632
> <http: genome.ucsc.edu="" cgi-="" bin="" hgc?hgsid="165451654&o=4273&t=19669&g=ensGene&i=ENST00000326632">
> This transcript is indeed in the tr object
> -------
> > which(elementMetadata(tr)[,"tx_name"]=="ENST00000326632")
> [1] 3880
> > tr[3880,]
> GRanges with 1 range and 2 elementMetadata values
> seqnames ranges strand | tx_id tx_name
> <rle> <iranges> <rle> | <integer> <character>
> [1] chr1 [4274, 19669] - | 1731 ENST00000326632
> >
> -------
> Is there a bug in 'gaps', or am I missing something about how it
works?
>
> I am using the development version of the packages.
>
> Thank you for your help and great software.
>
> Vincent Detours, Ph. D.
> http://homepages.ulb.ac.be/~vdetours/
> <http: homepages.ulb.ac.be="" %7evdetours=""/>
>
> IRIBHM - Université Libre de Bruxelles
> Bldg C, room C.4.116
> ULB, Campus Erasme, CP602
> 808 route de Lennik
> B-1070 Brussels
> Belgium
>
> Phone: +32-2-555 4220
> Fax: +32-2-555 4655
> Skype: vdetours
> E-mail: vdetours at ulb.ac.be <http: ulb.ac.be="">
>
>
>
[[alternative HTML version deleted]]