Hi,
I would like to extract exon features for specific transcripts. I am using the "exons" function from GenomicFeatures to retrieve exon ranks corresponding to a transcript ID. However, this doesn't work for overlapping genes where exons are strictly identical. Below is one example, but I got the same result for different transcript IDs, and gtf files from various sources. So my question is: how can I extract a list of exon ranks for a specific transcript when genes are overlapping ?
Thank you very much in advance,
Emilie
> library("GenomicFeatures") > txdb2 <- makeTranscriptDbFromUCSC(genome="hg19", tablename = "ensGene") > exons(txdb2,vals=list(tx_name="ENST00000336097"),columns=c("gene_id","exon_rank")) GRanges object with 6 ranges and 2 metadata columns: seqnames ranges strand | gene_id exon_rank [1] chr17 [49230937, 49231023] + | ENSG00000239672 1 [2] chr17 [49231586, 49231805] + | ENSG00000011052,ENSG00000239672,ENSG00000243678 2 [3] chr17 [49233012, 49233141] + | ENSG00000011052,ENSG00000239672,ENSG00000243678 2,3 [4] chr17 [49237341, 49237442] + | ENSG00000011052,ENSG00000239672,ENSG00000243678 3,4 [5] chr17 [49238521, 49238633] + | ENSG00000011052,ENSG00000239672,ENSG00000243678 4,5 [6] chr17 [49239089, 49239422] + | ENSG00000239672 6 ------- seqinfo: 93 sequences (1 circular) from hg19 genome > exons(txdb2,vals=list(tx_name="ENST00000336097"),columns=c("exon_rank"))$exon_rank IntegerList of length 6 [[1]] 1 [[2]] 2 [[3]] 2 3 [[4]] 3 4 [[5]] 4 5 [[6]] 6 > sessionInfo() R version 3.1.3 (2015-03-09) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu precise (12.04.5 LTS) locale: [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.18.6 AnnotationDbi_1.28.2 Biobase_2.26.0 [4] BSgenome.Hsapiens.UCSC.hg19_1.4.0 gridExtra_0.9.1 plyr_1.8.1 [7] ggplot2_1.0.1 BSgenome_1.34.1 rtracklayer_1.26.3 [10] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 Biostrings_2.34.1 [13] XVector_0.6.0 IRanges_2.0.1 S4Vectors_0.4.0 [16] BiocGenerics_0.12.1 RSQLite_1.0.0 DBI_0.3.1 loaded via a namespace (and not attached): [1] base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 BiocParallel_1.0.3 [5] biomaRt_2.22.0 bitops_1.0-6 brew_1.0-6 checkmate_1.5.2 [9] codetools_0.2-11 colorspace_1.2-6 digest_0.6.8 fail_1.2 [13] foreach_1.4.2 GenomicAlignments_1.2.2 gtable_0.1.2 iterators_1.0.7 [17] MASS_7.3-40 munsell_0.4.2 proto_0.3-10 Rcpp_0.11.5 [21] RCurl_1.95-4.5 reshape2_1.4.1 Rsamtools_1.18.3 scales_0.2.4 [25] sendmailR_1.2-1 stringr_0.6.2 tools_3.1.3 XML_3.98-1.1 [29] zlibbioc_1.12.0