alessandro.pastore
I would like to generate a GRangesList of all gene introns, with names. I can make the exon list, but I do not see an elegant way to get the introns. Any suggestions?
Thanks!
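```r
library(AnnotationHub)
library(ensembldb)  # provides exons() and listColumns() for EnsDb objects

# Ensembl 90 annotation for Homo sapiens
edb <- query(AnnotationHub(), c("Ensembl 90 EnsDb", "Homo sapiens"))[[1]]

# One range per exon/transcript pair, with all transcript columns plus the gene name
exons.Grange <- exons(edb, columns = c(listColumns(edb, "tx"), "gene_name"))

# duplicated() is TRUE only for the 2nd and later occurrences of an exon_id, so this
# drops the first copy of every exon and removes exons used by a single transcript
exons.Grange <- exons.Grange[duplicated(exons.Grange$exon_id), ]

# Group the remaining rows by exon ID
exons.Grange <- split(exons.Grange, exons.Grange$exon_id)
```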
```
> exons.Grange
GRangesList object of length 221795:
$ENSE00000327880 
GRanges object with 5 ranges and 11 metadata columns:
                  seqnames               ranges strand |           tx_id     tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start tx_cds_seq_end
                     <Rle>            <IRanges>  <Rle> |     <character>    <character>    <integer>  <integer>        <integer>      <integer>
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000419687 protein_coding     27725996   27761473         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000530324 protein_coding     27726028   27759764         27726081       27759657
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000234549 protein_coding     27726028   27760581         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000373949 protein_coding     27726028   27761964         27726081       27760581
  ENSE00000327880        1 [27732603, 27732657]      + | ENST00000010299 protein_coding     27726057   27760581         27726081       27760581
                          gene_id tx_support_level         tx_name   gene_name         exon_id
                      <character>        <integer>     <character> <character>     <character>
  ENSE00000327880 ENSG00000009780                2 ENST00000419687      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000530324      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000234549      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                2 ENST00000373949      FAM76A ENSE00000327880
  ENSE00000327880 ENSG00000009780                1 ENST00000010299      FAM76A ENSE00000327880

$ENSE00000328922 
GRanges object with 2 ranges and 11 metadata columns:
                  seqnames                 ranges strand |           tx_id              tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start
  ENSE00000328922        3 [131018506, 131018716]      - | ENST00000264992          protein_coding    131013875  131026802        131014057
  ENSE00000328922        3 [131018506, 131018716]      - | ENST00000507978 nonsense_mediated_decay    131013982  131026854        131017000
                  tx_cds_seq_end         gene_id tx_support_level         tx_name   gene_name         exon_id
  ENSE00000328922      131025306 ENSG00000034533                1 ENST00000264992       ASTE1 ENSE00000328922
  ENSE00000328922      131025306 ENSG00000034533                2 ENST00000507978       ASTE1 ENSE00000328922

$ENSE00000329326 
GRanges object with 2 ranges and 11 metadata columns:
                  seqnames                 ranges strand |           tx_id     tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start tx_cds_seq_end
  ENSE00000329326        8 [132583694, 132583779]      - | ENST00000250173 protein_coding    132572201  132675559        132578498      132675493
  ENSE00000329326        8 [132583694, 132583779]      - | ENST00000618342 protein_coding    132571953  132661667        132572306      132661667
                          gene_id tx_support_level         tx_name   gene_name         exon_id
  ENSE00000329326 ENSG00000129295                1 ENST00000250173       LRRC6 ENSE00000329326
  ENSE00000329326 ENSG00000129295                5 ENST00000618342       LRRC6 ENSE00000329326

...
<221792 more elements>
-------
seqinfo: 388 sequences from GRCh38 genome
```
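One way to derive introns from exon annotation is to take the span of each transcript (first exon start to last exon end) and subtract that transcript's exons; what remains are its introns. The minimal sketch below shows this with GenomicRanges on two made-up transcripts (`txA`, `txB` and all coordinates are hypothetical). With the real database, the per-transcript exon list would come from something like `exonsBy(edb, by = "tx")`, which groups exons by transcript ID.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Toy exons grouped by (hypothetical) transcript ID; with an EnsDb this list
# would be the output of exonsBy(edb, by = "tx").
exons_by_tx <- GRangesList(
  txA = GRanges("1", IRanges(start = c(100, 300, 600), end = c(150, 400, 700)), strand = "+"),
  txB = GRanges("1", IRanges(start = c(100, 600), end = c(150, 700)), strand = "+")
)

# Span of each transcript: one range from its first exon start to its last exon end.
tx_span <- unlist(range(exons_by_tx))

# Introns are the parts of each span not covered by that transcript's exons.
introns_by_tx <- psetdiff(tx_span, exons_by_tx)

# Name the list elements by transcript ID so each set of introns stays identifiable.
names(introns_by_tx) <- names(exons_by_tx)

introns_by_tx
```

Here `txA` yields two introns (151-299 and 401-599) and `txB` yields one (151-599); the list names are transcript IDs, not intron IDs.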
I'd say your approach is fine. There is no intron ID stored in the database, so you can't get one from an EnsDb.
Thanks! I thought it would be nice to keep some of the mcols information...
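To keep some annotation with the introns, one option is to flatten a per-transcript intron list, build a synthetic intron ID from the transcript ID plus the intron's rank in transcription order, and copy transcript-level columns onto each intron. This minimal sketch starts from a small hand-made intron list and a hypothetical `tx_info` table standing in for something like `transcripts(edb, columns = c("tx_id", "gene_name"))`. The `<tx_id>_intron<n>` naming scheme is an invented convention, not an Ensembl identifier.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical introns of two minus-strand transcripts, grouped by transcript ID.
introns_by_tx <- GRangesList(
  txA = GRanges("1", IRanges(start = c(151, 401), end = c(299, 599)), strand = "-"),
  txB = GRanges("1", IRanges(start = 151, end = 599), strand = "-")
)

# Hypothetical transcript-level annotation to carry over to the introns.
tx_info <- data.frame(tx_id = c("txA", "txB"), gene_name = c("GENE1", "GENE1"),
                      stringsAsFactors = FALSE)

# Flatten to one GRanges; unlist() sets each range's name to its transcript ID.
introns <- unlist(introns_by_tx)
tx_ids <- names(introns)

# Rank introns 5' to 3' within each transcript: on the minus strand the
# 5'-most intron has the largest coordinates, so rank on the negated start.
strand_sign <- ifelse(as.character(strand(introns)) == "-", -1, 1)
intron_rank <- ave(strand_sign * start(introns), tx_ids, FUN = rank)

# Attach metadata columns, including the synthetic intron ID.
mcols(introns)$tx_id <- tx_ids
mcols(introns)$intron_id <- paste0(tx_ids, "_intron", intron_rank)
mcols(introns)$gene_name <- tx_info$gene_name[match(tx_ids, tx_info$tx_id)]
names(introns) <- mcols(introns)$intron_id

# Regroup into a GRangesList per transcript; the mcols travel with each range.
introns_grl <- split(introns, mcols(introns)$tx_id)
introns_grl
```

Splitting by another column copied onto the introns (for example a gene identifier) would group them per gene instead; introns from alternative transcripts of the same gene can then overlap.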