Getting a list of all tRNA encoding genes for an organism?
Keith Hughitt ▴ 180
@keith-hughitt-6740
Last seen 9 months ago
United States

Hello,

Does anyone know of a way to get a list of ENSEMBL gene identifiers for all known tRNA encoding genes for a given species?

At the moment, I am interested in generating such lists for human and mouse.

For other types of genes (rRNAs, snoRNAs, etc.) I am able to use biomaRt to find all such genes, for example:

library(biomaRt)
ensembl_mart = useMart(biomart="ensembl")
biomart = useDataset('hsapiens_gene_ensembl', mart=ensembl_mart)

biomart_genes = getBM(attributes=c("ensembl_gene_id", "gene_biotype"), mart=biomart)
my_genes$type = biomart_genes$gene_biotype[match(my_genes$ENSEMBL, biomart_genes$ensembl_gene_id)]

 

...where "my_genes" is some dataframe of genes (e.g. a count table) with an id field called "ENSEMBL"
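To make the lookup step concrete, here is a minimal sketch of the same match() pattern on toy data. The identifiers and biotypes below are invented for illustration; in practice "biomart_genes" would come from getBM() as above.

```r
# Hypothetical toy tables standing in for a count table and a getBM() result
my_genes <- data.frame(
  ENSEMBL = c("ENSG_TOY_1", "ENSG_TOY_2", "ENSG_TOY_3"),
  stringsAsFactors = FALSE
)
biomart_genes <- data.frame(
  ensembl_gene_id = c("ENSG_TOY_2", "ENSG_TOY_1", "ENSG_TOY_9"),
  gene_biotype    = c("rRNA", "snoRNA", "protein_coding"),
  stringsAsFactors = FALSE
)

# match() gives, for each of my ids, its row in the annotation table (NA if absent)
idx <- match(my_genes$ENSEMBL, biomart_genes$ensembl_gene_id)
my_genes$type <- biomart_genes$gene_biotype[idx]

# Pull out the ids of one biotype; which() drops the NA comparisons
rrna_ids <- my_genes$ENSEMBL[which(my_genes$type == "rRNA")]
```

Ids missing from the annotation table end up with type NA rather than causing an error, so it is worth checking for NAs afterwards.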

tRNAs do not have their own gene_biotype grouping, and therefore a separate approach is needed to find them.

I stumbled across Marc's FDb.UCSC.tRNAs database package and was able to figure out how to at least get a list of tRNA genes:

library('FDb.UCSC.tRNAs')
names(features(FDb.Hsapiens.UCSC.hg19.tRNAs))

However, I'm not sure how to map from these entries back to my ENSEMBL gene ids. 

Anyone know of a better way?

 

Keith

trna ncrna genomicfeatures featuredb biomart • 4.8k views

Hi Keith

I was in a similar situation a few months ago, when I was working with the Ensembl GTF files (for mouse and human), which only contain the mitochondrial tRNA genes. I approached the Ensembl helpdesk and was advised to use the Perl API to get them.

Hans-Rudolf


Hi Hans-Rudolf,

Thanks for the suggestion -- I will check out the Perl API.

Did you have any luck using it to query tRNAs?


Yes, it worked very well for my needs. The trick was:

@{ $slice->get_all_SimpleFeatures('tRNAscan') }

 

Hans-Rudolf

Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 4 weeks ago
Italy

Hi Keith!

You could try the Ensembl annotation packages (e.g. EnsDb.Hsapiens.v75) that I have submitted to Bioconductor; they are currently in Bioc-devel and should be available in Bioconductor's next release. Basically, I'm using the Perl Ensembl API to get the gene/transcript models defined in Ensembl and store them, along with some additional information (i.e. gene name, gene biotype, transcript biotype), in an SQLite database included in the above-mentioned annotation packages. To get a list of Ensembl gene ids for a given gene biotype:

library(EnsDb.Hsapiens.v79)

## you could use the listGenebiotypes to get an overview of all available
## gene biotypes in Ensembl
listGenebiotypes(EnsDb.Hsapiens.v79)

## get all genes with gene biotype "Mt_tRNA"... seems to be the only 
## tRNA biotype in Ensembl
genes(EnsDb.Hsapiens.v79, filter=list(GenebiotypeFilter("Mt_tRNA")))

 

As I mentioned, the package is still in Bioc-devel, so you'll either have to install the current devel version or wait for the next Bioconductor release (which will be on April 17th). Also, I'm not sure whether ALL tRNA genes can be fetched like this: "Mt_tRNA" seems to be the only tRNA gene biotype defined in Ensembl, so this will only return mitochondrial tRNAs.
 

cheers, jo


Ah, sorry. Apparently I didn't read the full message first... so it seems my option above would be only something like an alternative to the biomart approach.

To map the features from FDb.UCSC.tRNAs, you could use their start, end and seqnames to query biomart for a gene matching these coordinates. I tried that using the EnsDb.Hsapiens.v75 package but got a match for only 3 of the 625 tRNAs. I checked some tRNAs manually on the Ensembl web page, and apparently they map to introns of (protein-coding) genes.

The code I used (it might also be possible to do this with biomart):

library('FDb.UCSC.tRNAs')
library(EnsDb.Hsapiens.v75)
ensdb <- EnsDb.Hsapiens.v75

tRNAs <- features(FDb.Hsapiens.UCSC.hg19.tRNAs)
Ensgenes <- character(length(tRNAs))

for(i in seq_along(tRNAs)){
    Gene <- genes(ensdb, filter=list(
                             SeqstartFilter(start(tRNAs)[i], condition=">=",
                                            feature="gene"),
                             SeqendFilter(end(tRNAs)[i], condition="<=",
                                          feature="gene"),
                             SeqnameFilter(sub(as.character(seqnames(tRNAs)[i]),
                                               pattern="chr", replacement="")),
                             SeqstrandFilter(as.character(strand(tRNAs[i])))
        ))
    if(length(Gene)>0){
        Ensgenes[i] <- paste(unique(Gene$gene_id), collapse=";")
    }
}
sum(Ensgenes!="")
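
As far as I can tell, the filters in the loop above select genes that lie entirely inside each tRNA interval (gene start >= tRNA start and gene end <= tRNA end). The sketch below checks the opposite direction, i.e. which genes contain a given tRNA on the same strand, which is the situation one would expect for tRNAs sitting in introns. It uses base R and invented toy coordinates only (none of the names or positions are real annotations); with GRanges objects, findOverlaps(tRNAs, genes, type="within") should do the same job.

```r
# Toy coordinates, invented for illustration only (not real annotations)
trnas <- data.frame(
  name   = c("tRNA_a", "tRNA_b", "tRNA_c"),
  chrom  = c("chr1", "chr1", "chr2"),
  start  = c(150L, 900L, 50L),
  end    = c(220L, 970L, 120L),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)
genes <- data.frame(
  gene_id = c("GENE_A", "GENE_B", "GENE_C"),
  chrom   = c("1", "1", "2"),
  start   = c(100L, 800L, 60L),
  end     = c(500L, 1000L, 400L),
  strand  = c("+", "+", "+"),
  stringsAsFactors = FALSE
)

# UCSC names chromosomes "chr1", Ensembl names them "1": strip the prefix
trnas$chrom <- sub("^chr", "", trnas$chrom)

# For each tRNA, collect the genes on the same chromosome and strand
# whose span fully contains the tRNA
containing_genes <- vapply(seq_len(nrow(trnas)), function(i) {
  hit <- genes$chrom == trnas$chrom[i] &
    genes$strand == trnas$strand[i] &
    genes$start <= trnas$start[i] &
    genes$end >= trnas$end[i]
  paste(genes$gene_id[hit], collapse = ";")
}, character(1))
names(containing_genes) <- trnas$name

# Number of tRNAs that fall inside at least one gene
sum(containing_genes != "")
```

Dropping the strand condition would also report tRNAs lying inside a gene on the opposite strand.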

 

 

 


Thanks for the suggestion, Johannes!

That is strange about the tRNAs mapping to introns. Did you use Ensembl version 75 or 79? In your code above I see both versions. If you used 75, perhaps the coordinates differ, since UCSC is currently on GRCh38 (~> E79)? I'm not sure the change should be so dramatic, though. More likely I just don't understand the UCSC table annotations well enough.

Also, I tried installing the database package you put together in R-devel (Bioconductor version 3.1, BiocInstaller 1.17.6, R version 3.3.0), but the 'ensembldb' dependency could not be found. Any suggestions?


I have now also tried with version 79, but don't find anything there. Actually, hg19 corresponds to GRCh37, so Ensembl 75 was OK. I rather believe that Ensembl does not have the tRNAs defined as "genes".

Regarding the ensembldb package, yes you're right :) there is a problem that the dependency is not (yet) available, but I hope Marc is fixing that soon.

Hi Johannes,

Your assumption is correct; we don't annotate tRNAs as genes in Ensembl. They are stored in the simple_feature table in the Core MySQL database. As Hans pointed out earlier, they can be accessed via our API.

Hope this helps.

Cheers,
Amonida

--
Amonida Zadissa
Ensembl Production

(In reply to johannes.rainer's comment of 01/04/2015 12:43.)

@martin-morgan-1513
Last seen 4 months ago
United States

The coordinates can be retrieved via SQL queries, provided one has a MySQL client installed.

library(dplyr)
library(GenomicRanges)

Connect to the database

db <- src_mysql("homo_sapiens_core_79_38",
    "useastdb.ensembl.org", 3306, "anonymous")

Figure out the analysis id corresponding to the tRNA scan

analysis_id <- tbl(db, "analysis_description") %>%
    filter(display_label=="tRNAs")

Select the simple features corresponding to this analysis

features <- semi_join(tbl(db, "simple_feature"), analysis_id,
                      by="analysis_id")

Get the chromosome name

seq_name <- tbl(db, "seq_region") %>% select(seq_region_id, name)
features <- inner_join(features, seq_name, by="seq_region_id")

Make into a GRanges

features %>% makeGRangesFromDataFrame(
    keep.extra.columns=TRUE, seqnames.field="name",
    start.field="seq_region_start", end.field="seq_region_end",
    strand.field="seq_region_strand")

dplyr and the published schema made this relatively easy to explore. I think it doesn't get to the original question, which is to annotate these with ENS identifiers, if these actually exist. The SQL server seemed to be quite flaky, with frequent time-outs and erratic performance.
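
One detail to keep in mind when post-processing the result: the Ensembl core schema stores strand as 1 / -1, whereas most downstream tools expect "+" / "-". Below is a small base R sketch of that conversion on a toy data frame. The rows are invented; the column names follow the joined simple_feature / seq_region tables used above, and "strand_map" is just a helper name made up for this example.

```r
# Toy rows mimicking the joined simple_feature / seq_region columns
# (all values invented for illustration)
features_df <- data.frame(
  name              = c("1", "1", "X"),
  seq_region_start  = c(1000L, 5000L, 200L),
  seq_region_end    = c(1072L, 5073L, 271L),
  seq_region_strand = c(1L, -1L, 1L),
  display_label     = c("tRNA-toy1", "tRNA-toy2", "tRNA-toy3"),
  stringsAsFactors  = FALSE
)

# Translate numeric strand codes into the "+" / "-" / "*" convention
strand_map <- c("-1" = "-", "0" = "*", "1" = "+")
features_df$strand <- unname(strand_map[as.character(features_df$seq_region_strand)])

# Coordinates are 1-based and inclusive, so width is end - start + 1
features_df$width <- features_df$seq_region_end - features_df$seq_region_start + 1L
```

Whether this step is needed depends on how the strand column is handled further downstream, so treat it as an optional clean-up.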

@herve-pages-1542
Last seen 2 days ago
Seattle, WA, United States

Hi,

FWIW it doesn't seem that Ensembl uses a consistent approach to tag tRNAs. I guess it depends on the organism. For some organisms (e.g. Fly) it looks like the information is available via the transcript_biotype BioMart attribute. Starting with BioC 3.1, makeTxDbFromBiomart() imports that attribute in the tx_type column. Note that this is a new TxDb column that you can then extract from the TxDb object using the columns arg of the transcripts() extractor.

See for example:

A: Does BSgenome.Dmelanogaster.UCSC.dm2 contain non-coding RNAs?

for how to use makeTxDbFromBiomart() on Fly and get the tx_type for each transcript (314 tRNAs for Fly).

Unfortunately, as Johannes noticed previously, things don't work so well for Human where only mitochondrial tRNAs (Mt_tRNA) seem to be tagged:

library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(dataset="hsapiens_gene_ensembl")
tx <- transcripts(txdb, columns=c("tx_name", "gene_id", "tx_type"))
grep("tRNA", unique(mcols(tx)$tx_type), ignore.case=TRUE, value=TRUE)
# [1] "vaultRNA" "Mt_tRNA" 

See:

  A: Feature types in TxDb objects

for more details about the new tx_type feature.

H.
