How to extract only mRNA from TxDb object?
@alexandrgopanenko-11598
Germany

Hello everybody! 

 

I want to extract only mRNA from a TxDb. So I did the following steps:

library(GenomicFeatures)

## Build the TxDb from the Ensembl GTF and group exons by gene
txdb <- makeTxDbFromGFF("Homo_sapiens.GRCh38.85.gtf", format = "gtf")
exonsByGene <- exonsBy(txdb, by = "gene")

exonsByGene contains all exons, including exons of non-protein-coding genes.

The question is how to subset exonsByGene to extract only mRNA (in other words, to keep only exons that correspond to protein-coding genes)?

 

Thank you in advance!

Alex Gopanenko

Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Italy

Dear Alexandr,

it looks like your GTF is from Ensembl. In this case you might also try to use ensembldb and EnsDb databases instead of TxDb databases. With ensembldb you can then use filters to just extract the data you want. The workflow to generate the database is slightly different:

library(ensembldb)

## Create the database. Note that this creates the 
## SQLite database and does not return an EnsDb.
db <- ensDbFromGtf(gtf = "Homo_sapiens.GRCh38.85.gtf")

## Load this database
edb <- EnsDb(db)

## And use it: we'll use a GeneBiotypeFilter to fetch
## only exons of protein coding genes
## (older ensembldb releases spelled this GenebiotypeFilter)

exonsByGene <- exonsBy(edb, filter = GeneBiotypeFilter("protein_coding"), by = "gene")

While this returns all exons for all protein coding genes, it might still contain exons of non-coding transcripts of the protein coding genes, e.g. transcripts that are targeted for nonsense mediated mRNA decay.
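
If those non-coding transcripts should be dropped as well, one possible approach is to also filter on the transcript biotype. This is only a sketch: it assumes a recent ensembldb in which GeneBiotypeFilter, TxBiotypeFilter and AnnotationFilterList are available, and it uses the EnsDb.Hsapiens.v86 annotation package as a stand-in for the database built from the GTF above (an assumption, so that the example runs without the GTF file).

```r
library(ensembldb)
## Assumption: a packaged EnsDb (Ensembl 86) stands in for the
## database created from the GTF file with ensDbFromGtf().
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86

## Combine both filters; AnnotationFilterList joins them with "&"
## by default, so a gene AND its transcript must be protein coding.
pcFilter <- AnnotationFilterList(
    GeneBiotypeFilter("protein_coding"),
    TxBiotypeFilter("protein_coding")
)

## Exons of protein coding transcripts, grouped by gene
exonsByGene <- exonsBy(edb, by = "gene", filter = pcFilter)
```

With an EnsDb created from your own GTF, replace the two stand-in lines by edb <- EnsDb(db) as in the code further up.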

cheers, jo

I think this answer understates the effort involved. Creating an EnsDb using ensembldb::ensDbFromGtf() is not straightforward and depends on a lot of additional programs/software.

While ensembldb::exonsBy(edb, filter = GeneBiotypeFilter("protein_coding")) and ensembldb::addFilter(edb, GeneBiotypeFilter("protein_coding")) on an EnsDb may be the best way to filter for protein coding transcripts, finding an already created EnsDb using AnnotationHub is the way to go:

library(AnnotationHub)
ah <- AnnotationHub()
ahDb <- query(ah, pattern = c("Homo sapiens", "EnsDb", "version_of_your_interest"))
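
To go from the query result to an actual EnsDb, the matching record still has to be fetched from the hub. A rough sketch of how that could look is below; picking the last hit is purely for illustration (an assumption), and in practice you would inspect the query result and choose the record for the Ensembl release you need. It also needs network access on first use.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

## All human EnsDb records available in the hub
hits <- query(ah, pattern = c("Homo sapiens", "EnsDb"))

## Assumption: take the last record just to have something to work with;
## print `hits` and pick the AH identifier of the release you want instead.
id <- tail(names(hits), 1)
edb <- hits[[id]]

## Restrict the whole database to protein coding genes, then fetch exons
edb_pc <- addFilter(edb, GeneBiotypeFilter("protein_coding"))
exonsByGene <- exonsBy(edb_pc, by = "gene")
```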

Thank you very much for your advice, it works nicely!
