If I got it correctly you want to plot the transcript biotype instead of the transcript name? If yes, you could use the ensembldb
package:
library(ensembldb)
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75
## Get the GRanges in the correct format for Gviz for a gene, e.g. BCL2L11
gr <- getGeneRegionTrackForGviz(edb, filter = GenenameFilter("BCL2L11"))
> gr
GRanges object with 89 ranges and 6 metadata columns:
seqnames ranges strand | feature gene
<Rle> <IRanges> <Rle> | <character> <character>
[1] 2 [111878491, 111878765] + | utr5 ENSG00000153094
[2] 2 [111881310, 111881322] + | utr5 ENSG00000153094
[3] 2 [111878491, 111878765] + | utr5 ENSG00000153094
... ... ... ... . ... ...
[87] 2 [111881323, 111881446] + | protein_coding ENSG00000153094
[88] 2 [111907621, 111907724] + | protein_coding ENSG00000153094
[89] 2 [111918996, 111919016] + | protein_coding ENSG00000153094
exon exon_rank transcript symbol
<character> <integer> <character> <character>
[1] ENSE00001514631 1 ENST00000308659 BCL2L11
[2] ENSE00001418123 2 ENST00000308659 BCL2L11
[3] ENSE00001514631 1 ENST00000337565 BCL2L11
... ... ... ... ...
[87] ENSE00001357671 1 ENST00000452231 BCL2L11
[88] ENSE00003483971 2 ENST00000452231 BCL2L11
[89] ENSE00003596136 3 ENST00000452231 BCL2L11
-------
seqinfo: 1 sequence from GRCh37 genome
## Now we want to get the transcript biotype for these transcripts; we
## want to get the results as a data.frame instead of a GRanges here
biotypes <- transcripts(edb, filter = TxidFilter(gr$transcript), return.type = "data.frame")
rownames(biotypes) <- biotypes$tx_id
## Replace symbol with biotype
gr$symbol <- biotypes[gr$transcript, "tx_biotype"]
library(Gviz)
plotTracks(GeneRegionTrack(gr, name = "BCL2L11"), transcriptAnnotation = "symbol")
Alternatively, you could plot separate tracks for protein coding and nonsense mediated decay:
prot_cod <- getGeneRegionTrackForGviz(edb, filter = list(GenenameFilter("BCL2L11"), TxbiotypeFilter("protein_coding")))
nonsense <-getGeneRegionTrackForGviz(edb, filter = list(GenenameFilter("BCL2L11"), TxbiotypeFilter("nonsense_mediated_decay")))
plotTracks(list(GeneRegionTrack(prot_cod, name = "protein_coding"), GeneRegionTrack(nonsense, name = "nonsense_mediated_decay")))
Hope that helps,
cheers, jo
Thank you for helping me. Indeed, this method works. But can you explain to me why EnsDb.Mmusculus.v79 for the gene "Rabggtb", I only got 2 isoforms (protein coding). When actually, at the website ensembl there is more than 10 isoforms(non sense mediated decay) and others. It seems to me that EnsDb.Mmusculus.v79 is outdated in relation to the release 85, it is sad that I could not find any EnsDb.Mmusculus.v85 available.
Yes, I've only packaged some releases from Ensembl for some species. But it's easy to build newer EnsDb packages on your own:
So, that way you get all of your transcripts. I suggest also that you build a
EnsDb
package from the generated SQLite file using themakeEnsembldbPackage
. That way you can always load your annotation as an R package. See the vignette from theensembldb
package for more information.cheers, jo
Thank you very much!
More one thing, I want to plot only sections of a gene. I got this problem earlier with ensembl genes, when I also was trying to plot the names here but gviz only accept if the whole feature is showing up. Can this be done?
That I don't know, but I suggest you post this as a new question.
Sure, you can control that via the just.group parameter:
plotTracks(GeneRegionTrack(gr, name = "BCL2L11"), transcriptAnnotation = "symbol", from=111881310, just.group = "above")
thank you florian!