How to plot transcript biotype annotation into GVIZ?
1
0
Entering edit mode
tiago211287 ▴ 10
@tiago211287-9049
Last seen 3.9 years ago
Brazil

I've be trying to visualize my alignment data using gviz and could learn how to plot, using several tutorials around.

However, I could not plot the transcript biotype instead (protein coding/non sense mediated decay/etc) of the name of the gene. 

Is it even possible?

 

Thank you for any guidance.

gviz • 1.9k views
ADD COMMENT
1
Entering edit mode
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 10 weeks ago
Italy

If I got it correctly you want to plot the transcript biotype instead of the transcript name? If yes, you could use the ensembldb package:

library(ensembldb)
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75

## Get the GRanges in the correct format for Gviz for a gene, e.g. BCL2L11
gr <- getGeneRegionTrackForGviz(edb, filter = GenenameFilter("BCL2L11"))
> gr
GRanges object with 89 ranges and 6 metadata columns:
       seqnames                 ranges strand |        feature            gene
          <Rle>              <IRanges>  <Rle> |    <character>     <character>
   [1]        2 [111878491, 111878765]      + |           utr5 ENSG00000153094
   [2]        2 [111881310, 111881322]      + |           utr5 ENSG00000153094
   [3]        2 [111878491, 111878765]      + |           utr5 ENSG00000153094
   ...      ...                    ...    ... .            ...             ...
  [87]        2 [111881323, 111881446]      + | protein_coding ENSG00000153094
  [88]        2 [111907621, 111907724]      + | protein_coding ENSG00000153094
  [89]        2 [111918996, 111919016]      + | protein_coding ENSG00000153094
                  exon exon_rank      transcript      symbol
           <character> <integer>     <character> <character>
   [1] ENSE00001514631         1 ENST00000308659     BCL2L11
   [2] ENSE00001418123         2 ENST00000308659     BCL2L11
   [3] ENSE00001514631         1 ENST00000337565     BCL2L11
   ...             ...       ...             ...         ...
  [87] ENSE00001357671         1 ENST00000452231     BCL2L11
  [88] ENSE00003483971         2 ENST00000452231     BCL2L11
  [89] ENSE00003596136         3 ENST00000452231     BCL2L11
  -------
  seqinfo: 1 sequence from GRCh37 genome

## Now we want to get the transcript biotype for these transcripts; we
## want to get the results as a data.frame instead of a GRanges here

biotypes <- transcripts(edb, filter = TxidFilter(gr$transcript), return.type = "data.frame")
rownames(biotypes) <- biotypes$tx_id

## Replace symbol with biotype
gr$symbol <- biotypes[gr$transcript, "tx_biotype"]
library(Gviz)

plotTracks(GeneRegionTrack(gr, name = "BCL2L11"), transcriptAnnotation = "symbol")

 

Alternatively, you could plot separate tracks for protein coding and nonsense mediated decay:

prot_cod <- getGeneRegionTrackForGviz(edb, filter = list(GenenameFilter("BCL2L11"), TxbiotypeFilter("protein_coding")))
nonsense <-getGeneRegionTrackForGviz(edb, filter = list(GenenameFilter("BCL2L11"), TxbiotypeFilter("nonsense_mediated_decay")))

plotTracks(list(GeneRegionTrack(prot_cod, name = "protein_coding"), GeneRegionTrack(nonsense, name = "nonsense_mediated_decay")))

 

Hope that helps,

cheers, jo

ADD COMMENT
0
Entering edit mode

Thank you for helping me. Indeed, this method works. But can you explain to me why EnsDb.Mmusculus.v79 for the gene "Rabggtb", I only got 2 isoforms (protein coding). When actually, at the website ensembl there is more than 10 isoforms(non sense mediated decay) and others. It seems to me that EnsDb.Mmusculus.v79 is outdated in relation to the release 85, it is sad that I could not find any EnsDb.Mmusculus.v85 available.

ADD REPLY
1
Entering edit mode

Yes, I've only packaged some releases from Ensembl for some species. But it's easy to build newer EnsDb packages on your own:

> library(ensembldb)
> library(AnnotationHub)
> ah <- AnnotationHub()
> ## Check what's available for Mus musculus and Ensembl release 85:
> query(ah, c("Mus musculus", "release-85", "gtf"))
AnnotationHub with 4 records
# snapshotDate(): 2016-08-15
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH51037"]]'

            title                                          
  AH51037 | Mus_musculus.GRCm38.85.abinitio.gtf            
  AH51038 | Mus_musculus.GRCm38.85.chr.gtf                 
  AH51039 | Mus_musculus.GRCm38.85.chr_patch_hapl_scaff.gtf
  AH51040 | Mus_musculus.GRCm38.85.gtf                     

> ## The last one is the GTF file we're looking for: provides the gene annotations for Ensembl 85
> my_gtf <- ah["AH51040"]
> ## Now build the EnsDb database from this GTF file:
> db_file <- ensDbFromAH(my_gtf)
> ## You can then load and use this database directly
> edb <- EnsDb(db_file)
> transcripts(edb, filter = GenenameFilter("Rabggtb"))
GRanges object with 12 ranges and 5 metadata columns:
                     seqnames                 ranges strand |
                        <Rle>              <IRanges>  <Rle> |
  ENSMUST00000197438        3 [153907287, 153912311]      - |
  ENSMUST00000167111        3 [153907289, 153912145]      - |
  ENSMUST00000089950        3 [153907289, 153913009]      - |
                 ...      ...                    ...    ... .
  ENSMUST00000197829        3 [153908075, 153912033]      - |
  ENSMUST00000198094        3 [153909414, 153912955]      - |
  ENSMUST00000197147        3 [153911986, 153912931]      - |
                                  tx_id              tx_biotype
                            <character>             <character>
  ENSMUST00000197438 ENSMUST00000197438 nonsense_mediated_decay
  ENSMUST00000167111 ENSMUST00000167111          protein_coding
  ENSMUST00000089950 ENSMUST00000089950          protein_coding
                 ...                ...                     ...
  ENSMUST00000197829 ENSMUST00000197829 nonsense_mediated_decay
  ENSMUST00000198094 ENSMUST00000198094 nonsense_mediated_decay
  ENSMUST00000197147 ENSMUST00000197147    processed_transcript
                     tx_cds_seq_start tx_cds_seq_end            gene_id
                            <numeric>      <numeric>        <character>
  ENSMUST00000197438        153910625      153912059 ENSMUSG00000038975
  ENSMUST00000167111        153907601      153912059 ENSMUSG00000038975
  ENSMUST00000089950        153907601      153912936 ENSMUSG00000038975
                 ...              ...            ...                ...
  ENSMUST00000197829        153910072      153912033 ENSMUSG00000038975
  ENSMUST00000198094        153910139      153912955 ENSMUSG00000038975
  ENSMUST00000197147             <NA>           <NA> ENSMUSG00000038975
  -------
  seqinfo: 1 sequence from GRCm38 genome

 

So, that way you get all of your transcripts. I suggest also that you build a EnsDb package from the generated SQLite file using the makeEnsembldbPackage. That way you can always load your annotation as an R package. See the vignette from the ensembldb package for more information.

cheers, jo

ADD REPLY
0
Entering edit mode

Thank you very much!

ADD REPLY
0
Entering edit mode

More one thing, I want to plot only sections of a gene. I got this problem earlier with ensembl genes, when I also was trying to plot the names here but gviz only accept if the whole feature is showing up. Can this be done?

ADD REPLY
0
Entering edit mode

That I don't know, but I suggest you post this as a new question.
 

ADD REPLY
1
Entering edit mode

Sure, you can control that via the just.group parameter:

 plotTracks(GeneRegionTrack(gr, name = "BCL2L11"), transcriptAnnotation = "symbol", from=111881310, just.group = "above")

ADD REPLY
0
Entering edit mode

thank you florian!

ADD REPLY

Login before adding your answer.

Traffic: 621 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6