Using Ensembl release 80 when release 79 is supported by Bioconductor
@anthonycolombo60-8475

Hi,

First, thank you for any advice.

I am using external software that processes data against Ensembl GRCh38 release 80 for Homo sapiens.

When I bring the data into R, the annotation available for H. sapiens is release 79.

This is a discrepancy that I wish to clear up.

Should I only process data against the annotation releases that Bioconductor supports? Or should I ignore these version differences (I think not)?

Suggestions welcome.


Anthony Colombo

Johannes Rainer
@johannes-rainer-6987

Alternatively, you can use the ensembldb package. Versions 75 and 79 are available through Bioconductor, but it's really simple to generate annotation packages/databases based on any Ensembl version using ensembldb and the AnnotationHub package (check the ensembldb package vignette for alternative options):

library(AnnotationHub)
library(ensembldb)
ah <- AnnotationHub()

## query AnnotationHub for available Ensembl gtf files for Ensembl release 80
query(ah, c("Homo sapiens", "release-80"))

## get the version 80 gtf:
gtf <- ah[["AH47066"]]

## generate the annotation database
DbFile <- ensDbFromGRanges(gtf, organism="Homo_sapiens", version=80, genomeVersion="GRCh38")

## we can either build a database package with makeEnsembldbPackage,
## or load the database directly
Edb <- EnsDb(DbFile)

## e.g. use genes() to get the annotations for all genes
genes(Edb)
GRanges object with 65217 ranges and 5 metadata columns:
                            seqnames                 ranges strand   |
                               <Rle>              <IRanges>  <Rle>   |
  ENSG00000000003                  X [100627109, 100639991]      -   |
  ENSG00000000005                  X [100584802, 100599885]      +   |
  ENSG00000000419                 20 [ 50934867,  50958555]      -   |
  ENSG00000000457                  1 [169849631, 169894267]      -   |
  ENSG00000000460                  1 [169662007, 169854080]      +   |
              ...                ...                    ...    ... ...
  ENSG00000281918                  1 [113079537, 113079847]      +   |
  ENSG00000281919  CHR_HSCHR5_6_CTG1 [ 33946602,  33956490]      -   |
  ENSG00000281920                  2 [ 65623272,  65628424]      +   |
  ENSG00000281921                  3 [134261776, 134261911]      +   |
  ENSG00000281922 CHR_HSCHR17_1_CTG5 [ 46784842,  46785913]      -   |
                          gene_id     gene_name  entrezid         gene_biotype
                      <character>   <character> <integer>          <character>
  ENSG00000000003 ENSG00000000003        TSPAN6      <NA>       protein_coding
  ENSG00000000005 ENSG00000000005          TNMD      <NA>       protein_coding
  ENSG00000000419 ENSG00000000419          DPM1      <NA>       protein_coding
  ENSG00000000457 ENSG00000000457         SCYL3      <NA>       protein_coding
  ENSG00000000460 ENSG00000000460      C1orf112      <NA>       protein_coding
              ...             ...           ...       ...                  ...
  ENSG00000281918 ENSG00000281918   Metazoa_SRP      <NA>             misc_RNA
  ENSG00000281919 ENSG00000281919       SLC45A2      <NA>       protein_coding
  ENSG00000281920 ENSG00000281920 RP11-418H16.1      <NA>              lincRNA
  ENSG00000281921 ENSG00000281921    AC096967.1      <NA>                miRNA
  ENSG00000281922 ENSG00000281922 RP11-1070B7.2      <NA> processed_pseudogene
                  seq_coord_system
                         <integer>
  ENSG00000000003             <NA>
  ENSG00000000005             <NA>
  ENSG00000000419             <NA>
  ENSG00000000457             <NA>
  ENSG00000000460             <NA>
              ...              ...
  ENSG00000281918             <NA>
  ENSG00000281919             <NA>
  ENSG00000281920             <NA>
  ENSG00000281921             <NA>
  ENSG00000281922             <NA>
  -------
  seqinfo: 312 sequences from GRCh38 genome
## check the package vignette for more information (e.g. filtering results, retrieving sequences)
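
One way to gauge how much the release mismatch matters for a particular data set is to compare the gene IDs reported by the external software with those present in the annotation. Below is a minimal base-R sketch. Both ID vectors are made up for illustration; in practice the second one might be taken from something like names(genes(Edb)), which is an assumption based on the output above, where the ranges are named by gene ID.

```r
## Hypothetical gene IDs reported by the external (release 80) pipeline
quantified_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000281922")

## Hypothetical gene IDs present in the annotation loaded in R (e.g. release 79)
annotation_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

## IDs in the data that the annotation does not contain
missing_ids <- setdiff(quantified_ids, annotation_ids)

## fraction of quantified IDs that can be annotated
covered <- mean(quantified_ids %in% annotation_ids)

missing_ids
covered
```

If only a handful of IDs are missing, dropping them may be acceptable for some analyses; otherwise, building a database for the matching release as shown above is probably the safer route.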

 

cheers, jo

 

Diego Diez
@diego-diez-4520

Another possibility is to annotate with Ensembl release 80 using the biomaRt package. See this recent, relevant post: "Ensembl release 80 is out!"
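
As a rough illustration of the biomaRt route, the sketch below wraps the query in a small helper. The helper name fetch_release80_annotation is hypothetical, and the code assumes that Ensembl still serves the release 80 archive and that the installed biomaRt supports the version argument of useEnsembl(); if either assumption fails, the connection step will raise an error.

```r
## Sketch: annotate Ensembl gene IDs against the release 80 archive via biomaRt.
## fetch_release80_annotation is a hypothetical helper name.
fetch_release80_annotation <- function(gene_ids) {
    if (!requireNamespace("biomaRt", quietly = TRUE))
        stop("the biomaRt package is required")

    ## connect to the archived release 80 mart (needs network access)
    mart <- biomaRt::useEnsembl(biomart = "ensembl",
                                dataset = "hsapiens_gene_ensembl",
                                version = 80)

    ## retrieve gene symbol and biotype for the requested gene IDs
    biomaRt::getBM(attributes = c("ensembl_gene_id",
                                  "external_gene_name",
                                  "gene_biotype"),
                   filters = "ensembl_gene_id",
                   values = gene_ids,
                   mart = mart)
}

## usage (not run here, requires an internet connection):
## fetch_release80_annotation(c("ENSG00000000003", "ENSG00000000005"))
```

The attribute names used here are the ones commonly exposed by the human gene dataset; listAttributes(mart) shows what a given release actually offers.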
