Generating an annotation database for a non-model species to be used by ReportingTools
cagenet34, Toulouse, France, INRA

Hello,

I tried to annotate my list of differentially expressed genes with ReportingTools.

My problem is that I'm working on a non-model species (specifically Ovis aries, i.e. sheep).

I'm stuck: I can't manage to create my own annotation database for sheep, neither via AnnotationHub nor with the ensembldb package.

Here is my script using AnnotationHub:

library("AnnotationHub")
ah<-AnnotationHub()
query(ah, c("OrgDb", "sheep"))
sheep<-ah[["AH48021"]]
keytypes(sheep)

ensoa<-head(keys(sheep,"ENSEMBL"))

select(sheep, ensoa,c("SYMBOL", "GENENAME"), "ENSEMBL")
DbFile<-ensDbFromAH(sheep)
Error in ensDbFromAH(sheep) : 
  Argument 'ah' has to be a (single) AnnotationHub object.

My script using ensembldb:

library(ensembldb)
fetchTablesFromEnsembl(84, species = "sheep")
Error in fetchTablesFromEnsembl(84, species = "sheep") : 
  Something went wrong! I'm missing some of the txt files the perl script should have generated.
In addition: Warning message:
running command 'perl C:/Users/cagenet/Documents/R/win-library/3.3/ensembldb/perl/get_gene_transcript_exon_tables.pl -s sheep -e 84 -U anonymous -H ensembldb.ensembl.org -p 5306 -P ' had status 127 
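A status of 127 from the command R ran generally means the command could not be run at all. Here that most likely means perl is not installed or not on the PATH. A minimal check from R is sketched below. It only tests that some `perl` executable can be located; it does not check that the Ensembl Perl API is installed.

```r
## Sys.which() returns "" for commands it cannot locate on the PATH.
perl_path <- Sys.which("perl")
perl_available <- nzchar(perl_path)

if (perl_available) {
    message("perl found at: ", perl_path)
} else {
    message("perl not found on the PATH; fetchTablesFromEnsembl() cannot run.")
}
```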

Output of sessionInfo():

R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationHub_2.4.2        ensembldb_1.4.6            GenomicFeatures_1.24.2    
 [4] HTSFilter_1.12.0           biomaRt_2.28.0             ReportingTools_2.12.2     
 [7] knitr_1.13                 DESeq2_1.12.3              SummarizedExperiment_1.2.3
[10] GenomicRanges_1.24.1       GenomeInfoDb_1.8.2         AnnotationDbi_1.34.3      
[13] IRanges_2.6.0              S4Vectors_0.10.1           Biobase_2.32.0            
[16] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.0                    edgeR_3.14.0                  splines_3.3.1                
 [4] R.utils_2.3.0                 Formula_1.2-1                 shiny_0.13.2                 
 [7] interactiveDisplayBase_1.10.3 latticeExtra_0.6-28           RBGL_1.48.1                  
[10] BSgenome_1.40.1               Rsamtools_1.24.0              Category_2.38.0              
[13] RSQLite_1.0.0                 lattice_0.20-33               biovizBase_1.20.0            
[16] limma_3.28.6                  chron_2.3-47                  digest_0.6.9                 
[19] RColorBrewer_1.1-2            XVector_0.12.0                colorspace_1.2-6             
[22] ggbio_1.20.1                  R.oo_1.20.0                   httpuv_1.3.3                 
[25] htmltools_0.3.5               Matrix_1.2-6                  plyr_1.8.4                   
[28] OrganismDbi_1.14.1            GSEABase_1.34.0               XML_3.98-1.4                 
[31] genefilter_1.54.2             zlibbioc_1.18.0               xtable_1.8-2                 
[34] GO.db_3.3.0                   scales_0.4.0                  BiocParallel_1.6.2           
[37] annotate_1.50.0               ggplot2_2.1.0                 PFAM.db_3.3.0                
[40] nnet_7.3-12                   mime_0.4                      survival_2.39-4              
[43] magrittr_1.5                  R.methodsS3_1.7.1             GGally_1.1.0                 
[46] hwriter_1.3.2                 foreign_0.8-66                GOstats_2.38.0               
[49] BiocInstaller_1.22.2          graph_1.50.0                  tools_3.3.1                  
[52] data.table_1.9.6              stringr_1.0.0                 munsell_0.4.3                
[55] locfit_1.5-9.1                cluster_2.0.4                 Biostrings_2.40.2            
[58] DESeq_1.24.0                  grid_3.3.1                    RCurl_1.95-4.8               
[61] dichromat_2.0-0               VariantAnnotation_1.18.1      AnnotationForge_1.14.2       
[64] bitops_1.0-6                  gtable_0.2.0                  curl_0.9.7                   
[67] DBI_0.4-1                     reshape_0.8.5                 reshape2_1.4.1               
[70] R6_2.1.2                      GenomicAlignments_1.8.1       gridExtra_2.2.1              
[73] rtracklayer_1.32.0            Hmisc_3.17-4                  stringi_1.1.1                
[76] Rcpp_0.12.5                   geneplotter_1.50.0            rpart_4.1-10                 
[79] acepack_1.3-3.3              

Johannes Rainer, Italy

Hi,

You are on the right track! To create the EnsDb database for sheep, I would suggest the AnnotationHub approach. The fetchTablesFromEnsembl call requires that perl and the correct Ensembl Perl API are installed on your system. So, for starters (and if you don't mind missing the NCBI Entrez Gene IDs), it's easier to use ensDbFromAH. Note that ensDbFromAH expects an AnnotationHub object containing an Ensembl GTF resource, not the OrgDb you retrieved with [[, which is why your call failed:

library(ensembldb)
library(AnnotationHub)

ah <- AnnotationHub()
## Query all GTF files from Ensembl for sheep
query(ah, c("ensembl", "ovis", "gtf"))

AnnotationHub with 10 records
# snapshotDate(): 2016-06-06
# $dataprovider: Ensembl
# $species: Ovis aries
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH8773"]]'

            title                     
  AH8773  | Ovis_aries.Oar_v3.1.74.gtf
  AH10704 | Ovis_aries.Oar_v3.1.75.gtf
  AH28626 | Ovis_aries.Oar_v3.1.78.gtf
  AH28694 | Ovis_aries.Oar_v3.1.76.gtf
  AH28763 | Ovis_aries.Oar_v3.1.79.gtf
  AH28832 | Ovis_aries.Oar_v3.1.77.gtf
  AH47086 | Ovis_aries.Oar_v3.1.80.gtf
  AH47983 | Ovis_aries.Oar_v3.1.81.gtf
  AH50328 | Ovis_aries.Oar_v3.1.82.gtf
  AH50397 | Ovis_aries.Oar_v3.1.83.gtf

## So, we're using the Ensembl 83 GTF here:
dbFile <- ensDbFromAH(ah["AH50397"])  ## Note the single [ !

## Now, this is the SQLite file:
dbFile
[1] "./Ovis_aries.Oar_v3.1.83.sqlite"

## To use it:
edb <- EnsDb(dbFile)

## Or alternatively make a package using the makeEnsembldbPackage function.
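A minimal sketch of the package route, assuming the SQLite file produced by `ensDbFromAH()` above sits in the working directory. The maintainer and author values are hypothetical placeholders to replace with your own. The generated source package directory can then be installed with `R CMD INSTALL`.

```r
library(ensembldb)

## Assumption: this is the SQLite file returned by ensDbFromAH() above.
dbFile <- "./Ovis_aries.Oar_v3.1.83.sqlite"

## Only attempt the build if the database file is actually present.
if (file.exists(dbFile)) {
    makeEnsembldbPackage(ensdb = dbFile,
                         version = "0.0.1",
                         maintainer = "Jane Doe <jane.doe@example.org>",  ## hypothetical
                         author = "Jane Doe",                              ## hypothetical
                         destDir = ".")
}
```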

Now you can use the `EnsDb` object with the `genes`, `transcripts` etc. methods, or with the `select` method:

columns(edb)
 [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"    
 [5] "EXONSEQSTART"   "GENEBIOTYPE"    "GENEID"         "GENENAME"      
 [9] "GENESEQEND"     "GENESEQSTART"   "ISCIRCULAR"     "SEQCOORDSYSTEM"
[13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "TXBIOTYPE"     
[17] "TXCDSSEQEND"    "TXCDSSEQSTART"  "TXID"           "TXNAME"        
[21] "TXSEQEND"       "TXSEQSTART"   

## To get the Gene ID and the gene name (symbol):

head(select(edb, columns=c("GENEID", "GENENAME", "GENEBIOTYPE")))
              GENEID GENENAME    GENEBIOTYPE
1 ENSOARG00000000001     <NA>        Mt_tRNA
2 ENSOARG00000000002     <NA>        Mt_rRNA
3 ENSOARG00000000003     <NA>        Mt_tRNA
4 ENSOARG00000000004     <NA>        Mt_rRNA
5 ENSOARG00000000005     <NA>        Mt_tRNA
6 ENSOARG00000000006      ND1 protein_coding
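For the original goal of annotating differentially expressed genes, one option is to add gene names to the results table before passing it to ReportingTools. The sketch below assumes `res` is a DESeq2 results table whose row names are Ensembl gene IDs (ENSOARG...). It also assumes your ensembldb version provides `mapIds()` for `EnsDb` objects. The helper name `annotateWithEnsDb` is hypothetical.

```r
library(ensembldb)

## Hypothetical helper: add gene name and biotype columns to a DESeq2
## results table, looking up its Ensembl gene IDs in an EnsDb.
annotateWithEnsDb <- function(res, edb) {
    ids <- rownames(res)
    ## mapIds() returns one value per key; with multiVals = "first",
    ## only the first match is kept when a key maps to several values.
    res$symbol <- mapIds(edb, keys = ids, column = "GENENAME",
                         keytype = "GENEID", multiVals = "first")
    res$biotype <- mapIds(edb, keys = ids, column = "GENEBIOTYPE",
                          keytype = "GENEID", multiVals = "first")
    res
}
```

Usage would be `res <- annotateWithEnsDb(res, edb)`. As the `select` output above shows, many sheep genes have no gene name, so expect `NA` in the `symbol` column.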

Hope that helps.

jo


OK, thank you. I'm a newbie and your advice helps me ;-)
