Question

Is org.Mm.eg.db updated to mus musculus version 99?

kavator ▴ 30

@kavator-22955

Last seen 22 months ago

Singapore

Hi folks using org.Mm.eg.db. I previously did my pseudoalignment with ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz, but it seems that org.Mm.eg.db has not been updated to that Ensembl release for mouse?

I tried both:

res$symbol <- mapIds(org.Mm.eg.db,
                     keys=ens.str,
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")

and

res$symbol <- mapIds(org.Mm.eg.db,
                     keys=ens.str,
                     column="SYMBOL",
                     keytype="MGI",
                     multiVals="first")

Both gave me:

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'MGI'. Please use the keys method to see a listing of valid arguments

The gene IDs in my count data have the ENSMUSG prefix.

org.Mm.eg.db AnnotationDbi • 3.1k views

ADD COMMENT • link 4.6 years ago kavator ▴ 30

score 1 · Answer 1 · 2020-04-27

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 29 days ago

Italy

What do your IDs look like? Can you provide the output of head(ens.str)?

Note that if you're using Ensembl annotation you can also use the ensembldb EnsDb annotation resources. These contain all gene, transcript, exon and protein annotations provided by Ensembl. You can get them (for any species and any Ensembl release) from AnnotationHub. To get the one you want (Mus musculus, Ensembl version 99):

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-10-29
> query(ah, "EnsDb.Mmusculus.v99")
AnnotationHub with 1 record
# snapshotDate(): 2019-10-29 
# names(): AH78811
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2019-10-29
# $title: Ensembl 99 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("99", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
#   "Protein", "Transcript") 
# retrieve record with 'object[["AH78811"]]' 
> edb <- ah[["AH78811"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
require("ensembldb")
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.5
|Creation time: Wed Feb  5 00:04:44 2020
|ensembl_version: 99
|ensembl_host: localhost
|Organism: Mus musculus
|taxonomy_id: 10090
|genome_build: GRCm38
|DBSCHEMAVERSION: 2.1
| No. of genes: 56289.
| No. of transcripts: 144726.
|Protein data available.

You can then use the variable edb instead of org.Mm.eg.db. Also have a look at the ensembldb vignettes for more information (browseVignettes("ensembldb")).
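As a minimal sketch of what that swap could look like (an illustration, not code from the original reply): the helper name map_ensembl_to_symbol is hypothetical, and it assumes edb is the EnsDb object loaded above and ens.str is the character vector of Ensembl gene IDs from the question. In an EnsDb, Ensembl gene IDs correspond to the "GENEID" keytype rather than "ENSEMBL".

```r
# Hypothetical helper (not part of ensembldb): map Ensembl gene IDs to symbols.
# `db` is assumed to be an EnsDb object such as `edb`; `ids` is assumed to be
# a character vector of Ensembl gene IDs such as `ens.str`.
map_ensembl_to_symbol <- function(db, ids) {
  AnnotationDbi::mapIds(db,
                        keys = ids,
                        column = "SYMBOL",
                        keytype = "GENEID",   # Ensembl gene IDs in an EnsDb
                        multiVals = "first")
}

# Intended usage, once `edb` and `ens.str` exist in the session:
# res$symbol <- map_ensembl_to_symbol(edb, ens.str)
```

Wrapping the call in a function only keeps the keytype choice in one place; calling mapIds() directly with the same arguments should behave the same way.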

cheers, jo

ADD COMMENT • link 4.6 years ago Johannes Rainer ★ 2.1k

Hi Jo, thanks for the tip on building my own annotation! When I run head(ens.str), the output is:

[1] "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037"
[4] "ENSMUSG00000000049" "ENSMUSG00000000056" "ENSMUSG00000000058"

With "Mm" as my variable in place of your "edb":

res$symbol <- mapIds(Mm, keys=ens.str, column="SYMBOL", keytype="MGI", multiVals="first")

I still got the same result. I tried both ENSEMBL and MGI as the keytype, and neither worked.

keytypes(Mm)
 [1] "ENTREZID"            "EXONID"              "GENEBIOTYPE"
 [4] "GENEID"              "GENENAME"            "PROTDOMID"
 [7] "PROTEINDOMAINID"     "PROTEINDOMAINSOURCE" "PROTEINID"
[10] "SEQNAME"             "SEQSTRAND"           "SYMBOL"
[13] "TXBIOTYPE"           "TXID"                "TXNAME"
[16] "UNIPROTID"

Is there an example of each of the keys? I tried ?TXID but it doesn't show anything.

ADD REPLY • link 4.6 years ago kavator ▴ 30

The keytype specifies the type of your input identifiers (i.e. ens.str). In your case you have to choose keytype = "GENEID" for the EnsDb database, as your identifiers are Ensembl gene identifiers. Alternatively, you could use the code below to get all gene-related annotations from the EnsDb database. After retrieving them you will also have to re-order the data frame to match your input identifiers (last line of the example code below).

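# Fetch gene annotations for the input IDs as a data frame
ann <- genes(Mm, filter = ~ gene_id %in% ens.str, return.type = "data.frame")
# Index rows by gene ID, then re-order to match the input identifiers
rownames(ann) <- ann$gene_id
ann <- ann[ens.str, ]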
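
To illustrate the re-ordering step on its own, here is a small base-R sketch that uses a toy data frame as a stand-in for the result of genes(). The gene_name values are placeholders, not real annotations, and the last input ID is deliberately absent from the toy table to show that it comes back as a row of NAs. It uses match() instead of row names, which should give the same order for exact, unique IDs.

```r
# Toy stand-in for the data frame returned by genes(); values are placeholders.
ann <- data.frame(
  gene_id   = c("ENSMUSG00000000028", "ENSMUSG00000000001"),
  gene_name = c("placeholder_B", "placeholder_A"),
  stringsAsFactors = FALSE
)

# Input IDs in the order of the count data; the third is missing from `ann`.
ens.str <- c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000037")

# Re-order by matching on gene_id; unmatched IDs yield rows filled with NA.
ann <- ann[match(ens.str, ann$gene_id), ]
rownames(ann) <- ens.str

# gene_name is now in input order, with NA for the ID that had no annotation.
ann$gene_name
```

Keeping one row per input ID, including the NA rows, means the result can be bound column-wise to the count table without shifting any rows.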

ADD REPLY • link 4.6 years ago Johannes Rainer ★ 2.1k

Thanks a lot for all the tips and advice, Jo! :)

ADD REPLY • link 4.6 years ago kavator ▴ 30
