Hi folks using org.Mm.eg.db
I previously did my pseudoalignment with ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz
But it seems that org.Mm.eg.db has not been updated to that Ensembl release for mouse?
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'MGI'. Please use the keys method to see a listing of valid arguments
What do your IDs look like? Can you post the output of head(ens.str)?
Note that if you're using Ensembl annotation you can also use the EnsDb annotation resources from ensembldb. These contain all gene, transcript, exon and protein annotations provided by Ensembl. You can get them (for any species and any Ensembl release) from AnnotationHub. To get the one you want (Mus musculus, Ensembl version 99):
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-10-29
> query(ah, "EnsDb.Mmusculus.v99")
AnnotationHub with 1 record
# snapshotDate(): 2019-10-29
# names(): AH78811
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2019-10-29
# $title: Ensembl 99 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("99", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
# "Protein", "Transcript")
# retrieve record with 'object[["AH78811"]]'
> edb <- ah[["AH78811"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
loading from cache
require("ensembldb")
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.5
|Creation time: Wed Feb 5 00:04:44 2020
|ensembl_version: 99
|ensembl_host: localhost
|Organism: Mus musculus
|taxonomy_id: 10090
|genome_build: GRCm38
|DBSCHEMAVERSION: 2.1
| No. of genes: 56289.
| No. of transcripts: 144726.
|Protein data available.
You can then use the variable edb instead of org.Mm.eg.db. Also have a look at the ensembldb vignettes for more information (browseVignettes("ensembldb")).
The keytype specifies the type of your input identifiers (i.e. ens.str). In your case you have to choose keytype = "GENEID" for the EnsDb database, as your identifiers are Ensembl gene identifiers. Alternatively, you could use the code below to get all gene-related annotations from the EnsDb database. After retrieving them you will also have to re-order the data frame to match your input identifiers (last line of the example code below).
ann <- genes(Mm, filter = ~ gene_id %in% ens.str, return.type = "data.frame")
rownames(ann) <- ann$gene_id
ann <- ann[ens.str, ]
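To make the re-ordering step a bit more concrete, here is a small sketch on toy data that needs no database. The identifiers, symbols and the names idx, ann_ordered and missing_ids are made up for illustration and are not from the thread. It uses match() rather than row names, which I assume is handy when some input identifiers are absent from the annotation:

```r
# Toy stand-in for the data frame returned by
# genes(..., return.type = "data.frame"); IDs and symbols are made up.
ann <- data.frame(
  gene_id = c("ENSMUSG_toy_3", "ENSMUSG_toy_1", "ENSMUSG_toy_2"),
  symbol = c("geneC", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Hypothetical input identifiers; the last one is not present in 'ann'.
ens.str <- c("ENSMUSG_toy_1", "ENSMUSG_toy_2",
             "ENSMUSG_toy_3", "ENSMUSG_toy_4")

# For each input ID, find its row in 'ann' (NA if it is not annotated).
idx <- match(ens.str, ann$gene_id)

# Re-order so that row i of the result corresponds to ens.str[i].
ann_ordered <- ann[idx, ]
ann_ordered$query_id <- ens.str

# Input IDs without any annotation; worth inspecting before going on.
missing_ids <- ens.str[is.na(idx)]
```

Rows for identifiers that are not found come back filled with NA, so checking missing_ids first tells you whether the identifiers and the chosen annotation actually belong together.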
Hi Jo, thanks for the tip on building my own annotation! When I do head(ens.str)
the output is
with the "Mm" variable as your "edb".
I still get the same output. I tried ENSEMBL and MGI as my keytype but I get the same output.
Is there an example of each of the keys? I tried ?TXID but it doesn't show up.
Thanks a lot for all the tips and advice Jo! :)