The EnsDb.Mmusculus.v79 package is GRCm38 (mm10), not GRCm39 (mm39). If you want a current EnsDb, use AnnotationHub:
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2024-10-28
> z <- query(hub, c("mus musculus","ensdb"))
> z
AnnotationHub with 87 records
# snapshotDate(): 2024-10-28
# $dataprovider: Ensembl
# $species: Mus musculus, Mus musculus musculus, Mus musculus domesticus, Mus musculus castaneus
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer, rdatadateadded,
# preparerclass, tags, rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53222"]]'
title
AH53222 | Ensembl 87 EnsDb for Mus Musculus
AH53726 | Ensembl 88 EnsDb for Mus Musculus
AH56691 | Ensembl 89 EnsDb for Mus Musculus
AH57770 | Ensembl 90 EnsDb for Mus Musculus
AH60788 | Ensembl 91 EnsDb for Mus Musculus
... ...
AH116906 | Ensembl 112 EnsDb for Mus musculus
AH116907 | Ensembl 112 EnsDb for Mus musculus musculus
AH116908 | Ensembl 112 EnsDb for Mus musculus domesticus
AH116909 | Ensembl 112 EnsDb for Mus musculus
AH119358 | Ensembl 113 EnsDb for Mus musculus
> subset(as(mcols(z), "data.frame")[,c("title","genome")], genome %in% paste0("GRCm", 38:39))
title genome
AH53222 Ensembl 87 EnsDb for Mus Musculus GRCm38
AH53726 Ensembl 88 EnsDb for Mus Musculus GRCm38
AH56691 Ensembl 89 EnsDb for Mus Musculus GRCm38
AH57770 Ensembl 90 EnsDb for Mus Musculus GRCm38
AH60788 Ensembl 91 EnsDb for Mus Musculus GRCm38
AH60992 Ensembl 92 EnsDb for Mus Musculus GRCm38
AH64461 Ensembl 93 EnsDb for Mus Musculus GRCm38
AH64944 Ensembl 94 EnsDb for Mus musculus GRCm38
AH67971 Ensembl 95 EnsDb for Mus musculus GRCm38
AH69210 Ensembl 96 EnsDb for Mus musculus GRCm38
AH73905 Ensembl 97 EnsDb for Mus musculus GRCm38
AH75036 Ensembl 98 EnsDb for Mus musculus GRCm38
AH78811 Ensembl 99 EnsDb for Mus musculus GRCm38
AH79718 Ensembl 100 EnsDb for Mus musculus GRCm38
AH83247 Ensembl 101 EnsDb for Mus musculus GRCm38
AH89211 Ensembl 102 EnsDb for Mus musculus GRCm38
AH89457 Ensembl 103 EnsDb for Mus musculus GRCm39
AH95775 Ensembl 104 EnsDb for Mus musculus GRCm39
AH98078 Ensembl 105 EnsDb for Mus musculus GRCm39
AH100674 Ensembl 106 EnsDb for Mus musculus GRCm39
AH104895 Ensembl 107 EnsDb for Mus musculus GRCm39
AH109367 Ensembl 108 EnsDb for Mus musculus GRCm39
AH109655 Ensembl 109 EnsDb for Mus musculus GRCm39
AH113713 Ensembl 110 EnsDb for Mus musculus GRCm39
AH116340 Ensembl 111 EnsDb for Mus musculus GRCm39
AH116909 Ensembl 112 EnsDb for Mus musculus GRCm39
AH119358 Ensembl 113 EnsDb for Mus musculus GRCm39
You can then use one of those GRCm39 versions by doing ensdb <- hub[["AH119358"]] if, e.g., you want the most recent one. There are other strain-specific ones as well, but for brevity I am just showing the 'regular' ones.
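If you would rather not read the newest record off the listing by eye, something like the following might do it. This is only a sketch: latest_ensdb_id is a hypothetical helper (not part of AnnotationHub), and the small data frame is a stand-in built from three rows of the listing above so that the example runs without a hub connection.

```r
# Hypothetical helper: given a data frame with 'title' and 'genome' columns
# and AnnotationHub IDs as row names, return the ID of the record with the
# highest Ensembl release for the requested genome build.
latest_ensdb_id <- function(records, build = "GRCm39") {
  hits <- records[records$genome == build, , drop = FALSE]
  if (nrow(hits) == 0) stop("no records for genome build ", build)
  # Titles look like "Ensembl 113 EnsDb for Mus musculus"
  release <- as.integer(sub("^Ensembl ([0-9]+) .*$", "\\1", hits$title))
  rownames(hits)[which.max(release)]
}

# Stand-in for as(mcols(z), "data.frame")[, c("title", "genome")],
# using three rows copied from the listing above.
records <- data.frame(
  title = c("Ensembl 102 EnsDb for Mus musculus",
            "Ensembl 112 EnsDb for Mus musculus",
            "Ensembl 113 EnsDb for Mus musculus"),
  genome = c("GRCm38", "GRCm39", "GRCm39"),
  row.names = c("AH89211", "AH116909", "AH119358"),
  stringsAsFactors = FALSE
)

best <- latest_ensdb_id(records)
print(best)
```

With a live hub you would pass the real data frame instead and then do ensdb <- hub[[best]].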
Unfortunately, I've updated my code to use the newest GRCm39 genome, and yet I am still getting the same error. Is there anything else I can try to fix this error other than just trying all of the GRCm39 versions one by one?
You will probably have to ask the Signac people. I don't know exactly what is going on under the hood for AddMotifs, but the EnsDb seems OK to me. Presumably what they are doing is extracting the sequences, which appears to work (there's nothing off the end of chr1), so I don't think it's an issue with any Bioconductor packages/functions.
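For what it's worth, the kind of sanity check I mean by "nothing off the end of chr1" can be sketched in base R. Everything here is made up for illustration: find_bad_ranges is a hypothetical helper, and the chromosome length is a toy value, not a real mouse chromosome length. With real objects you would take the lengths from seqlengths() on the genome and the coordinates from your peaks.

```r
# Hypothetical helper: flag ranges that use a sequence name the genome does
# not have, start before position 1, or run past the end of their chromosome.
find_bad_ranges <- function(peaks, chrom_lengths) {
  seqs <- as.character(peaks$seqnames)
  known <- seqs %in% names(chrom_lengths)
  limit <- chrom_lengths[seqs]            # NA where the name is unknown
  past_end <- known & peaks$end > limit
  peaks[!known | past_end | peaks$start < 1, , drop = FALSE]
}

# Toy genome with a single chromosome of length 1000 (not a real length)
chrom_lengths <- c(chr1 = 1000)

# Toy peaks: row 1 is fine, row 2 runs off the end of chr1,
# row 3 uses the name "1" instead of "chr1"
peaks <- data.frame(
  seqnames = c("chr1", "chr1", "1"),
  start = c(1, 950, 1),
  end = c(100, 1100, 10),
  stringsAsFactors = FALSE
)

bad <- find_bad_ranges(peaks, chrom_lengths)
print(bad)
```

An empty result would suggest the ranges themselves are fine and the problem is somewhere else.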
This is the traceback I got after running AddMotifs if this is informative?
It's not informative except for me to say I was right that it's getSeq that is providing the error. It appears that AddMotifs is passing data off to matchMotifs in the motifmatchr package (both not Bioconductor, so we are off topic to a certain extent), and then matchMotifs does stuff and then calls getSeq. And getSeq says 'not so fast my friend!'
So, things you can do:
1.) Set options(error = recover) and when it blows up you can check to see what objects are being passed into matchMotifs and maybe figure out what the problem is. This works sometimes IMO, but sometimes it's just more confusing.
2.) Use trace(matchMotifs, browser, signature = c(pwms = "PWMatrixList", subject = "GenomicRanges")) and then run AddMotifs again. This will drop into a browser if I am correct that you are dispatching on that signature. You can then step through until you get to getSeq and then inspect what is being used as the genome and the subject. Note that you can use the <<- function to dump things out to your .GlobalEnv (your workspace), so you could just step through until you get to the relevant step and then do pwms <<- pwms and subject <<- subject, then quit the debugger (using Q), and then you can inspect those two objects to figure out what's wrong.
And if what's wrong makes sense, and provides you with a way to get things to work, then good on ya! Or maybe that will just give you information that you can provide to either the Satija or Greenleaf lab if it's an obvious bug. But at this point I think it's up to you or the authors of the code that's not working to figure out.