Hi,
I noticed that MotifDb is missing a few entries from the FlyFactorSurvey data. I first noticed the issue when comparing the motifs shipped with the MEME Suite against those in MotifDb. The missing motifs fall into two groups: (1) motifs for proteins that do have alternate entries in MotifDb (e.g. br-PLSOLEXA vs br-PLSANGER_5; indeed, most TFs in this category are missing their SOLEXA entries), and (2) motifs for TFs that never appear in MotifDb, although they have entries in FlyFactorSurvey and are assigned to extant Drosophila genes (e.g. chinmo).
I generated a .meme format file of the motifs from TFs with 0 entries in MotifDb, and have provided a reproducible script that retrieves entries for both types of missing motifs (found at this gist).
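A minimal base-R sketch of how the two groups can be separated, for readers who do not want to open the script. The two name vectors are hypothetical stand-ins for the motif names read from the MEME Suite file and from MotifDb (they are not real query results), and the "TF_METHOD_n" naming pattern is an assumption.

```r
# Hypothetical motif names standing in for the two collections.
meme_names    <- c("br-PL_SOLEXA_5", "br-PL_SANGER_5", "chinmo_SOLEXA_5")
motifdb_names <- c("br-PL_SANGER_5")

# Assumed naming scheme: everything before the first underscore is the TF name.
tf_of <- function(x) sub("_.*$", "", x)

# Motifs present in the MEME Suite file but absent from MotifDb.
missing_motifs <- setdiff(meme_names, motifdb_names)

# TFs that have at least one entry in MotifDb.
present_tfs <- unique(tf_of(motifdb_names))

# Group 1: the TF is in MotifDb, but this particular entry is not.
alt_missing <- missing_motifs[tf_of(missing_motifs) %in% present_tfs]

# Group 2: the TF has no entries in MotifDb at all.
tf_missing <- missing_motifs[!tf_of(missing_motifs) %in% present_tfs]

print(alt_missing)
print(tf_missing)
```

With real data, the two vectors would be filled from the parsed .meme file and from the names of the FlyFactorSurvey subset of MotifDb.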
If there is a rationale for why these values should be excluded, I'd be curious to hear it. Otherwise, I'm happy to help curate metadata for these missing entries if necessary to get them included.
All the best, -Spencer
> devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.2 (2019-12-12)
os Debian GNU/Linux 10 (buster)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz Etc/UTC
date 2020-05-25
─ Packages ─────────────────────────────────────────────────────────────────────────
package * version date lib source
ape 5.3 2019-03-17 [1] CRAN (R 3.6.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.2)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.2)
bibtex 0.4.2.2 2020-01-02 [1] CRAN (R 3.6.2)
Biobase 2.46.0 2019-10-29 [2] Bioconductor
BiocGenerics 0.32.0 2019-10-29 [2] Bioconductor
BiocManager 1.30.10 2019-11-16 [2] CRAN (R 3.6.2)
BiocParallel 1.20.1 2019-12-21 [2] Bioconductor
Biostrings 2.54.0 2019-10-29 [2] Bioconductor
bitops 1.0-6 2013-08-17 [2] CRAN (R 3.6.2)
callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.2)
data.table 1.12.8 2019-12-09 [2] CRAN (R 3.6.2)
DelayedArray 0.12.2 2020-01-06 [2] Bioconductor
desc 1.2.0 2018-05-01 [2] CRAN (R 3.6.2)
devtools 2.2.1 2019-09-24 [2] CRAN (R 3.6.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.2)
dotargs 0.0.9000 2020-04-28 [1] local
dplyr * 0.8.99.9002 2020-04-30 [1] Github (tidyverse/dplyr@d353ff1)
dremeR * 0.0.1.9001 2020-05-25 [1] local
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.2)
gbRd 0.4-11 2012-10-01 [1] CRAN (R 3.6.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.2)
GenomeInfoDb 1.22.0 2019-10-29 [2] Bioconductor
GenomeInfoDbData 1.2.2 2020-02-19 [2] Bioconductor
GenomicAlignments 1.22.1 2019-11-12 [2] Bioconductor
GenomicRanges 1.38.0 2019-10-29 [2] Bioconductor
ggplot2 3.2.1 2019-08-10 [1] CRAN (R 3.6.2)
ggseqlogo 0.1 2020-03-10 [1] Github (omarwagih/ggseqlogo@4adc8f2)
ggtree 2.0.2 2020-03-16 [1] Bioconductor
glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.2)
IRanges 2.20.2 2020-01-13 [2] Bioconductor
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.2)
lattice 0.20-38 2018-11-04 [3] CRAN (R 3.6.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.2)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.6.2)
MASS 7.3-51.4 2019-03-31 [3] CRAN (R 3.6.2)
Matrix 1.2-18 2019-11-27 [3] CRAN (R 3.6.2)
matrixStats 0.55.0 2019-09-07 [2] CRAN (R 3.6.2)
memoise 1.1.0 2017-04-21 [2] CRAN (R 3.6.2)
MotifDb 1.28.0 2019-10-29 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.2)
nlme 3.1-142 2019-11-07 [3] CRAN (R 3.6.2)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
pkgbuild 1.0.6 2019-10-09 [2] CRAN (R 3.6.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.2)
pkgload 1.0.2 2018-10-29 [2] CRAN (R 3.6.2)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.2)
processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
RCurl 1.98-1.1 2020-01-19 [2] CRAN (R 3.6.2)
Rdpack 0.11-1 2019-12-14 [1] CRAN (R 3.6.2)
remotes 2.1.0 2019-06-24 [2] CRAN (R 3.6.2)
rlang 0.4.6 2020-05-02 [1] CRAN (R 3.6.2)
rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.6.2)
Rsamtools 2.2.2 2020-02-11 [2] Bioconductor
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.2)
rtracklayer 1.46.0 2019-10-29 [2] Bioconductor
rvcheck 0.1.7 2019-11-29 [2] CRAN (R 3.6.2)
S4Vectors 0.24.3 2020-01-18 [2] Bioconductor
scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2)
sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.6.2)
splitstackshape 1.4.8 2019-04-21 [1] CRAN (R 3.6.2)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.2)
SummarizedExperiment 1.16.1 2019-12-19 [2] Bioconductor
testthat 2.3.1 2019-12-01 [2] CRAN (R 3.6.2)
tibble 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
tidyr 1.0.2 2020-01-24 [2] CRAN (R 3.6.2)
tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.2)
tidytree 0.3.2 2020-03-12 [1] CRAN (R 3.6.2)
treeio 1.10.0 2019-10-29 [1] Bioconductor
universalmotif * 1.7.0 2020-04-29 [1] Bioconductor
usethis 1.5.1 2019-07-04 [2] CRAN (R 3.6.2)
vctrs 0.2.99.9011 2020-04-30 [1] Github (r-lib/vctrs@b11ba67)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.2)
XML 3.99-0.3 2020-01-20 [2] CRAN (R 3.6.2)
XVector 0.26.0 2019-10-29 [2] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.2)
zlibbioc 1.32.0 2019-10-29 [2] Bioconductor
[1] /nas/longleaf/home/snystrom/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/local/lib/R/library
Hi Spencer,
Thanks for this report, and for offering a fix. Perhaps you could go further? If you clone the repo you will see that it contains
which when loaded, you will see contains matrices and tbl.md. It would be a big help if you could evolve both of these files to add the missing data, along with explanatory notes, then submit a pull request.
Possible?
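As a rough illustration of the workflow described above, here is a base-R sketch of adding one entry to a `matrices` list and a `tbl.md` metadata table and saving both back. Only the object names `matrices` and `tbl.md` come from this thread. The file path, the metadata columns, the motif names, the matrix values, and the assumption that the row names of `tbl.md` mirror the names of `matrices` are all made up for the example.

```r
# Stand-in for the repo's data file: a temporary .RData holding both objects.
rdata <- tempfile(fileext = ".RData")
matrices <- list(
  existing_motif = matrix(0.25, nrow = 4, ncol = 3,
                          dimnames = list(c("A", "C", "G", "T"), NULL))
)
tbl.md <- data.frame(geneSymbol = "existing", dataSource = "FlyFactorSurvey",
                     row.names = "existing_motif", stringsAsFactors = FALSE)
save(matrices, tbl.md, file = rdata)

# Load into a fresh environment so nothing in the workspace is overwritten.
env <- new.env()
load(rdata, envir = env)

# A hypothetical new motif: a 4 x 2 frequency matrix whose columns sum to 1.
new_name <- "new_motif"
new_pfm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                    0.1, 0.7, 0.1, 0.1),
                  nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Add the matrix and a matching metadata row under the same name.
env$matrices[[new_name]] <- new_pfm
env$tbl.md <- rbind(env$tbl.md,
                    data.frame(geneSymbol = "new", dataSource = "FlyFactorSurvey",
                               row.names = new_name, stringsAsFactors = FALSE))

# Sanity check before saving: every matrix has exactly one metadata row.
stopifnot(identical(names(env$matrices), rownames(env$tbl.md)))
save(list = c("matrices", "tbl.md"), envir = env, file = rdata)
```

Keeping the two objects in step under one shared name is the point of the check before saving; the real metadata table will have more columns than the two shown here.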
Hi Paul,
Happy to do that. Is the repo hosted on a publicly available version control page? I can't find it under your github account or the Bioconductor github. I can get the source from the .tar.gz from the Bioconductor page, but it'd obviously be easier using a VCS platform.
-Spencer
Paul might have a more convenient way for posting issues etc, but worst-case is to
and provide a diff or similar (https://stackoverflow.com/a/15438863/547331, but this is just a naive google)
Hi Spencer,
Martin’s first proposal - let’s try that. Sorry I did not anticipate this.
I use (and just updated) an alternate home for MotifDb here:
https://github.com/PriceLab/MotifDb
If you could submit a PR against that repo, I’ll then be sure to echo it up to the bioc master (devel) repo.
Thanks for helping out with this.
Sounds good. I've got a clone of your version working.
By the way, another issue with these entries is that the FlyBase gene numbers (FBgn) are out of date. Unfortunately, FBgns are not permanent identifiers, yet FlyFactor uses them to reference specific genes, which is why this issue exists. FlyBase has a nice utility for updating these IDs to their current values, which may also help retrieve the Entrez ID for genes where it is missing. If you'd like, I can do that as well.
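Once a table of old-to-current IDs has been exported from that utility, applying it could look roughly like the base-R sketch below. The IDs are obvious placeholders rather than real FBgn numbers, and the `md` table and its column names are hypothetical.

```r
# Placeholder mapping (old ID -> current ID); real values would come from FlyBase.
id_map <- c(FBgn_old_A = "FBgn_new_A", FBgn_old_B = "FBgn_new_B")

# Hypothetical metadata with one stale ID, one already-current ID, one unmapped ID.
md <- data.frame(motif = c("m1", "m2", "m3"),
                 fbgn  = c("FBgn_old_A", "FBgn_new_B", "FBgn_unknown"),
                 stringsAsFactors = FALSE)

# Look up each ID among the stale IDs; NA means no replacement is known.
hit <- match(md$fbgn, names(id_map))

# Replace stale IDs, keep everything else unchanged, and record what was touched.
md$fbgn_current <- ifelse(is.na(hit), md$fbgn, unname(id_map[hit]))
md$fbgn_updated <- !is.na(hit)

print(md)
```

Keeping the original column next to the updated one makes the change easy to review before anything is written back.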
Yes, please! Any and all improvements are welcome.
Spencer,
It's been a while, 19 months I see. Any chance you were able to update from FlyFactor, so that I can update MotifDb?
Hi everyone, I'm running into a similar problem: the fly motif I'm interested in is not present in MotifDb, although it is available on FlyBase and has a FlyBase ID (FBgn....). The flyFactorSurvey.RData in the GitHub repo has not been updated; the file is 8 years old. Where can I find an updated version, or is there a way to include specific motifs in MotifDb?
Thank you.
Regards,
Gunjan
Gunjan,
I just pinged Spencer (aka snystrom) to see if he found the time to do the update from FlyFactor we were hoping for. Let's see what he says, and then make a plan.
Thank you. I really appreciate the quick response.
Unfortunately, we have not heard back from Spencer. Gunjan, would you be willing to work with me to update the FlyBase data in MotifDb?
Sure, I could give it a try. Could you guide me through it?