BSgenome.Mmusculus.UCSC.mm10 contains mm10 (2012 version). Is mm10 patch 6 (2017) also available as a BSgenome package?
You could just make your own version.
Thank you James :-).
I could make BSgenome.Mmusculus.UCSC.mm10.p6 and submit it to BioC, but BioC seems to host only the first release of each major assembly version. Am I right?
I wonder why BioC doesn't update the BSgenomes with each new BioC release. Freezing the release guarantees stability, but on the other hand, the subsequent patches do not alter the genomic coordinates (only a new major version does), and 7 years is a long time. What do you think?
The main reason the BSgenomes don't get updated is lack of personnel to do so. There are maybe 3-4 people who do the bulk of the work for each release, and while some of that involves updating annotation data, probably more involves the logistics of ensuring that thousands of different packages (both analytical and experimental) are all ready to go upon release.
With limited personnel there has to be a hierarchy of necessity, and building BSgenome packages for each successive patch unfortunately comes well down that hierarchy. That is why the infrastructure exists to let people build their own if they want to.
That said, there are 819 different TwoBit files on AnnotationHub for Mus musculus, most of them Ensembl-based. As far as I know, anything from Ensembl releases 92-97 is p6, so you can always get the TwoBitFile from there. You probably want the toplevel file rather than the primary assembly, though, which means you also have to choose the strain.
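As a minimal sketch of that route (not code from the thread), the R snippet below queries AnnotationHub for GRCm38 TwoBit records, keeps the Ensembl release 97 toplevel file for the reference strain (C57BL/6J), and reads a small region from it. It assumes AnnotationHub, GenomicRanges and rtracklayer are installed, that record titles follow Ensembl file naming such as `Mus_musculus.GRCm38.dna.toplevel.2bit`, and that each record's `sourceurl` contains the Ensembl release directory (`release-97`). Other strains are listed under their own assembly names, so choosing a different strain would mean changing the `"GRCm38"` query term.

```r
# Minimal sketch: find an Ensembl p6 (release 97) TwoBit file on AnnotationHub.
# Assumptions: titles follow Ensembl naming ("...GRCm38.dna.toplevel.2bit")
# and sourceurl contains the release directory ("release-97").
suppressPackageStartupMessages({
  library(AnnotationHub)
  library(GenomicRanges)
  library(rtracklayer)
})

ah <- AnnotationHub()

# All mouse TwoBit records for the GRCm38 reference assembly (C57BL/6J);
# other strains carry their own assembly names in the metadata
hits <- query(ah, c("Mus musculus", "TwoBitFile", "GRCm38"))

# Keep release 97 and the unmasked toplevel file ("dna.", not "dna_rm."/"dna_sm.")
keep <- grepl("release-97", hits$sourceurl, fixed = TRUE) &
        grepl("dna.toplevel", hits$title, fixed = TRUE)
toplevel <- hits[keep]
toplevel$title

# [[ downloads (and caches) the file; the result is an rtracklayer TwoBitFile
twobit <- toplevel[[1]]

# Sequence names and lengths (Ensembl style, no "chr" prefix)
seqinfo(twobit)

# Read a 100 bp region without loading the whole genome into memory
region <- import(twobit, which = GRanges("1", IRanges(3000001, 3000100)))
region
```

The first `[[` call downloads the whole file into the local hub cache, so expect it to take a while; later sessions reuse the cached copy.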
I don't do much with BSgenome packages, so I don't know the fundamental differences, but to my eye, the TwoBitFile is pretty similar to a BSgenome object for most purposes.
I'll second James' observations, including a workflow that uses TwoBit files (via AnnotationHub) or even FASTA files (managed with BiocFileCache) rather than BSgenome, if these resources are sufficient for your research purposes.
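For the FASTA route, a minimal sketch with BiocFileCache might look like the following. The Ensembl URL is an assumption based on Ensembl's FTP layout for release 97 (one of the p6 releases mentioned above). `bfcrpath()` downloads the file on first use and returns the cached local path afterwards. Ensembl's `.fa.gz` files are plain gzip rather than bgzip, so this sketch reads the whole primary assembly into memory with `readDNAStringSet()` instead of indexing it.

```r
# Minimal sketch: cache an Ensembl FASTA file with BiocFileCache and read it.
# Assumption: the URL follows Ensembl's FTP layout for release 97.
suppressPackageStartupMessages({
  library(BiocFileCache)
  library(Biostrings)
})

bfc <- BiocFileCache(ask = FALSE)

fasta_url <- paste0(
  "https://ftp.ensembl.org/pub/release-97/fasta/mus_musculus/dna/",
  "Mus_musculus.GRCm38.dna.primary_assembly.fa.gz"
)

# Downloads on first call; subsequent calls return the cached path
fasta_path <- bfcrpath(bfc, fasta_url)

# Load the gzipped primary assembly as a DNAStringSet (needs several GB of RAM)
genome <- readDNAStringSet(fasta_path)

# Ensembl headers look like "1 dna:chromosome ..."; keep only the first token
names(genome) <- sub(" .*", "", names(genome))
head(names(genome))
```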
Thank you Martin :-)
Thank you, James, for this extensive reply :-). I was not aware of these TwoBit files, so this is definitely good to know!
After looking into the Ensembl FASTA files, I realized that the patches are provided in a separate alternate-sequences file and the primary assembly is left untouched, which makes the patch-level information difficult to use for many applications. With a new major Mus musculus assembly planned for the not-so-distant future, I think I will work with the current primary assembly for now and update to the new major release when it becomes available.