Julian Gehring
Hi,
The UCSC hg19 and GRCh37 reference genomes use different reference
sequences for the mitochondrial genome (chrM in hg19, MT in GRCh37).
These sequences differ in length (16,571 bp vs. 16,569 bp) and have
mismatches at multiple positions. For a short explanation, see
https://lists.soe.ucsc.edu/pipermail/genome/2009-July/019631.html.
Bioconductor normally provides only the UCSC references (see e.g. the
BSgenome.* or TxDb.* packages). When using data aligned to the GRCh37
reference, to what extent does using the UCSC reference influence the
analysis of features located on the mitochondrial genome, and for which
kinds of downstream analyses could this become critical? Locating SNPs
on the mitochondrial genome, for example, is one such critical case.
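Because the two mitochondrial sequences differ in length, the contig length in an alignment header is one way to tell which one a data set was aligned to. The sketch below is a minimal heuristic, assuming the input is plain SAM header text with @SQ lines. The function names are hypothetical and not part of any Bioconductor package. It checks length rather than name, since the name alone (chrM vs. MT) does not reliably identify the sequence.

```python
# Lengths of the two mitochondrial reference sequences.
HG19_CHRM_LEN = 16571  # UCSC hg19 chrM (NC_001807)
RCRS_MT_LEN = 16569    # GRCh37 MT (rCRS, NC_012920)


def parse_sq_lengths(header_text):
    """Return {contig name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Tags are tab-separated TAG:VALUE pairs, e.g. SN:MT and LN:16569.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def guess_mito_reference(contig_lengths):
    """Guess the mitochondrial reference from contig lengths (heuristic)."""
    for name in ("chrM", "MT", "chrMT", "M"):
        length = contig_lengths.get(name)
        if length is None:
            continue
        if length == HG19_CHRM_LEN:
            return "hg19 chrM (NC_001807)"
        if length == RCRS_MT_LEN:
            return "rCRS (GRCh37 MT)"
        return f"unknown ({name}, {length} bp)"
    return "no mitochondrial contig found"


if __name__ == "__main__":
    header = "@HD\tVN:1.0\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
    print(guess_mito_reference(parse_sq_lengths(header)))
```

Matching on length is only a sanity check. It does not translate coordinates between the two sequences, which would require an alignment of the two sequences because they differ by indels as well as mismatches.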
The problem will likely be solved with the hg20/GRCh38 release, but
data that require the hg19/GRCh37 releases may still be relevant for
several years. Would it be reasonable to provide alternative reference
packages, such as a GRCh37 BSgenome package?
Best wishes
Julian