I am having issues finding the Ensembl Dog10K_Boxer_Tasha dataset. It is available through the [webpage][1], but I am trying to pull down gene names using the web portal BioMart or the R version of biomaRt, with no luck. Has anyone had luck locating this dataset?
Unfortunately, I am looking to go from position to gene name (and ideally Ensembl gene ID), which means I might need to use another genome browser to go from position to gene name, then use biomaRt to translate the names into Ensembl IDs. But if you have any other (more streamlined) suggestions, I would love to hear them! Thanks!
You could just make a `TxDb` object and use that, which includes the positions and the Ensembl ID for each position.
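A rough sketch of that idea, assuming you have downloaded the Ensembl GTF for the assembly yourself. The helper name `positions_to_gene_ids` and the file name in the usage comment are made up for illustration, and in newer Bioconductor releases `makeTxDbFromGFF()` lives in the txdbmaker package, so the code picks whichever is available:

```r
library(GenomicFeatures)
library(GenomicRanges)

# makeTxDbFromGFF() moved to txdbmaker in newer Bioconductor releases;
# fall back to the GenomicFeatures version on older installs.
make_txdb_from_gff <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}

# Hypothetical helper: return the gene IDs stored in a GTF for each position
# that falls inside a gene. Positions outside any gene are dropped.
positions_to_gene_ids <- function(gtf_path, chrom, pos) {
  txdb <- make_txdb_from_gff(gtf_path, format = "gtf")
  gene_ranges <- genes(txdb)  # one range per gene, with a gene_id column

  # Treat each position as a 1-bp range on the given chromosome.
  query <- GRanges(seqnames = chrom, ranges = IRanges(start = pos, width = 1))

  # Strand is ignored because a bare position carries no strand information.
  hits <- findOverlaps(query, gene_ranges, ignore.strand = TRUE)

  data.frame(
    chrom   = chrom[queryHits(hits)],
    pos     = pos[queryHits(hits)],
    gene_id = gene_ranges$gene_id[subjectHits(hits)]
  )
}

# Usage (the file name is a placeholder for wherever you saved the GTF):
# positions_to_gene_ids("Dog10K_Boxer_Tasha.gtf.gz", chrom = "1", pos = 1234567)
```

The chromosome names you pass in have to match the naming used in the GTF (for example `"1"` rather than `"chr1"`), otherwise nothing will overlap.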
I forgot, you need `library(GenomicFeatures)` first.
Oh fantastic! Thank you, this is a wonderful workaround!
@james-w-macdonald-5106, Thank you again for the solution provided previously! Unfortunately, these IDs seem to be unique to the Dog10K_Boxer_Tasha genome... and thus are not compatible with biomart in the end...
So I now wonder what your best solution would be for either 1) converting these IDs to ROS_Cfam_1.0 without a full liftover of coordinates, or 2) the easiest method for pulling gene names (not Ensembl IDs) from coordinates in R.
Thanks so much for your help!!
Is there a particular reason you are using the boxer genome rather than the C. lupus familiaris genome? It seems that using the 'regular' genome would fix all your issues.
Thank you for your comment! Yes, I think remapping might be helpful, but I was hoping to find a solution without needing to remap. Thanks!
I don't know why Ensembl does that. But they are not the only game in town. You could use the UCSC version, which is based on CanFam6 and uses NCBI IDs, which should be readily converted to symbols if that's what you want.
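For the NCBI-ID-to-symbol step, something along these lines might work. It is only a sketch: it assumes the org.Cf.eg.db annotation package (the dog OrgDb, keyed on NCBI/Entrez gene IDs) is installed, and the helper name `ncbi_ids_to_symbols` is made up for illustration:

```r
library(AnnotationDbi)
library(org.Cf.eg.db)  # dog annotation package, keyed on NCBI (Entrez) gene IDs

# Hypothetical helper: look up gene symbols for a vector of NCBI gene IDs.
# Returns a named character vector (names are the input IDs); IDs that map
# to more than one symbol keep only the first match.
ncbi_ids_to_symbols <- function(ncbi_ids) {
  mapIds(org.Cf.eg.db,
         keys      = as.character(ncbi_ids),
         keytype   = "ENTREZID",
         column    = "SYMBOL",
         multiVals = "first")
}

# Example IDs are taken from the package itself rather than typed by hand.
example_ids <- head(keys(org.Cf.eg.db, keytype = "ENTREZID"), 3)
ncbi_ids_to_symbols(example_ids)
```

The same call with `column = "ENSEMBL"` should return Ensembl gene IDs where the package has them, although those would be for whichever assembly the package was built against.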
Thanks, yeah, I am not sure why they do that either... Thanks for pondering with me though!