florian.hahne@novartis.com
Hi Marc,
thanks for the hint, but I don't think this is quite what I need. My
problem is still at the level of genomes. UCSC, for instance, calls a
particular version of the human genome hg19. Ensembl has a corresponding
genome, but it goes by a different name (GRCh37.p10). Early on, I made
the perhaps unwise decision to identify genomes within Gviz by their UCSC
names and to translate those names into Ensembl names when necessary. In
hindsight this may not have been the smartest choice, and I should have
left the translation entirely to the user: if somebody wants Ensembl gene
models from biomaRt, they should make sure to select the correct mart and
dataset in the first place.

I'll think about a pragmatic way out of this hole I've dug myself into.
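One pragmatic way out, in the spirit of leaving the translation to the user, would be to keep a small static table but let an explicit choice always win. This is a minimal sketch under stated assumptions: `ucscToEnsembl()` and its `override` argument are hypothetical names, not part of Gviz or biomaRt, and the table holds only the two pairings mentioned in this thread (hg19/GRCh37 and danRer7/Zv9).

```r
# Hypothetical helper (not part of Gviz or biomaRt): translate a UCSC genome
# name into an Ensembl assembly name.
ucscToEnsembl <- function(genome, override = NULL) {
  # An explicit, user-supplied Ensembl name always wins over the table
  if (!is.null(override)) {
    return(override)
  }
  # Static lookup table; entries like these go stale as Ensembl moves on
  known <- c(hg19 = "GRCh37", danRer7 = "Zv9")
  if (!genome %in% names(known)) {
    stop("No Ensembl equivalent known for UCSC genome '", genome,
         "'; please supply one via 'override'.")
  }
  known[[genome]]
}

ucscToEnsembl("danRer7")                        # returns "Zv9"
ucscToEnsembl("hg19", override = "GRCh37.p10")  # returns "GRCh37.p10"
```

The sketch fails loudly for genomes missing from the table instead of guessing, so a stale entry cannot silently hand back the wrong assembly.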
Florian
--
On 1/23/13 2:18 AM, "Marc Carlson" <mcarlson at fhcrc.org> wrote:
>Hi Florian,
>
>We actually have a small database called seqnames.db that is dedicated
>to tracking these kinds of chromosome naming conventions. You can find
>out more on the help page for supportedSeqnameStyles() (and its
>friends). A quick way to see it is:
>
>library(Homo.sapiens)
>?supportedSeqnameStyles
>
>
>If you call supportedSeqnameStyles(), you will see that we don't (yet)
>have an entry for zebrafish. If you were to give me one as a tab-delimited
>file, I could add it to the database, and it would then be available in
>the future. The file I need is deliberately simple to make. It should
>look like the example below, with one column for each style you want to
>support and the columns separated by tabs.
>
>NCBI MSU6
>1 1
>2 2
>3 3
>4 4
>
>etc.
>
>
> Marc
>
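To produce the kind of tab-delimited file Marc asks for, base R's `write.table()` is enough. A minimal sketch, assuming two styles named `NCBI` (plain numbers) and `UCSC` (`chr` prefix) and covering only the 25 numbered zebrafish chromosomes; the style names, the file name, and the omission of the mitochondrial genome and unplaced scaffolds are all assumptions to check against what seqnames.db actually expects.

```r
# Build a seqname-style table for zebrafish: one column per naming style.
# Style names and chromosome set are illustrative assumptions.
chroms <- as.character(1:25)
styles <- data.frame(
  NCBI = chroms,                  # plain numbers, e.g. "1"
  UCSC = paste0("chr", chroms),   # UCSC prefix, e.g. "chr1"
  stringsAsFactors = FALSE
)

# Write it tab-separated, with a header row and no quoting or row names,
# matching the simple layout described above.
outFile <- file.path(tempdir(), "zebrafish_seqnames.tab")
write.table(styles, file = outFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

readLines(outFile, n = 3)  # returns "NCBI\tUCSC" "1\tchr1" "2\tchr2"
```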
>On 01/21/2013 09:15 AM, Hahne, Florian wrote:
>> Hi Joseph,
>>
>> Regarding your first problem: UCSC has no cytoband information for any
>> of the zebrafish genomes, and that is what throws the error. I think
>> the package should do something smarter here, e.g. use the chromosome
>> length information that should be available for every UCSC genome to
>> draw at least a blank ideogram, which could still be used to indicate
>> the current plotting position. I'll have this ready in the next release
>> of the package, and maybe even port it back to the current release; it
>> seems to be more of a bug than a missing feature anyway.
>>
>> Your second problem is a bit trickier. There is no real mapping between
>> the Ensembl genome names used by the biomaRt package and the UCSC ones,
>> which I decided to use as the defaults for the package. I came up with
>> my own static mapping instead, and such a mapping obviously tends to
>> get out of date quickly. The zebrafish version you currently get from
>> Ensembl is Zv9 (which is equivalent to danRer7), but my mapping still
>> labels it danRer6. That is outright wrong, because if you ask for
>> danRer6 you will now actually get danRer7 from Biomart. Yikes. I will
>> have to come up with a better solution for this. There should be a way
>> to explicitly control which Ensembl genome you get, and that is a
>> simple change. Getting it right automagically is far more challenging,
>> I am afraid.
>>
>> As a quick fix for you: ask for the danRer6 genes and manually change
>> the genome of the track:
>>
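>> library(Gviz)
>> # Request danRer6; Biomart currently returns danRer7 (Zv9) gene models
>> biomTrack <- BiomartGeneRegionTrack(genome = "danRer6", chromosome = 1,
>>                                     start = 1e6, end = 1e6 + 10000,
>>                                     name = "ENSEMBL", showId = TRUE)
>> # Relabel the track so its genome matches the data it actually holds
>> genome(biomTrack) <- "danRer7"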
>>
>> I'll get back to you once I have a better solution.
>>
>> Florian
>>
>>
>
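Where UCSC provides chromosome lengths but no cytobands, the blank-ideogram fallback described above could be as simple as one unstained band per chromosome. A minimal sketch under assumptions: `blankBands()` is a hypothetical helper, not Gviz code, and it only builds a table in the column layout of UCSC's cytoBand table (chrom, chromStart, chromEnd, name, gieStain); whether Gviz would consume such a table directly is not claimed here. The chromosome names and lengths in the usage line are toy placeholders, not real zebrafish values.

```r
# Hypothetical fallback (not Gviz code): describe each chromosome as a single
# unstained band spanning its full length, in UCSC cytoBand column layout.
blankBands <- function(chromLengths) {
  data.frame(
    chrom      = names(chromLengths),
    chromStart = 0L,                         # bands start at position 0
    chromEnd   = as.integer(chromLengths),   # ...and run to the chromosome end
    name       = "",                         # no band names available
    gieStain   = "gneg",                     # draw as an unstained band
    stringsAsFactors = FALSE
  )
}

# Toy placeholder lengths, for illustration only
bands <- blankBands(c(chrA = 1000, chrB = 2500))
bands
```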
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor