In UniProt, an entry such as Q8N4C6 shows Gene: NIN at the top of the page, but the Gene Names field lists two names:
Name: NIN
Synonyms: KIAA1565
If I select the genes column, I get both names.
> select(up, "Q8N4C6", "GENES", "UNIPROTKB")
Getting extra data for Q8N4C6
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB        GENES
1    Q8N4C6 NIN KIAA1565
How can I get just the representative gene symbol (i.e. the HGNC Approved Symbol), as shown at the top of the summary web page for Q8N4C6? I'm not sure whether the official symbol is always reported first in the GENES column. I suppose I was hoping there would be a select(up, "Q8N4C6", "GENE", "UNIPROTKB") command available.
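As a stopgap, the first name could be pulled out of the GENES string by hand. This is only a sketch: it assumes the primary name really is listed first, which is exactly the part I'm unsure about, and `first_gene` is a made-up helper name. The data frame just mirrors the select() output above.

```r
# Hypothetical helper: keep only the first whitespace-separated name.
# ASSUMPTION: the primary gene name is listed first in GENES.
first_gene <- function(genes) {
  parts <- strsplit(trimws(genes), "[[:space:]]+")
  vapply(parts, function(x) x[1], character(1))
}

# Mirrors the select() result shown above
res <- data.frame(UNIPROTKB = "Q8N4C6",
                  GENES = "NIN KIAA1565",
                  stringsAsFactors = FALSE)
res$GENE <- first_gene(res$GENES)
```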
Right now UniProt.ws queries the REST API using the 'genes' keyword, so the query you end up passing is
https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes
which returns all gene symbols recorded for that protein. Since KIAA1565 got wrapped into NIN back in 2005, you get both, because technically both gene symbols apply. There is, however, a preferred symbol, which you can get using
https://www.uniprot.org/uniprot/?query=Q8N4C6&format=tab&columns=id,genes(PREFERRED)
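If you want to issue that preferred-symbol query straight from R, something like the following sketch would build the URL. `uniprot_query_url` is a hypothetical helper, not part of UniProt.ws, and the read.delim() call is left commented out because it needs network access and assumes the endpoint shown above is still being served.

```r
# Hypothetical helper (not a UniProt.ws function): build the tab-format
# query URL for one accession and a set of return columns.
uniprot_query_url <- function(accession,
                              columns = c("id", "genes(PREFERRED)")) {
  paste0("https://www.uniprot.org/uniprot/?query=", accession,
         "&format=tab&columns=", paste(columns, collapse = ","))
}

url <- uniprot_query_url("Q8N4C6")

# ASSUMPTION: the endpoint above is reachable and returns tab-delimited text.
# res <- read.delim(url, stringsAsFactors = FALSE)
```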
Hypothetically we could add this to UniProt.ws, but there are other ways of doing things that are even easier:
All things being equal, that should be the go-to solution.