Convert RefSeq Protein IDs to UniProt IDs
sgupt46 ▴ 10
@sgupt46-13716
Last seen 16 months ago
Canada

I have a few RefSeq protein IDs, e.g. NP_853513.2 and NP_000517.2. Is there a way to find the corresponding UniProt IDs in Bioconductor?

AnnotationDbi • 2.3k views
@james-w-macdonald-5106
Last seen 12 hours ago
United States

Sometimes you can do this using an OrgDb package.

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, c("NP_853513.2", "NP_000517.2"), "UNIPROT", "REFSEQ")
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

## Ugh. Let's strip off the trailing version numbers
> select(org.Hs.eg.db, gsub("\\[1-9]$", "", c("NP_853513.2", "NP_000517.2")), "UNIPROT", "REFSEQ")
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

## still no joy but is that because I'm a dummy? You actually have to strip the period AND the number

> select(org.Hs.eg.db, gsub("\\.[1-9]$", "", c("NP_853513.2", "NP_000517.2")), "UNIPROT", "REFSEQ")
'select()' returned 1:1 mapping between keys and columns
     REFSEQ UNIPROT
1 NP_853513  Q7Z3Y7
2 NP_000517  P02533
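
One caveat on the pattern used above: `\\.[1-9]$` only removes a single-digit version, so an ID with a two-digit version (say a made-up `NP_000517.10`) would pass through unchanged and fail the key check again. Here is a minimal base-R sketch of a more general helper. Note that `strip_version` is a hypothetical name, not something from AnnotationDbi, and it assumes the version is always a period followed by digits at the very end of the ID.

```r
# Hypothetical helper (not part of any Bioconductor package):
# drop a trailing ".<digits>" version suffix from RefSeq accessions.
strip_version <- function(ids) {
  # "\\." matches a literal period, "[0-9]+" one or more digits, "$" end of string
  sub("\\.[0-9]+$", "", ids)
}

ids <- c("NP_853513.2", "NP_000517.2")
stripped <- strip_version(ids)
print(stripped)
```

The `stripped` vector can then be passed as the keys argument in the `select()` call shown above.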

That works for these two, but let's try UniProt.ws as well.

> library(UniProt.ws)
Loading required package: RSQLite
Loading required package: RCurl
Warning messages:
1: package 'UniProt.ws' was built under R version 4.0.3 
2: package 'RSQLite' was built under R version 4.0.3 
3: package 'RCurl' was built under R version 4.0.3 
> up <- UniProt.ws()
> select(up, c("NP_853513.2", "NP_000517.2"), "UNIPROTKB", "REFSEQ_PROTEIN")
Getting mapping data for NP_853513.2 ... and ACC
error while trying to retrieve data in chunk 1:
    no results after 5 attempts; please try again later
continuing to try
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) : 
  attempt to set 'colnames' on an object with less than two dimensions

## Huh. That's a drag

Let's try biomaRt

> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> getBM(c("uniprot_gn_id","uniprotswissprot","refseq_peptide"), "refseq_peptide", c("NP_853513.2", "NP_000517.2"), mart)
[1] uniprot_gn_id    uniprotswissprot refseq_peptide  
<0 rows> (or 0-length row.names)

## Huh. Super annoying. Maybe it's the version numbers? Let's strip those off
> getBM(c("uniprot_gn_id","uniprotswissprot","refseq_peptide"), "refseq_peptide", gsub("\\.[1-9]$", "", c("NP_853513.2", "NP_000517.2")), mart)
  uniprot_gn_id uniprotswissprot refseq_peptide
1        P02533           P02533      NP_000517
2        Q7Z3Y7           Q7Z3Y7      NP_853513

## BOOM! nailed it on the third try...
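
Both working routes hand back the unversioned accession, so it can help to keep a lookup back to the IDs you started with. The following is a rough base-R sketch, assuming the result has a REFSEQ column as in the OrgDb output. The `res` data frame simply re-types the two rows shown earlier instead of querying anything, and the column name `REFSEQ_VERSIONED` is made up for illustration.

```r
# Original, versioned IDs from the question
ids <- c("NP_853513.2", "NP_000517.2")

# Lookup table from the versioned ID to the stripped ID used as the query key
lookup <- data.frame(
  REFSEQ_VERSIONED = ids,
  REFSEQ = sub("\\.[0-9]+$", "", ids),
  stringsAsFactors = FALSE
)

# Stand-in for the select() result shown earlier (typed by hand, not queried)
res <- data.frame(
  REFSEQ = c("NP_853513", "NP_000517"),
  UNIPROT = c("Q7Z3Y7", "P02533"),
  stringsAsFactors = FALSE
)

# Left join: keep every input ID, even one with no UniProt match (it gets NA)
merged <- merge(lookup, res, by = "REFSEQ", all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` means an ID that fails to map still shows up in the table with `NA` rather than silently disappearing.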

See edits. It turns out that for these two proteins both the OrgDb package and biomaRt work. Unfortunately, UniProt.ws appears to be having problems...

sgupt46 ▴ 10
@sgupt46-13716
Last seen 16 months ago
Canada

Thanks James. org.Hs.eg.db works.
