UniProt.ws version 2.10.2
DHS • 0
@dhs-9731
Last seen 8.9 years ago
USA/Stanford

Dear Bioconductor community,

I am using UniProt.ws version 2.10.2 to retrieve features for my protein hit lists, and I am particularly interested in the SUBCELLULAR LOCATION information.

While this information is available on uniprot.org (e.g. http://www.uniprot.org/uniprot/P35579#subcellular_location), I cannot retrieve any SUBCELLULAR LOCATION information via the UniProt.ws package. I only get NA, while other columns return the correct values:

> res <- select(up, "P35579", "REACTOME", "UNIPROTKB")
Getting mapping data for P35579 ... and REACTOME_ID
'select()' returned 1:many mapping between keys and columns
> res
  UNIPROTKB      REACTOME
1    P35579 R-HSA-5627117
2    P35579 R-HSA-5625900
3    P35579 R-HSA-5625740
4    P35579 R-HSA-5627123
5    P35579  R-HSA-416572
6    P35579 R-HSA-3928663
7    P35579 R-HSA-2029482

but

> res <- select(up, "P35579", "SUBCELLULAR-LOCATIONS", "UNIPROTKB")
Getting extra data for P35579 NA NA etc
'select()' returned 1:1 mapping between keys and columns
> res
  UNIPROTKB SUBCELLULAR-LOCATIONS
1    P35579                  <NA>

So I wonder whether the SUBCELLULAR LOCATION info is actually updated in the package or whether this supporting data can be accessed in any other way?

Best,
D

DHS • 0
@dhs-9731
Last seen 8.9 years ago
USA/Stanford

Dear James,

Thanks so much for the info and the code; it works beautifully. Since I am dealing with around 5000 UniProt IDs per dataset, is there a way to make that hack permanent, so I don't have to step through the debugger each time?

The best way to do that is to get the sources for UniProt.ws and make the changes, then install. That's usually a bit more than most are willing or able to do. The alternative is to wait for the bug to be fixed, and then get the updated package. That should be happening this week, so the easiest thing for you to do would be to wait for the fix to appear.

This has been fixed in devel (2.11.4) and release (2.10.3). Both should be available via biocLite() tomorrow by noon PST. 

Let us know if you run into other problems.

Valerie

@james-w-macdonald-5106
Last seen 2 days ago
United States

That's a bug in UniProt.ws. By default it is using this URI:

http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,subcellular locations

when in fact it is supposed to be using this one:

http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,comment(SUBCELLULAR LOCATION)

There are actually several different columns that UniProt.ws won't get, due to malformed URIs. Hopefully this can be resolved by the next release.
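To make the difference between the two URIs concrete, here is a minimal sketch that assembles both from the same pieces. The helper name `build_uniprot_url` is hypothetical and is not part of UniProt.ws; it only mirrors the URL pattern quoted above and makes no network request.

```r
# Hypothetical helper (not a UniProt.ws function): assemble the tab-format
# query URL for one accession and one column name, following the pattern
# quoted in this answer.
build_uniprot_url <- function(accession, column,
                              base = "http://www.uniprot.org/uniprot/") {
  paste0(base, "?query=", accession, "&format=tab&columns=id,", column)
}

# What the package currently sends versus what the service expects.
bad  <- build_uniprot_url("P35579", "subcellular locations")
good <- build_uniprot_url("P35579", "comment(SUBCELLULAR LOCATION)")

# The column name contains a space; percent-encode it for clients that
# do not do so themselves.
good_encoded <- utils::URLencode(good)

print(bad)
print(good)
print(good_encoded)
```

Only the column name differs between `bad` and `good`, which is why the other columns keep working.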

The list of column names for UniProt can be found here. You could sort of hack your way through to get the correct results, if you are willing to do some work.

> debug(UniProt.ws:::.select)

> select(up, "P35579", "SUBCELLULAR-LOCATIONS", "UNIPROTKB")

Browse[2]> debug(.getSomeUniprotGoodies)

Then hit Enter until you see this:

debug: url <- "http://www.uniprot.org/uniprot/?query="
Browse[3]>
debug: fullUrl <- paste0(url, qstring, "&format=tab&columns=id,", cstring)
Browse[3]>
debug: dat <- .tryReadResult(fullUrl)

And you can see the fullUrl:

Browse[3]> fullUrl
[1] "http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,subcellular locations"

Then fix the URI

Browse[3]> fullUrl <- sub("subcellular locations","comment(SUBCELLULAR LOCATION)", fullUrl)
Browse[3]> fullUrl
[1] "http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,comment(SUBCELLULAR LOCATION)"

Then hit Enter until you get here (like twice, I think).

debug: colnames(dat) <- sub("\\.\\d", "", colnames(dat))
Browse[3]>
debug: dat <- dat[dat[, 1] %in% query, , drop = FALSE]
Browse[3]>  colnames(dat)
[1] "Entry"                     "Subcellular.location..CC."

You have to now fix the column names because the extra ..CC. will mess things up.

Browse[3]> colnames(dat) <- sub("\\.\\.CC\\.", "", colnames(dat))
Browse[3]> colnames(dat)
[1] "Entry"                "Subcellular.location"
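If you are wondering where the extra ..CC. comes from: it looks like R's usual header sanitising. The sketch below assumes the raw header returned by UniProt is "Subcellular location [CC]" (the session above only shows the already-mangled name) and that the table is read with default name checking, which runs headers through `make.names()`.

```r
# Assumed raw header text; only the mangled form appears in the debugger.
raw_header <- c("Entry", "Subcellular location [CC]")

# make.names() replaces every character that is invalid in an R name
# (here the space and both brackets) with a dot.
mangled <- make.names(raw_header)
print(mangled)

# The same substitution used in the debugger session strips the suffix.
cleaned <- sub("\\.\\.CC\\.", "", mangled)
print(cleaned)
```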

Then hit c and Enter twice to get the debugger to just finish out.

Browse[3]> c
exiting from: FUN(qs[[i]], ...)
debug: colnames(dat)[1] <- "ACC+ID"
Browse[2]> c
'select()' returned 1:1 mapping between keys and columns
exiting from: .select(x, keys, columns, keytype)
  UNIPROTKB
1    P35579
                                                                                                                                                                                                                        SUBCELLULAR-LOCATIONS
1 SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton {ECO:0000250}. Cytoplasm, cell cortex {ECO:0000250}. Note=Colocalizes with actin filaments at lamellipodia margins and at the leading edge of migrating cells. {ECO:0000269|PubMed:20052411}.
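The returned field is one free-text string, so for a long hit list you may want to reduce it to the bare location terms. This is a rough sketch only: the function name `extract_locations` is hypothetical, and the parsing rules are inferred from the single example above and may not hold for other entries.

```r
# Example value copied from the select() result above.
annotation <- paste0(
  "SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton {ECO:0000250}. ",
  "Cytoplasm, cell cortex {ECO:0000250}. ",
  "Note=Colocalizes with actin filaments at lamellipodia margins and at ",
  "the leading edge of migrating cells. {ECO:0000269|PubMed:20052411}."
)

# Hypothetical helper: strip the prefix, evidence tags and free-text note,
# then split what remains into individual location terms.
extract_locations <- function(x) {
  x <- sub("^SUBCELLULAR LOCATION: ", "", x)  # leading label
  x <- gsub(" \\{[^}]*\\}", "", x)            # {ECO:...} evidence tags
  x <- sub(" Note=.*$", "", x)                # trailing note, if present
  parts <- strsplit(x, "\\. ?")[[1]]          # terms end with a period
  parts[nzchar(parts)]
}

locations <- extract_locations(annotation)
print(locations)
```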

 
