Retrieve aminoacid sequence starting from protein identifier
@giulio-di-giovanni-950
Last seen 10.2 years ago
Hi all,

I've searched the archive without success. Maybe the question is too easy, but I'd like to know whether there is a command or a package that can retrieve the amino acid sequence for a given protein identifier, through a link to UniProt or a similar resource.

Say I have:

P10451, Human

I would like to obtain:

MRIAVICFCL LGITCAIPVK QADSGSSEEK QLYNKYPDAV ATWLNPDPSQ KQNLLAPQNA
VSSEETNDFK QETLPSKSNE SHDHMDDMDD EDDDDHVDSQ DSIDSNDSDD VDDTDDSHQS
DESHHSDESD ELVTDFPTDL PATEVFTPVV PTVDTYDGRG DSVVYGLRSK SKKFRRPDIQ
YPDATDEDIT SHMESEELNG AYKAIPVAQD LNAPSDWDSR GKDSYETSQL DDQSAETHSH
KQSRLYKRKA NDESNEHSDV IDSQELSKVS REFHSHEFHS HEDMLVVDPK SKEEDKHLKF
RISHELDSAS SEVN

Thanks in advance,
Giulio
@john-seers-ifr-1605
Last seen 10.2 years ago
Hi Giulio,

Have a look at getSEQ and getGI in the annotate package. I think they do what you want. These two lines were lifted out of one of them, I think:

> accession <- "P10451"
> seq <- readLines(paste("http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?",
+     "cmd=&txt=on&save=&cfm=&list_uids=", accession, "&",
+     "db=nucleotide&extrafeat=16&term=&view=fasta&",
+     "dispmax=20&SendTo=t&__from=&__to=&__strand=", sep = ""))
> sequence <- paste(seq[2:length(seq)], sep = "", collapse = "")
> seq
[1] ">gi|129260|sp|P10451.1|OSTP_HUMAN RecName: Full=Osteopontin; AltName: Full=Bone sialoprotein 1; AltName: Full=Secreted phosphoprotein 1; Short=SPP-1; AltName: Full=Urinary stone protein; AltName: Full=Nephropontin; AltName: Full=Uropontin; Flags: Precursor"
[2] "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQNAVSSEETNDFK"
[3] "QETLPSKSNESHDHMDDMDDEDDDDHVDSQDSIDSNDSDDVDDTDDSHQSDESHHSDESDELVTDFPTDL"
[4] "PATEVFTPVVPTVDTYDGRGDSVVYGLRSKSKKFRRPDIQYPDATDEDITSHMESEELNGAYKAIPVAQD"
[5] "LNAPSDWDSRGKDSYETSQLDDQSAETHSHKQSRLYKRKANDESNEHSDVIDSQELSKVSREFHSHEFHS"
[6] "HEDMLVVDPKSKEEDKHLKFRISHELDSASSEVN"
[7] ""

The first element of seq is the FASTA header; sequence holds the remaining lines joined into a single string.

Regards,
John

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Giulio Di Giovanni
Sent: 24 June 2009 14:43
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Retrieve aminoacid sequence starting from protein identifier
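The post-processing step above (drop the FASTA header, join the remaining lines) can be sketched as a small reusable function that works on any character vector returned by readLines, so it does not depend on the NCBI URL still responding. This is only a sketch: parse_fasta_lines and format_blocks are hypothetical helper names, not functions from annotate, and the example input is the first part of the P10451 record shown above, truncated for brevity.

```r
# Hypothetical helper (not part of annotate): split FASTA lines into
# a header and a single concatenated sequence string.
parse_fasta_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop empty lines such as the trailing ""
  is_header <- startsWith(lines, ">")
  list(
    header   = sub("^>", "", lines[is_header][1]),
    sequence = paste(lines[!is_header], collapse = "")
  )
}

# Hypothetical helper: print a sequence as space-separated blocks of
# `width` residues, the layout used in the original question.
format_blocks <- function(sequence, width = 10) {
  if (nchar(sequence) == 0) return("")
  starts <- seq(1, nchar(sequence), by = width)
  paste(substring(sequence, starts, starts + width - 1), collapse = " ")
}

# Example input: header plus the first 90 residues of the record above
# (truncated for brevity; a real call would pass the readLines() result).
fasta <- c(
  ">gi|129260|sp|P10451.1|OSTP_HUMAN RecName: Full=Osteopontin",
  "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQNAVSSEETNDFK",
  "QETLPSKSNESHDHMDDMDD",
  ""
)

rec <- parse_fasta_lines(fasta)
blocks <- format_blocks(rec$sequence)
print(rec$header)
print(blocks)
```

Filtering on the leading ">" rather than on position 1 means the helper also tolerates blank lines before the header, which the index-based seq[2:length(seq)] would not.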