Fetching documents from PubMed


Kaustubh Patil ▴ 110

@kaustubh-patil-1544

Last seen 10.2 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available. URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060208/5e3d62cc/attachment.pl


Posted 18.8 years ago by Kaustubh Patil ▴ 110


rgentleman ★ 5.5k

@rgentleman-7725

Last seen 9.6 years ago

United States

Hi,

The pubmed function makes precisely one request, so there is no issue with timing. In many cases you can make a single request for many items rather than many requests for one item each. If you put it in a for loop there could be problems, but so far not a single person has reported hitting this particular wall.

As for why only 377 came back, did you check what happens if you request one of the missing ones by itself? Or go to the NLM website and check whether your PubMed ID is valid? Also, please do read the posting guide and tell us something about your system.

thanks
Robert

Kaustubh Patil wrote:
> Hi,
>
> I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?
>
> Also, I have a list of 718 PMIDs but the function retrieves only 377 of them. I don't understand why. Suggestions appreciated.
>
> Thank you and regards,
> Kaustubh

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
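Robert's first check (which IDs are missing, and can they be fetched on their own?) can start from a simple comparison of what was requested with what came back. The sketch below is a hypothetical helper; `checkReturnedPMIDs` is not part of annotate. It assumes you already have the returned PMIDs as a character vector, for example via `sapply(arts, pmid)` if the `pmid()` accessor on annotate's `pubMedAbst` objects is available in your version.

```r
# Hypothetical helper (not part of annotate): compare requested PMIDs with
# the PMIDs that actually came back, keeping the original request order.
checkReturnedPMIDs <- function(requested, returned) {
  requested <- as.character(requested)
  returned  <- as.character(returned)
  absent    <- !(requested %in% returned)
  list(
    missing      = requested[absent],
    # index (within the request) of the first ID that did not come back;
    # NA when every requested ID was returned
    firstMissing = match(TRUE, absent)
  )
}

# Toy example with made-up IDs (no network access needed)
res <- checkReturnedPMIDs(c("101", "102", "103", "104"), c("101", "102"))
res$missing       # "103" "104"
res$firstMissing  # 3
```

Each ID in `missing` can then be passed to `pubmed()` by itself, as Robert suggests, to see whether NCBI returns it; `firstMissing` shows whether the results stop at a fixed position in the request.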

Posted 18.8 years ago by rgentleman ★ 5.5k


An embedded and charset-unspecified text was scrubbed... Name: not available. URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060222/425de84a/attachment.pl

Reply posted 18.8 years ago by Kaustubh Patil ▴ 110


Morten ▴ 300

@morten-929

Last seen 10.2 years ago

Kaustubh Patil wrote:
> Hi,
>
> I want to fetch documents from PubMed. So first I get all the PMIDs and then use the "pubmed" function from the "annotate" package. But does this function take care of the NCBI rule for waiting 3 seconds between queries?

I don't know about the "pubmed" function from annotate, but I've seen a function that does exactly this in the MedlineR package (I'm just pasting the code below, with the date handling tidied up):

    pauseBetweenQueries <- function(
        sleep.peak = 15,    # pause (in seconds) during peak hours
        sleep.offpeak = 3   # pause (in seconds) during off-peak hours
    ) {
      # Read the weekday and hour from POSIXlt fields instead of splitting the
      # date() string: date() pads single-digit days ("Thu Jan  5 ..."), which
      # shifts the split fields and picks up the day instead of the time.
      now  <- as.POSIXlt(Sys.time())
      hour <- now$hour
      # Off-peak hours are Sat, Sun, or anytime between 9 pm and 5 am
      # (local clock time; $wday is 0 for Sunday and 6 for Saturday)
      off.peak <- now$wday %in% c(0, 6) || hour >= 21 || hour < 5
      # Perform the sleep
      if (off.peak) {
        Sys.sleep(sleep.offpeak)
      } else {
        Sys.sleep(sleep.peak)
      }
    }

You may want to try more code from MedlineR. You can find the complete code here: http://www.dbsr.duke.edu/pub/MedlineR/MedlineR_v30.txt

Hope this can be useful :)
morten
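A pause function like Morten's is meant to sit between successive queries. A minimal sketch of that pattern, under stated assumptions: `fetchInBatches` is a hypothetical helper, the default batch size of 200 is an arbitrary choice rather than a documented NCBI limit, and `fetch` stands for whatever performs one request (for example `pubmed`).

```r
# Hypothetical sketch: request PMIDs in batches, pausing between requests.
# `fetch`  - any function taking a character vector of PMIDs (e.g. pubmed)
# `pause`  - called between batches (e.g. pauseBetweenQueries)
# The batch size is an assumption, not an NCBI-documented limit.
fetchInBatches <- function(ids, fetch, batchSize = 200,
                           pause = function() Sys.sleep(3)) {
  ids <- as.character(ids)
  # Group consecutive IDs: 1..batchSize -> batch 1, and so on
  batches <- split(ids, ceiling(seq_along(ids) / batchSize))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    if (i > 1) pause()  # no wait before the very first request
    results[[i]] <- fetch(batches[[i]])
  }
  results
}

# Offline example: a dummy fetch that just counts the IDs it receives,
# and a dummy pause that counts how often it was called.
calls <- 0
out <- fetchInBatches(as.character(1:450),
                      fetch = function(x) length(x),
                      batchSize = 200,
                      pause = function() calls <<- calls + 1)
unlist(out)  # 200 200 50
calls        # 2
```

With real data one might call `fetchInBatches(ids, fetch = pubmed, pause = pauseBetweenQueries)` and then apply `xmlRoot()` and `xmlApply(..., buildPubMedAbst)` to each returned document before concatenating the abstracts; that combination is an untested suggestion, not something confirmed in this thread.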

Posted 18.8 years ago by Morten ▴ 300


Kaustubh Patil ▴ 110

@kaustubh-patil-1544

Last seen 10.2 years ago

Hi,

I forgot to attach the file. Here it is.

Kaustubh

Kaustubh Patil <kaustubhp_in at yahoo.com> wrote:

Dear Robert,

Thanks for your reply. First, some details about my system: a Celeron 2.5 GHz with 512 MB RAM running Fedora Core 4, R version 2.2.1 (2005-12-20 r36812) with RSXML 0.99.

I am attaching a file that contains 2665 PMIDs that I want to fetch. Load this file with load("ids") and it will create a variable named ids. If I then use the following code, I get only 363 abstracts:

    library(annotate)   # pubmed(), buildPubMedAbst(), abstText()
    library(XML)        # xmlRoot(), xmlApply()
    load("ids")         # creates the variable `ids`
    docs <- pubmed(ids)
    root <- xmlRoot(docs)
    arts <- xmlApply(root, buildPubMedAbst)
    absts <- sapply(arts, abstText)
    length(absts)       # [1] 363

Interestingly, those are the first 363 abstracts. The 364th abstract ("12136003") could be fetched manually as well as with the MedlineR library. Am I missing something here?

Posted 18.8 years ago by Kaustubh Patil ▴ 110
