MedlineR

0

Entering edit mode

David Enot ▴ 30

@david-enot-3064

Last seen 9.4 years ago

France

Dear all, I came across a paper mentionning package called MedlineR. However the original link mentionned in the paper (http://dbsr.duke.edu/pub/MedlineR) does not seem to be working anymore. Because it must been have used by few members of this list, I wonder if someone could point me to an alternative address where I could access this package. Thanks in advance. David ########## David Enot http://sites.google.com/site/enotdavid/ [[alternative HTML version deleted]]

• 1.9k views

ADD COMMENT • link updated 16.4 years ago by Hervé Pagès 16k • written 16.4 years ago by David Enot ▴ 30

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 23 hours ago

Seattle, WA, United States

Hi David, David Enot wrote: > Dear all, > > I came across a paper mentionning package called MedlineR. However the > original link mentionned in the paper (http://dbsr.duke.edu/pub/MedlineR) > does not seem to be working anymore. Because it must been have used by few > members of this list, I wonder if someone could point me to an alternative > address where I could access this package. > Doesn't seem that this package has ever be part of Bioconductor or CRAN. I would suggest that you contact the first author of the paper: Lin SM <lin00025 at="" mc.duke.edu=""> Cheers, H. > Thanks in advance. > > David > > ########## > David Enot > http://sites.google.com/site/enotdavid/ > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 16.4 years ago Hervé Pagès 16k

0

Entering edit mode

I suspect that the email address in your message will not work, since Simon Lin has since moved from Duke to Northwestern. With the help of google, you can get his current email address: s-lin2 AT northwestern.edu I have no idea if he is still supporting this R package. Best, Kevin Herve Pages wrote: > Hi David, > > David Enot wrote: >> Dear all, >> >> I came across a paper mentionning package called MedlineR. However the >> original link mentionned in the paper (http://dbsr.duke.edu/pub/MedlineR) >> does not seem to be working anymore. Because it must been have used >> by few >> members of this list, I wonder if someone could point me to an >> alternative >> address where I could access this package. >> > > Doesn't seem that this package has ever be part of Bioconductor or CRAN. > I would suggest that you contact the first author of the paper: > > Lin SM <lin00025 at="" mc.duke.edu=""> > > Cheers, > H. > > >> Thanks in advance. >> >> David >> >> ########## >> David Enot >> http://sites.google.com/site/enotdavid/ >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 16.4 years ago Kevin R. Coombes ▴ 70

0

Entering edit mode

I have been using a hacked copy of the script for a couple of years. It is not a formal packaged, more a script that does what I need. I use it to collect abstracts on genes of interest and used an online database to construct an alias for all rat genes on the Affy chipset I have been using. Let me know exactly what inputs you want and I'll post something that might suit your needs. Mark ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ****************************************************************** On Fri, Oct 10, 2008 at 5:25 PM, Kevin R. Coombes <krcoombes@mdacc.tmc.edu>wrote: > I suspect that the email address in your message will not work, since Simon > Lin has since moved from Duke to Northwestern. With the help of google, you > can get his current email address: > s-lin2 AT northwestern.edu > I have no idea if he is still supporting this R package. > > Best, > Kevin > > > Herve Pages wrote: > >> Hi David, >> >> David Enot wrote: >> >>> Dear all, >>> >>> I came across a paper mentionning package called MedlineR. However the >>> original link mentionned in the paper (http://dbsr.duke.edu/pub/MedlineR >>> ) >>> does not seem to be working anymore. Because it must been have used by >>> few >>> members of this list, I wonder if someone could point me to an >>> alternative >>> address where I could access this package. >>> >>> >> Doesn't seem that this package has ever be part of Bioconductor or CRAN. >> I would suggest that you contact the first author of the paper: >> >> Lin SM <lin00025@mc.duke.edu> >> >> Cheers, >> H. >> >> >> Thanks in advance. >>> >>> David >>> >>> ########## >>> David Enot >>> http://sites.google.com/site/enotdavid/ >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 16.4 years ago Mark W Kimpel ▴ 830

0

Entering edit mode

Dear all, My area of research is on metabolomics and my aim is to know if 2 metabolites are associated in the literature. Since, I can retrieve article where these molecules are cited given a set of identified molecules, adding further constraints such as disease or organism of interest. MedlineR seemed to be doing exactly what I was looking for. My code snippet to get the list of PMID: library(XML) query='coombes kr[au] OR kimpel mw[au]' query=gsub('\\s+','+',query) url = " http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=50000 " ## maybe necessary in the present example ;-) url = paste(url, "&db=pubmed&term=", query,sep = "") datafile = tempfile(pattern = "pub") try(download.file(url, destfile = datafile, method = "internal", mode = "wb", quiet = TRUE), silent = TRUE) xml <- xmlTreeParse(datafile, asTree = TRUE) nid = xmlValue(xmlElementsByTagName(xmlRoot(xml), "Count")[[1]]) lid = xmlElementsByTagName(xmlRoot(xml), "IdList", recursive = TRUE)[[1]] unlist(lapply(xmlElementsByTagName(lid, "Id"), xmlValue)) I have not had time to quantify and measure the degree of association... There are several problems associated with querying for molecules and it is probably a clever idea to first generate a small database of abstract for each of my metabolite and starts the data mining from this. Cheers! David 2008/10/11 Mark Kimpel <mwkimpel@gmail.com> > I have been using a hacked copy of the script for a couple of years. It is > not a formal packaged, more a script that does what I need. I use it to > collect abstracts on genes of interest and used an online database to > construct an alias for all rat genes on the Affy chipset I have been using. > Let me know exactly what inputs you want and I'll post something that might > suit your needs. > > Mark > ------------------------------------------------------------ > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry > Indiana University School of Medicine > > 15032 Hunter Court, Westfield, IN 46074 > > (317) 490-5129 Work, & Mobile & VoiceMail > (317) 399-1219 Home > Skype: mkimpel > > "The real problem is not whether machines think but whether men do." -- B. > F. Skinner > ****************************************************************** > > > > On Fri, Oct 10, 2008 at 5:25 PM, Kevin R. Coombes <krcoombes@mdacc.tmc.edu> > wrote: > >> I suspect that the email address in your message will not work, since >> Simon Lin has since moved from Duke to Northwestern. With the help of >> google, you can get his current email address: >> s-lin2 AT northwestern.edu >> I have no idea if he is still supporting this R package. >> >> Best, >> Kevin >> >> >> Herve Pages wrote: >> >>> Hi David, >>> >>> David Enot wrote: >>> >>>> Dear all, >>>> >>>> I came across a paper mentionning package called MedlineR. However the >>>> original link mentionned in the paper ( >>>> http://dbsr.duke.edu/pub/MedlineR) >>>> does not seem to be working anymore. Because it must been have used by >>>> few >>>> members of this list, I wonder if someone could point me to an >>>> alternative >>>> address where I could access this package. >>>> >>>> >>> Doesn't seem that this package has ever be part of Bioconductor or CRAN. >>> I would suggest that you contact the first author of the paper: >>> >>> Lin SM <lin00025@mc.duke.edu> >>> >>> Cheers, >>> H. >>> >>> >>> Thanks in advance. >>>> >>>> David >>>> >>>> ########## >>>> David Enot >>>> http://sites.google.com/site/enotdavid/ >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 16.4 years ago David Enot ▴ 30

0

Entering edit mode

A couple of suggestions. First, instead of performing a single "OR" search, perform two separate searches. The "AND" search is then really easy to compute by looking at how many PMID's show up in both the searches. (This may not matter a lot with just two categories, but it will be much more efficient if you ever switch to more than two things to search for.) Second, there used to be (and perhaps still is) a commercial product called PDQMED from a company called InPharmix that had tools to do this sort of thing, along with statistics to weight the results. One of the more interesting features was the ability to figure out when two items you were searching for were contained in the same (or consecutive) sentences. Best, Kevin David Enot wrote: > > Dear all, > > My area of research is on metabolomics and my aim is to know if 2 > metabolites are associated in the literature. Since, I can retrieve > article where these molecules are cited given a set of identified > molecules, adding further constraints such as disease or organism of > interest. MedlineR seemed to be doing exactly what I was looking for. > > My code snippet to get the list of PMID: > > library(XML) > query='coombes kr[au] OR kimpel mw[au]' > query=gsub('\\s+','+',query) > url = > "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?retmax=50 000" > ## maybe necessary in the present example ;-) > url = paste(url, "&db=pubmed&term=", query,sep = "") > datafile = tempfile(pattern = "pub") > try(download.file(url, destfile = datafile, method = "internal", mode > = "wb", quiet = TRUE), silent = TRUE) > xml <- xmlTreeParse(datafile, asTree = TRUE) > nid = xmlValue(xmlElementsByTagName(xmlRoot(xml), "Count")[[1]]) > lid = xmlElementsByTagName(xmlRoot(xml), "IdList", recursive = TRUE)[[1]] > unlist(lapply(xmlElementsByTagName(lid, "Id"), xmlValue)) > > I have not had time to quantify and measure the degree of association... > There are several problems associated with querying for molecules and > it is probably a clever idea to first generate a small database of > abstract for each of my metabolite and starts the data mining from this. > > Cheers! > > David > > > 2008/10/11 Mark Kimpel <mwkimpel at="" gmail.com="" <mailto:mwkimpel="" at="" gmail.com="">> > > I have been using a hacked copy of the script for a couple of > years. It is not a formal packaged, more a script that does what I > need. I use it to collect abstracts on genes of interest and used > an online database to construct an alias for all rat genes on the > Affy chipset I have been using. Let me know exactly what inputs > you want and I'll post something that might suit your needs. > > Mark > ------------------------------------------------------------ > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry > Indiana University School of Medicine > > 15032 Hunter Court, Westfield, IN 46074 > > (317) 490-5129 Work, & Mobile & VoiceMail > (317) 399-1219 Home > Skype: mkimpel > > "The real problem is not whether machines think but whether men > do." -- B. F. Skinner > ****************************************************************** > > > > On Fri, Oct 10, 2008 at 5:25 PM, Kevin R. Coombes > <krcoombes at="" mdacc.tmc.edu="" <mailto:krcoombes="" at="" mdacc.tmc.edu="">> wrote: > > I suspect that the email address in your message will not > work, since Simon Lin has since moved from Duke to > Northwestern. With the help of google, you can get his current > email address: > s-lin2 AT northwestern.edu <http: northwestern.edu=""> > I have no idea if he is still supporting this R package. > > Best, > Kevin > > > Herve Pages wrote: > > Hi David, > > David Enot wrote: > > Dear all, > > I came across a paper mentionning package called > MedlineR. However the > original link mentionned in the paper > (http://dbsr.duke.edu/pub/MedlineR) > does not seem to be working anymore. Because it must > been have used by few > members of this list, I wonder if someone could point > me to an alternative > address where I could access this package. > > > Doesn't seem that this package has ever be part of > Bioconductor or CRAN. > I would suggest that you contact the first author of the > paper: > > Lin SM <lin00025 at="" mc.duke.edu="" <mailto:lin00025="" at="" mc.duke.edu="">> > > Cheers, > H. > > > Thanks in advance. > > David > > ########## > David Enot > http://sites.google.com/site/enotdavid/ > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > >

ADD REPLY • link 16.4 years ago Kevin R. Coombes ▴ 70

0

Entering edit mode

Sorry for my slow response, I was away for the weekend. Below is the MedlineR script I hacked, including. I believe it has the approach suggested by Kevin. You will have to change where the output is stored. If you have a massive number of queries to compare, I have some C code that really speeds things up. Also, Gene Symbols are very ambiguous, thus my efforts to generate all aliases, names, etc. for each gene. I can contribute one way to do this if you need it. Mark # MedlineR: an open source library in R # for Medline literature data mining # result.matrix.generator.func<-function( termSymbs, #a vector of gene symbols to correspond with termList termList, # a list of terms termAdditional="", # additional modifier min.cites) #min.no of cites for co- cite abstract HTML output { require(XML) options("serviceUrl.entrez" = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ ") #create new directory to store results old.wd<-getwd() ## new.wd<-paste(old.wd,"PubMed.analysis", sep="/") ## dir.create(new.wd, showWarnings = FALSE) ##setwd(new.wd) result.matrix<-getAmatrix(termSymbs, termList, termAdditional, min.cites) #save result.matrix for later use local(save(result.matrix, file = "~/R.functions/PubMed_analysis/R_builds/result.matrix.R")) ########################################################### #create a summary report of co-citations for export summ.report<-matrix(nrow=nrow(result.matrix)^2, ncol=2) for (i in 1:(nrow(result.matrix) - 1)) { for (j in i:nrow(result.matrix)) { summ.report[(((i - 1) * nrow(result.matrix)) + j), 1]<-paste(rownames(result.matrix)[i], colnames(result.matrix)[j], sep=":") summ.report[(((i - 1) * nrow(result.matrix)) + j), 2]<-result.matrix[i,j] } } o<-order(as.numeric(summ.report[,2]), decreasing = TRUE) summ.report<-summ.report[o,] summ.report<-summ.report[(as.numeric(summ.report[,2]) !=0), ] write.table(summ.report, file = "~/R.functions/PubMed_analysis/R_builds/co.citaton.summary.report.txt" , row.names = FALSE, sep = "\t", col.names=TRUE, append=FALSE) ###################################################################### ########## setwd(old.wd) result.matrix } ##################################################################### ### # return a co-occurrence matrix for a list of terms getAmatrix<- function ( termSymbs, #a vector of gene symbols to correspond with termList termList, # a list of terms termAdditional="", # additional modifier min.cites #min.no of cites for co- cite abstract HTML output ) { #termList<- c("STE18", "DIG1", "STE12"); #termAdditional<- "+AND+(Saccharomyces+OR+yeast" termSymbs<-unique(termSymbs) termList<-unique(termList) # initialize the co-occurance matrix n.terms<- length (termList) result.matrix<- matrix (0, ncol=n.terms, nrow=n.terms) rownames(result.matrix)<-colnames(result.matrix)<-termSymbs PMID.lst<-list() for (i in 1:n.terms) { PMID.lst[[i]]<-PMID.vec.OneTerm(term=termList[i],termAdditional) pauseBetweenQueries() } #save PMID.lst for later queries local(save(PMID.lst, file = "~/R.functions/PubMed_analysis/R_builds/PMID.lst.R")) # fill in the diagonal first for an optimization # if a term is not found anywhere in medline # it certainly won't match any other terms for (i in 1:n.terms) { try.error<- try (length(PMID.lst[[i]]), TRUE) # if rest of list is empty, exit loop if(inherits(try.error, "try-error")) {break} else {result.matrix[i,i]<-length(PMID.lst[[i]])} } # query the pairs # i.e., the connection of the nodes for (i in 1:(n.terms-1)){ # in the network # check to see if the diagonal for term 1 is 0 if so, don't process it if (result.matrix [i,i]==0) {next} for (j in (i+1):n.terms) { if (result.matrix [j,j]==0) {next} # check to see if the diagonal for term 2 is 0 if so, don't process it n.counts <-countApair ( term1.PMID.vec=PMID.lst[[i]], term2.PMID.vec=PMID.lst[[j]]) result.matrix[i,j]<- n.counts # the matrix is symetric result.matrix[j,i]<- n.counts # i.e., uppper right v.s. lower left # copy for visualization purposes #impractical when building matrix for all genes. instead, implement this separately # using fetchApair and PMID.lst.R on a select group of genes ## fetchApair (term1=rownames(result.matrix)[i], term2<-rownames(result.matrix)[j], ## term1.PMID.vec=PMID.lst[[i]], term2.PMID.vec=PMID.lst[[j]], min.cites) #fetch abstracts and output as html } print(paste("term", i, sep=":")) } result.matrix<-result.matrix[rowSums(result.matrix)!=0, rowSums(result.matrix)!=0] return (result.matrix) } ###################################################################### ## PMID.vec.OneTerm<- function ( term, # a pair of terms termAdditional="") # additional modifying terms { baseUrl=getOption("serviceUrl.entrez") # URL of the Pubmed service # QC: make sure the baseUrl is all right. if (is.null (baseUrl)) { stop ("Need to define the URL of the Pubmed service!") } # Get the query string ready. This string should be in the # Pubmed syntax. The Pubmed syntax is documented at # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html query<- paste (baseUrl, "esearch.fcgi?", "tool=markwkimpelsoft&", #my tool name "db=pubmed&", # database is Pubmed "retmax=10000&", #max number of IDs to return #"rettype=count&", # simply return the counts "email=mkimpel@iupui.edu&", #my email address "term=", # using these terms term, termAdditional, sep="") result.xml<- try (xmlTreeParse(file=query, isURL=T), TRUE) # in XML # count is parsed from XML if(inherits(result.xml, "try-error")) { Sys.sleep(30) PMID.vec<-PMID.vec.OneTerm(term, termAdditional, baseUrl=getOption("serviceUrl.entrez")) } else{PMID.vec<-na.remove.func(as.numeric(unlist(xmlRoot (result.xml) [["IdList"]])))} return (PMID.vec) } ###################################################################### ## # Count the number of abstracts with both term1 and term2 countApair<- function (term1.PMID.vec, term2.PMID.vec) #PMIDs from a pair of terms { count<-length(intersect(term1.PMID.vec, term2.PMID.vec)) return(count) } ###################################################################### ## # Fetch the abstracts with both term1 and term2 fetchApair<- function (term1, term2, term1.PMID.vec, term2.PMID.vec, min.cites) #PMIDs from a pair of terms { require("annotate") comm.PMID<-intersect(term1.PMID.vec, term2.PMID.vec) L<-min(50, length(comm.PMID)) if (L>=min.cites) { old.wd<-getwd() new.wd<-paste(old.wd, paste("co.cit.anal", term1, term2, sep="."), sep="/") dir.create(new.wd, showWarnings = FALSE) setwd(new.wd) x <- pubmed(comm.PMID[1:L]) a <- xmlRoot(x) numAbst <- length(xmlChildren(a)) arts <- vector("list", length = numAbst) for (i in 1:numAbst) { arts[[i]] <- buildPubMedAbst(a[[i]]) } pmAbst2HTML(arts, filename = paste("co.cit.anal", term1, term2, "html", sep="."), frames = TRUE) setwd(old.wd) } } ###################################################################### ## # pause between queries, according to the NCBI rule # ###################################################################### ## # Note: # According to the NCBI rule, query larger than 100 requests # should be 'nicely' run off hours and weekends # # The NCBI rule is documented at # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html # It prohibits WebBot abuse of the http service # # Note: How to sleep for 3 seconds in R? # Sys.sleep (time=3) # pauseBetweenQueries<- function ( sleep.peak=0, # pause (in seconds) during peak hours sleep.offpeak=0 # pause (in seconds) during off-peak ) { # sleep.peak<-15; sleep.offpeak<-3 # Date example: # "Thu" "Jan" "15" "16:46:11" "2004" result.date<- unlist (strsplit( date(), split=" ")) hour<- as.numeric(unlist (strsplit (result.date[4], split=':'))[1]) # off peak hours are Sat, Sun or anytime between 9 pm and 5 am if ( (result.date[1]=="Sat") | (result.date[1]=="Sun") | (hour > 18) | (hour<8) ) {off.peak<-T} else {off.peak<-F} # perform the sleep if (off.peak) { Sys.sleep (sleep.offpeak) } else { Sys.sleep (sleep.peak) } } ###################################################################### ############# #rotate co-occurrance matrix so that "image" correctly displays matrix in same format as the original matrix flip.mat.func<-function(mat) { mat<-t(mat) new.mat<-matrix(nrow=nrow(mat), ncol=ncol(mat)) n<-nrow(new.mat) for (i in 1:n) {new.mat[i,]<-mat[(n+1-i),]} new.mat n<-nrow(new.mat); L<-ncol(new.mat) new.new.mat<-matrix(nrow=n, ncol=L) for (i in 1:n) { for (j in 1:L) { new.new.mat[i,j]<-new.mat[(n+1-i), (L+1-j)] } } new.new.mat } ##################################################################### ### # Retrieve a MEDLINE abstract # ###################################################################### ## # # Example: # fetchAnAbstract (pmID=134567) # fetchAnAbstract<- function ( pmID, # PubMed ID baseUrl=getOption("serviceUrl.entrez") # URL of the Pubmed service ){ # pmID<- 12405121 # QC: make sure the baseUrl is all right. if (is.null (baseUrl)) { stop ("Need to define the URL of the Pubmed service!") } # Get the query string ready. This string should be in the Pubmed # syntax. The Pubmed syntax is documented at # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html query<- paste (baseUrl, "efetch.fcgi?", "db=pubmed&id=", pmID, "&retmode=xml", sep="") # parse resulting XML into a tree to be returned to user result.xml<- try (xmlTreeParse(file=query, isURL=T)) return (result.xml) } ##################################################################### ### # query NCBI Pubmed to get the pmid (PubMed identifier) # ###################################################################### ## # # Example: # queryPubmed (term="cancer+OR+diabetes") # queryPubmed <- function ( term, # query term baseUrl=getOption("serviceUrl.entrez") # URL of the Pubmed service ){ # QC: make sure the baseUrl is all right. if (is.null (baseUrl)) { stop ("Need to define the URL of the Pubmed service!") } # Get the query string ready. This string should be in the Pubmed # syntax. The Pubmed syntax is documented at # http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html query<- paste (baseUrl, "esearch.fcgi?", "db=pubmed&term=", term, sep="") # parse resulting XML into a tree to be returned to user result.xml<- try (xmlTreeParse(file=query, isURL=T)) return (result.xml) } ##################################################################### ### # encode URI # ###################################################################### ## # Example # encodeUri("sf&df:lk;lo #op") # # encode a phase with strange characters in it # encodeUri<- function (x) { # x<- "" # input x is a PHRASE string # WARNING: x should not be the whole http://someCGI header! x<- as.character(x) # replcae with gsub (derived from grep!) x<- gsub (pattern="%", replacement="%25", x) # must be done first! x<- gsub (pattern="/", replacement="%2F", x) x<- gsub (pattern="\\\\?", replacement="%3F", x) x<- gsub (pattern="#", replacement="%23", x) x<- gsub (pattern="=", replacement="%3D", x) x<- gsub (pattern="&", replacement="%26", x) x<- gsub (pattern=":", replacement="%3A", x) x<- gsub (pattern=";", replacement="%3B", x) x<- gsub (pattern=" ", replacement="%20", x) x<- gsub (pattern="\\\\+", replacement="%2B", x) return (x) } ------------------------------------------------------------ Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, & Mobile & VoiceMail (317) 399-1219 Home Skype: mkimpel "The real problem is not whether machines think but whether men do." -- B. F. Skinner ****************************************************************** [[alternative HTML version deleted]]

ADD REPLY • link 16.4 years ago Mark W Kimpel ▴ 830

Login before adding your answer.