annotationTools: character vector clean-up
Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …
Thanks, that did the trick. The workflow finished as expected.

One question, though, so that I fully understand: what is the exact meaning of .*$ in the pattern argument? I tried to look it up, but only found this: "* The preceding item will be matched zero or more times."

Thanks,
Guido

-----Original Message-----
From: Ryan C. Thompson [mailto:rct@thompsonclan.org]
Sent: Thursday, February 28, 2013 23:24
To: Hooiveld, Guido
Cc: bioconductor at r-project.org
Subject: Re: [BioC] annotationTools: character vector clean-up

You can try this:

library(stringr)
x <- str_replace(string=x, pattern=" /// .*$", replacement="")
stopifnot(!any(str_detect(x, "///")))

You might want to adjust the pattern to allow arbitrary spacing rather than just single spaces.

On Thu 28 Feb 2013 02:11:21 PM PST, Hooiveld, Guido wrote:
> Hi,
> I have a simple problem that's driving me nuts... Any hints are appreciated!
>
> I am retrieving the human homologues of rat genes. I use the functions 'getHOMOLOG' and 'listToCharacterVector' from the library annotationTools. Everything is going fine, except for one thing:
> Some rows (genes) contain multiple entries (homologues); for such rows I would like to get rid of all entries except the first one.
> Example: for row 18634 I currently have:
> [18634] "6173 /// 100529097"
>
> I would like to get rid of everything except the first entry, so as to get this:
> [18634] "6173"
>
> How can I do this for all relevant rows? Basically, I would like to remove everything after the first number, starting with space-3xforwardslash-etc.
> Thanks,
> Guido
>
>
> library(annotationTools)
> library(hugene11stv1hsentrezg.db)
> library(ragene11stv1rnentrezg.db)
>
> #Download HomoloGene data from:
> #ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/
> homologene<-read.delim("homologene.data.121212.data",header=FALSE) # (date of file manually added to name when saving download)
> colnames(homologene) <- c("HomologyGroupID", "TaxonID", "EgID", "Symbol", "ProteinGI", "ProteinAcc")
>
> # Read rat probesets that are on the array as Entrez IDs; this returns a list which is converted to a character vector
> # Next the probesets that don't have an EntrezID are removed
> rat.eg.array <- mget(ls(ragene11stv1rnentrezgENTREZID), ragene11stv1rnentrezgENTREZID)
> rat.eg.array <- listToCharacterVector(rat.eg.array)
> rat.eg.array <- rat.eg.array[!is.na(rat.eg.array)]
>
> # Convert rat EG IDs into human (9606) homologs; this returns a list which is converted to a character vector
>> rat2human <- getHOMOLOG(rat.eg.array,9606,homologene) #this takes some time
> Warning messages:
> 1: In getHOMOLOG(rat.eg.array, 9606, homologene) :
>   One or more gene input gene ID/cluster not found in homologue table
> 2: In getHOMOLOG(rat.eg.array, 9606, homologene) :
>   One or more gene ID/cluster with no target provided in homologue table
>> rat2human <- listToCharacterVector(rat2human)
>> class(rat2human)
> [1] "character"
>>
>> head(rat2human)
> [1] "54552" "80212" "11277" "10663" "199692" "399947"
>>
>> #example of multiple entries
>> rat2human[18634]
> [1] "6173 /// 100529097"
>>
>
> ---------------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
> tel: (+)31 317 485788
> fax: (+)31 317 483342
> email: guido.hooiveld at wur.nl
> internet: http://nutrigene.4t.com
> http://scholar.google.com/citations?user=qFHaMnoAAAAJ
> http://www.researcherid.com/rid/F-4912-2010
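
On the question about .*$ in the pattern: the following is a minimal sketch, not taken from the thread, that walks through the pieces of " /// .*$". It uses base R's sub() in place of stringr's str_replace() so that it runs without extra packages; both replace only the first match in each string. The vector x is made up for illustration (only "6173 /// 100529097" and "54552" appear in the original post), and the whitespace-tolerant variant is just one possible way to "allow arbitrary spacing".

```r
# Illustrative input: the third element and the NA are hypothetical.
x <- c("54552", "6173 /// 100529097", "1234 /// 5678 /// 91011", NA)

# Reading the pattern " /// .*$" piece by piece:
#   " /// "  the literal separator: space, three slashes, space
#   "."      any single character
#   "*"      the preceding item (here ".") repeated zero or more times
#   "$"      the end of the string
# So ".*$" means "everything from here up to the end of the string".
first_only <- sub(pattern = " /// .*$", replacement = "", x = x)
print(first_only)

# Variant that tolerates any amount of whitespace (or none) before the slashes.
first_only_ws <- sub(pattern = "[[:space:]]*///.*$", replacement = "", x = x)
print(first_only_ws)

# Same sanity check as in the reply: no separator should be left over.
stopifnot(!any(grepl("///", first_only), na.rm = TRUE))
```

Because * is greedy, ".*" already runs to the end of a single-line string, so the trailing $ mainly makes the intent explicit. Strings without a separator are returned unchanged, and NA stays NA.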