Guido Hooiveld
Hi,
I have a simple problem that's driving me nuts... Any hints are
appreciated!
I am retrieving the human homologues of rat genes. I use the functions
'getHOMOLOG' and 'listToCharacterVector' from the package
annotationTools. Everything is going fine, except for one thing:
Some rows (genes) contain multiple entries (homologues); for such rows
I would like to get rid of all entries except the first one.
Example: for row 18634 I currently have:
[18634] "6173 /// 100529097"
I would like to get rid of everything except the first entry, so to
get this:
[18634] "6173"
How can I do this for all relevant rows? Basically, I would like to
remove everything after the first number, i.e. everything starting at
the separator " /// " (space, three forward slashes, space).
Thanks,
Guido
library(annotationTools)
library(hugene11stv1hsentrezg.db)
library(ragene11stv1rnentrezg.db)
#Download HomoloGene data from:
#ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/
# (date of file manually added to name when saving download)
homologene <- read.delim("homologene.data.121212.data", header=FALSE)
colnames (homologene) <- c ("HomologyGroupID", "TaxonID", "EgID",
"Symbol", "ProteinGI", "ProteinAcc")
# Read rat probesets that are on the array as Entrez IDs; this returns
# a list, which is converted to a character vector.
# Next, the probesets that don't have an Entrez ID are removed.
rat.eg.array <- mget(ls(ragene11stv1rnentrezgENTREZID),
ragene11stv1rnentrezgENTREZID)
rat.eg.array <- listToCharacterVector(rat.eg.array)
rat.eg.array <- rat.eg.array[!is.na(rat.eg.array)]
# Convert rat EG IDs into human (9606) homologs; this returns a list,
# which is converted to a character vector
> rat2human <- getHOMOLOG(rat.eg.array,9606,homologene) # this takes some time
Warning messages:
1: In getHOMOLOG(rat.eg.array, 9606, homologene) :
One or more gene input gene ID/cluster not found in homologue table
2: In getHOMOLOG(rat.eg.array, 9606, homologene) :
One or more gene ID/cluster with no target provided in homologue table
> rat2human <- listToCharacterVector(rat2human)
> class(rat2human)
[1] "character"
>
> head(rat2human)
[1] "54552" "80212" "11277" "10663" "199692" "399947"
>
> #example of multiple entries
> rat2human[18634]
[1] "6173 /// 100529097"
>
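For reference, one possible way to keep only the first entry is a regular-expression substitution with base R's sub(). This is only a minimal sketch: it assumes the separator is always exactly " /// ", and it uses a small toy vector `x` (a hypothetical stand-in for rat2human, with values copied from the output above plus an NA) so that it can be run on its own.

```r
# Toy stand-in for rat2human; values are copied from the output above,
# and the NA mimics a gene without a homologue (an assumption).
x <- c("54552", "80212", "6173 /// 100529097", NA)

# Remove the first " /// " and everything after it.
# Entries without the separator (and NA entries) are returned unchanged.
first.only <- sub(" /// .*$", "", x)
first.only

# Alternative under the same assumption: split on the literal separator
# and keep the first piece of every element.
first.only2 <- sapply(strsplit(x, " /// ", fixed = TRUE), `[`, 1)
```

Applied to the real data this would read rat2human <- sub(" /// .*$", "", rat2human). Whether the first listed homologue is the appropriate one to keep is a separate question that the code does not address.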
---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld@wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010