grep GOID, R problem or my problem?
2
0
Entering edit mode
Fangxin Hong ▴ 810
@fangxin-hong-912
Last seen 10.2 years ago
Hi list, Anyone tried to "grep" a target GOID from a list of GOIDs? I found that it made random mistakes, meaning it might grep a GOID which is not the target one by random. For example: ##AffyID.Ath1 is the affyid of all genes on ATH1 array GO.Ath1=mget(AffyID.Ath1,ath1121501GO) GOID.Ath1=sapply(GO.Ath1, function(x) { onts=sapply(x, function(z) z$GOID) }) GOID.finder=sapply(GOID.Ath1,function(x) { grep("GO:0000127",unlist(x),value=TRUE) }) The problem is, even when the nodes of GOID.Ath1 ( a list) doesn't contain the target GOID, the grep will return a hit sometimes. The frequency is very low, but it is there, and the returned hit are changing everytime I re-run it, it looks like grep is making mistakes randomly. I am listing the deatils below, I am sorry if the question is too long and I appreciate your help. I am using R2.1.0patched, and I am working wiht affy ATH1 array. ########## library(ath1121501) library(GO) library(annotate) ####### #Get all GOID whose GO term contains a word "transcription" ########### ##function GOTerm2Tag=function(term) { GTL=eapply(GOTERM,function(x) { grep(term,x at Term,value=TRUE) }) Glen=sapply(GTL,length) names(GTL[Glen>0]) } ##usage GO.trans=GOTerm2Tag("transcription") ################ #Get GOID for all genes on ATH1 array ############# ##AffID.ATH1 is all affyID for all genes on ATH1 array GO.Ath1=mget(AffyID.Ath1,ath1121501GO) GOID.Ath1=sapply(GO.Ath1, function(x) { onts=sapply(x, function(z) z$GOID) }) ############ #Find all genes that contain GOID whose GO term contians "transcrption" ############### ##function: Search for genes that contain the target GOID GOID2Gene=function(GOID,GOID.Ath1) { GOID.finder=sapply(GOID.Ath1,function(x) { grep(GOID,unlist(x),value=TRUE) }) GOID.finder.len=sapply(GOID.finder,length) temp=which(GOID.finder.len>0) names(temp)=names(which(GOID.finder.len>0)) temp } #Function: check the target Goterm for a list of genes Gene2GO=function(AffyID.input, AffyID.Ath1, GOID.Ath1,GOID.target) { ##AffyID.input: AffyID of the genes that needs to be checked ##AffyID.Ath1: AffyID of all genes on Ath1 array ##GOID.Ath1: GoID for all genes on Ath1 ##GOID.target: GOID of target category. e.g. GO.trans index.input=pmatch(AffyID.input,AffyID.Ath1,nomatch=NA) ##index of input gene onthe array GOID.input=GOID.Ath1[index.input] ##GOID for each of the input genes GOID.hit=sapply(GOID.input,function(x) { intersect(unlist(x),GOID.target) }) num.hit=sapply(GOID.hit,length) GOinfo.hit=mget(unlist(GOID.hit),GOTERM) Goterm.hit=sapply(GOinfo.hit,function (x) Term(x)) list(num.hit=num.hit,Goterm.hit=Goterm.hit) } ##usage len.trans=length(GO.trans) record=rep(NA,1,len.trans) index.trans.rec=c() for ( i in 1:len.trans) { temp.out=GOID2Gene(GO.trans[i],GOID.Ath1) temp.names=names(temp.out) index.trans.rec=c(index.trans.rec,temp.out) if(length(temp.out)>0){ temp=Gene2GO(temp.names, AffyID.Ath1, GOID.Ath1,GO.trans) if( length(which(temp$num.hit==0))!=0) {record[i]=which(temp$num.hit==0) print(i) } } rm(temp.out,temp.names,temp) } So the last step will always tell that one or two genes that are identified to have GOID for "transcription", actually don't have any GOID for "transcription". And these one or two genes are changing every time I re-run the program, and the number is chaning from 0 to 3. It looks like that it make random mistakes by grepping different wrong thing at different time. I really appreciate if anyone can give me any hint!! Thank you very much. Fangxin -------------------- Fangxin Hong Ph.D. Plant Biology Laboratory The Salk Institute 10010 N. Torrey Pines Rd. La Jolla, CA 92037 E-mail: fhong at salk.edu (Phone): 858-453-4100 ext 1105
GO affy Category GO affy Category • 1.4k views
ADD COMMENT
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.2 years ago
Hi Fangxin, On 9 Nov 2005, fhong at salk.edu wrote: > Hi list, Anyone tried to "grep" a target GOID from a list of GOIDs? > I found that it made random mistakes, meaning it might grep a GOID > which is not the target one by random. The random mistakes part seems... unlikely. I ran the code you pasted, but it didn't demonstrate the issue very clearly. Do you think you can simplify a bit and send a runnable example that will show us what you are seeing? Also, if you have a list of GOIDs, I wonder if match() or %in% might be a better choice than grep. Or perhaps grep's fixed=TRUE option. It seems you don't really want a regular expression, but an exact match. Best, + seth
ADD COMMENT
0
Entering edit mode
Hi Seth, I am very thankful for your reply. I did changed my program a little to avoid grep, and the problem seems being solved and the computational time has been very much shorten. However, I still don't know what is going on with grep, maybe "value=TRUE" is not appropriate here? I attached the sample script ( the version still with "grep"). I run the script in the last block several time, the output of the last command line give me different numbers, sometimes 2490, sometimes 2492? Any idea? I appreciate your help, I am sorry if my script is still too long. Fangxin ######################################################## library(ath1121501) library(GO) library(annotate) GOTerm2Tag=function(term) { GTL=eapply(GOTERM,function(x) { grep(term,x at Term,value=TRUE) }) Glen=sapply(GTL,length) names(GTL[Glen>0]) } GO.trans=GOTerm2Tag("transcription") ##all GOID with "transcription" in its GO term ##AffyID.Ath1 is the affyID of all genes on ATH1 array GO.Ath1=mget(AffyID.Ath1,ath1121501GO) GOID.Ath1=sapply(GO.Ath1, function(x) { onts=sapply(x, function(z) z$GOID) }) GOID2Gene=function(GOID,GOID.Ath1) { GOID.finder=sapply(GOID.Ath1,function(x) { grep(GOID,unlist(x),value=TRUE) }) GOID.finder.len=sapply(GOID.finder,length) #names(which(GOID.finder.len>0)) temp=which(GOID.finder.len>0) ##output index names(temp)=names(which(GOID.finder.len>0)) temp } ################################################### index.out=c() for ( i in GO.trans) { index.out=c(index.out,GOID2Gene(i,GOID.Ath1)) } index.out.unique=unique(index.out) length(index.out.unique) ######################################################### > Hi Fangxin, > > On 9 Nov 2005, fhong at salk.edu wrote: > >> Hi list, Anyone tried to "grep" a target GOID from a list of GOIDs? >> I found that it made random mistakes, meaning it might grep a GOID >> which is not the target one by random. > > The random mistakes part seems... unlikely. I ran the code you > pasted, but it didn't demonstrate the issue very clearly. Do you > think you can simplify a bit and send a runnable example that will > show us what you are seeing? > > Also, if you have a list of GOIDs, I wonder if match() or %in% might be a > better choice than grep. Or perhaps grep's fixed=TRUE option. It > seems you don't really want a regular expression, but an exact match. > > > Best, > > + seth > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > > -------------------- Fangxin Hong Ph.D. Plant Biology Laboratory The Salk Institute 10010 N. Torrey Pines Rd. La Jolla, CA 92037 E-mail: fhong at salk.edu (Phone): 858-453-4100 ext 1105
ADD REPLY
0
Entering edit mode
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.2 years ago
On 10 Nov 2005, fhong at salk.edu wrote: > Hi Seth, > > I am very thankful for your reply. I did changed my program a little > to avoid grep, and the problem seems being solved and the > computational time has been very much shorten. That's good news! > However, I still don't know what is going on with grep, maybe > "value=TRUE" is not appropriate here? > > I attached the sample script ( the version still with "grep"). I run > the script in the last block several time, the output of the last > command line give me different numbers, sometimes 2490, sometimes > 2492? Any idea? Will have a look. + seth
ADD COMMENT

Login before adding your answer.

Traffic: 977 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6