Extract genes for a GO term in GOstats


Rohit Farmer ▴ 170

@rohit-farmer-3954

Last seen 10.2 years ago

Hi everyone,

I ran a GO enrichment analysis with the GOstats package on around 112 genes and got 80 enriched GO terms in the BP ontology, but the results do not show which genes are associated with a particular GO term. The commands I used are as follows:

    library("GOstats")
    library("hgu133plus2.db")
    allg <- get("hgu133plus2ENTREZID")
    allg <- as.data.frame(unlist(as.list(allg)))
    entrez.ids <- unique(allg[rownames(dat.s),])

    params <- new("GOHyperGParams", geneIds=entrez.ids,
                  annotation=c("hgu133plus2"), ontology="BP", pvalueCutoff=0.05,
                  conditional=FALSE, testDirection="over")
    resultBP <- hyperGTest(params)

Please help me find the genes associated with the GO terms.

Rohit


ADD COMMENT • link updated 14.4 years ago by Chao-Jen Wong ▴ 580 • written 14.4 years ago by Rohit Farmer ▴ 170


James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

Hi Rohit,

After your call to hyperGTest, try:

    probesets <- probeSetSummary(resultBP)

See ?probeSetSummary for more info.

Best,
Jim

ADD COMMENT • link 14.4 years ago James W. MacDonald 67k


Hi,

Also, in some cases `geneIdsByCategory` could be useful as well. You can use it to extract the Entrez IDs of the genes behind the terms you find enriched in your HyperGResult. For example, assuming you found "GO:0010468" enriched in your test:

    R> regulate.gene.expression <- geneIdsByCategory(resultBP, 'GO:0010468')

will tell you which genes those are.

I thought I'd just mention it, since "knowing is half the battle" ... and it's not mentioned in the "See Also" section of probeSetSummary (shouldn't it be?), so you might not find it straight away.

-steve
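To get the genes for every enriched term at once rather than one GO ID at a time, the same accessor can be combined with `sigCategories`. This is a minimal sketch, assuming the Category and GOstats packages are installed and that both accessors behave as described in their help pages; the wrapper name `genes_for_enriched_terms` is made up for illustration.

```r
# Sketch only: collect the selected genes behind every significant GO term.
# Assumes `result` is a HyperGResult, e.g. resultBP from hyperGTest(params).
# genes_for_enriched_terms is a hypothetical helper name, not a package function.
genes_for_enriched_terms <- function(result) {
  # GO IDs whose p-value is below the cutoff stored in the result
  sig <- Category::sigCategories(result)
  # Named list: one element per GO ID, holding the selected Entrez IDs
  Category::geneIdsByCategory(result, sig)
}

# Usage (not run here, needs resultBP from the question):
# genes <- genes_for_enriched_terms(resultBP)
# genes[["GO:0010468"]]
```

Each list element should hold only the genes from your own input list that are annotated at that term, so it is worth checking a few entries against summary(resultBP).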

ADD REPLY • link 14.4 years ago Steve Lianoglou ★ 13k


Dear BioC,

Could you please let me know how I can find out, from a given list of probe IDs:

Which probe's location is closest to the 3' end of the DNA?

How does this differ if the location is mapped to the positive or the negative strand?

Thank you so much in advance,
Saurin

ADD REPLY • link 14.4 years ago SAURIN ★ 1.1k


Check out the SpliceCenter web site at http://www.tigerteamconsulting.com/SpliceCenter/SpliceOverview.jsp and try their ArrayCheck tool.

ADD REPLY • link 14.4 years ago Kevin Coombes ▴ 430


Hi,

> Could you please let me know, how can I find out from given list of ProbeIDs:
>
> Which Probe's location is close to 3 Prime END of DNA?

If you don't mind doing a little programming, you could also:

1. get the probe sequences for your array (there are Bioconductor packages for these too)
2. realign them
3. check where they land on your "genes" by getting familiar with the GenomicFeatures package.

> How its different....if location is mapped to POSITIVE or Negative Strand?

In all likelihood it won't matter, since I'm pretty sure most array protocols require (at some point) amplification of the material that will be hybridized to the chip, which will lose any strand info of the molecules in your sample.

I could be mistaken, though, so you might want to read up on the details of your experiment, or perhaps wait for others to chime in.

-- Steve Lianoglou

ADD REPLY • link 14.4 years ago Steve Lianoglou ★ 13k


Thanks Steve and Kevin,

So, I got the coordinates of the probe sequence alignments. Then, if the strand is (+), I take the largest "start" number, and if the strand is (-), I take the smallest "start" number, which gives me the probe closest to the 3' end.

Saurin

ADD REPLY • link 14.4 years ago SAURIN ★ 1.1k


> so, I got coordinates of probe sequence alignments then if strand is (+) then take largest "start" number and if strand is (-) then take smallest start number which gives me closest position of probes on 3 prime end.

I'm not sure if you are asking a question, and I can't make exact sense of what you're saying, since I'm left to guess at a few things.

It sounds like you already have probe ID <-> gene mappings? And now you want to go gene by gene and look at where each probe for that gene aligns to the genome? Then you want to find the 3'-most probe in each such group?

You mention "if strand is (+)". Are you talking about the strand the probe aligns to (then what you say is wrong)? Or the strand of the gene you are currently looking at (then what you say is maybe right)?

-- Steve Lianoglou
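If "strand" means the strand of the gene, the rule from the quoted message can be sketched in base R as below. This is only an illustration on a made-up table: the column names (`probe_id`, `gene`, `gene_strand`, `start`), the coordinates, and the helper `three_prime_most` are all assumptions, and real alignments would come from your own realignment step.

```r
# Toy alignment table; every name and value here is hypothetical.
probes <- data.frame(
  probe_id    = c("p1", "p2", "p3", "p4", "p5"),
  gene        = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  gene_strand = c("+", "+", "+", "-", "-"),
  start       = c(100, 450, 900, 2000, 2600),
  stringsAsFactors = FALSE
)

# For a gene on the + strand the 3' end lies at the highest genomic
# coordinate; for a gene on the - strand it lies at the lowest. So take the
# probe with the largest start for "+" genes and the smallest for "-" genes.
three_prime_most <- function(df) {
  per_gene <- split(df, df$gene)
  picked <- lapply(per_gene, function(g) {
    if (g$gene_strand[1] == "+") {
      g[which.max(g$start), ]
    } else {
      g[which.min(g$start), ]
    }
  })
  do.call(rbind, picked)
}

closest <- three_prime_most(probes)
print(closest[, c("gene", "probe_id", "start")])
```

With this toy input the function picks p3 for geneA and p4 for geneB. Note that it keys on the gene's strand, not on the strand the probe happened to align to.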

ADD REPLY • link 14.4 years ago Steve Lianoglou ★ 13k


Chao-Jen Wong ▴ 580

@chao-jen-wong-3603

Last seen 9.9 years ago

USA/Seattle/Fred Hutchinson Cancer Rese…

Hi Rohit,

I wrote a short script to extract the genes associated with the over-represented GO terms. I hope this helps. Let me know if it doesn't work.

Given your params and resultBP, you can do the following:

    p <- params
    origGeneIds <- geneIds(p)
    selected <- intersect(geneIds(p), universeGeneIds(p))
    cat2Entrez <- categoryToEntrezBuilder(p)
    ## get the genes (Entrez IDs) in each category
    geneInCat <- lapply(as.list(summary(resultBP)[,1]),
                        function(goid) {
                            selected[selected %in% cat2Entrez[[goid]]]
                        })

    ## if you want to convert the Entrez IDs to manufacturer IDs
    ## note: `sel` is not defined in this snippet; it is assumed here to be
    ## the vector of selected probe IDs (for example rownames(dat.s))
    x <- revmap(as.list(hgu133plus2ENTREZID))
    geneInCatName <- lapply(geneInCat, function(geneid) {
        unlist(lapply(as.list(geneid), function(id)
            sel[sel %in% x[[id]]]))
    })
    ## name the list by GO ID, using the same result object as above
    names(geneInCatName) <- summary(resultBP)[,1]
    ## return
    geneInCatName

-- Chao-Jen Wong

ADD COMMENT • link 14.4 years ago Chao-Jen Wong ▴ 580


Oh, never mind. James and Steve have suggested better ways to do it.

-- Chao-Jen Wong

ADD REPLY • link 14.4 years ago Chao-Jen Wong ▴ 580
