Question

Using GOstats with ScanArray Express Data


Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Hi all,

I was hoping to perform some ontological analysis using GOstats on a list of differentially expressed genes; however, I'm not entirely sure how to proceed.

To provide some background:

- Originally, I was working with data from a two-channel microarray experiment.
- The data was produced using the ScanArray Express scanner.
- The organism of interest is Campylobacter jejuni; it is exposed to two conditions (treatment and control).
- I've managed to derive a list of genes identified as differentially expressed. As a result, I have two .txt files: one containing a column of the original complete list of probes/genes involved in the experiment and one containing a column of probes/genes identified as differentially expressed.

Is it possible to implement GOstats procedures for the above scenario; the hyperGTest in particular? I've read the pdf tutorial file located on the bioconductor website (http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf), but the document is primarily concerned with Affymetrix data.

From what I've gathered, my .txt file containing the original complete list of probes is analogous to the gene universe data structure and my .txt file containing the list of probes identified as differentially expressed is analogous to the selected gene data structure.

I suppose I'm looking to implement something like the following:

    > hgCutoff <- 0.001
    > params <- new("GOHyperGParams",
    +     geneIds=selectedGene.txt,
    +     universeGeneIds=geneUniverse.txt,
    +     annotation="hgu95av2.db",
    +     ontology="BP",
    +     pvalueCutoff=hgCutoff,
    +     conditional=FALSE,
    +     testDirection="over")
    >
    > hgOver <- hyperGTest(params)

In particular,

(1) I know I can't use .txt files as suggested in the above code. How can I convert the selectedGene.txt and geneUniverse.txt into the appropriate format to be used in the above code?

(2) Currently, the probe names used in my .txt files are simply the probe (gene) names. Should these gene names be converted to Entrez IDs or some other format?

(3) Should this file contain the expression values (normalized log2 fold changes)?

(4) In the above code, I have used annotation="hgu95av2.db" (as used in the tutorial) simply because I'm not sure what this argument requires. Is this appropriate for the data as described above?

-- output of sessionInfo():

    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

--
Sent via the guest posting facility at bioconductor.org.

Organism probe convert GOstats • 1.7k views

updated 11.2 years ago by Steve Lianoglou ★ 13k • written 11.2 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-01-30

Hi,

Comments inline:

On Thu, Jan 30, 2014 at 2:33 PM, Joseph Shaw [guest] <guest at bioconductor.org> wrote:
>
> Hi all,
>
> I was hoping to perform some ontological analysis using GOstats on a list of differentially expressed genes; however, I'm not entirely sure how to proceed.
>
> To provide some background:
> - Originally, I was working with data from a two-channel microarray experiment.
> - The data was produced using the ScanArray Express scanner.
> - The organism of interest is Campylobacter jejuni; it is exposed to two conditions (treatment and control).

Your first issue is that you will need to compile a list of GO terms per gene for this organism. I believe this is the vignette you will need to give you an idea of what to do -- where the caveat is that you would need to compile these annotations from "somewhere":

http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf

> - I've managed to derive a list of genes identified as differentially expressed. As a result, I have two .txt files: one containing a column of the original complete list of probes/genes involved in the experiment and one containing a column of probes/genes identified as differentially expressed.

Good.

> Is it possible to implement GOstats procedures for the above scenario; the hyperGTest in particular?

Yes.

> I suppose I'm looking to implement something like the following:
>
> > hgCutoff <- 0.001
> > params <- new("GOHyperGParams",
> +     geneIds=selectedGene.txt,
> +     universeGeneIds=geneUniverse.txt,
> +     annotation="hgu95av2.db",
> +     ontology="BP",
> +     pvalueCutoff=hgCutoff,
> +     conditional=FALSE,
> +     testDirection="over")
> >
> > hgOver <- hyperGTest(params)
>
> In particular,
> (1) I know I can't use .txt files as suggested in the above code. How can I convert the selectedGene.txt and geneUniverse.txt into the appropriate format to be used in the above code?

In principle, you will need to be able to map the gene IDs you are providing as "up" or "down" to the gene IDs used in your GO database.

> (2) Currently, the probe names used in my .txt files are simply the probe (gene) names. Should these gene names be converted to Entrez IDs or some other format?

This will depend on how you construct your personalized mapping of GO terms to genes for your organism.

> (3) Should this file contain the expression values (normalized log2 fold changes)?

No, the inputs to a GO hyperG test are simply the IDs of the genes identified as "interesting" (differentially regulated in one direction, or the other, or both) and the list of gene IDs that constitute your "universe".

> (4) In the above code, I have used annotation="hgu95av2.db" (as used in the tutorial) simply because I'm not sure what this argument requires. Is this appropriate for the data as described above?

This is a package that provides some annotation for a particular affy chip -- presumably the part of the documentation that you are referencing is providing the list of "interesting" IDs as affy_ids from the chip, and this annotation package has "the goods" to map probe IDs to gene (entrez) IDs. It is specific to that human chip, so it is not appropriate for your Campylobacter jejuni data.

Apologies for the somewhat vague suggestions. Hopefully they will be helpful in implementing the actual solution. Perhaps those more familiar with the plan can give better specifics, but I've outlined the bare minimum of what you need. Hopefully you will be able to recover the exact particulars from the relevant vignettes.

-steve

--
Steve Lianoglou
Computational Biologist
Genentech
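
As a rough illustration of points (1) to (4), here is a minimal sketch of how the two .txt files could be read into the character vectors that hyperGTest expects, followed by a wrapper modelled on the calls shown in the GOstatsForUnsupportedOrganisms vignette. Several things here are assumptions and not part of the thread: the toy IDs (gene1 to gene5), the helper names read_ids and run_go_test, and the go_annot data frame with columns go_id, evidence and gene_id, which you would have to compile yourself for Campylobacter jejuni. The gene_id values must use the same naming scheme as the IDs in your two files. The sketch also assumes the files have no header line.

```r
# Stand-ins for the poster's two files so the sketch runs on its own;
# in practice, point these paths at geneUniverse.txt and selectedGene.txt.
universe_file <- tempfile(fileext = ".txt")
selected_file <- tempfile(fileext = ".txt")
writeLines(c("gene1", "gene2", "gene3", "gene4", "gene5"), universe_file)
writeLines(c("gene2", "gene5"), selected_file)

# Read a one-column text file (no header) into a character vector
# of unique, non-empty IDs.
read_ids <- function(path) {
  ids <- trimws(readLines(path))
  unique(ids[nzchar(ids)])
}

universe <- read_ids(universe_file)
selected <- read_ids(selected_file)

# Every selected gene must also be part of the universe.
stopifnot(all(selected %in% universe))

# Hypothetical wrapper around the unsupported-organism workflow.
# go_annot is an assumed data frame with one row per gene/GO-term pair and
# columns go_id (e.g. "GO:0008150"), evidence (e.g. "IEA") and gene_id.
run_go_test <- function(selected, universe, go_annot,
                        ontology = "BP", cutoff = 0.001) {
  # Requires the Bioconductor packages AnnotationDbi, GSEABase and GOstats.
  library("AnnotationDbi")
  library("GSEABase")
  library("GOstats")

  # GOFrame expects the columns in this order: GO ID, evidence code, gene ID.
  go_frame <- GOFrame(go_annot[, c("go_id", "evidence", "gene_id")],
                      organism = "Campylobacter jejuni")
  go_all <- GOAllFrame(go_frame)  # adds ancestor GO terms for each gene
  gsc <- GeneSetCollection(go_all, setType = GOCollection())

  params <- GSEAGOHyperGParams(name = "Custom C. jejuni GO annotation",
                               geneSetCollection = gsc,
                               geneIds = selected,
                               universeGeneIds = universe,
                               ontology = ontology,
                               pvalueCutoff = cutoff,
                               conditional = FALSE,
                               testDirection = "over")
  hyperGTest(params)
}
```

With a real annotation table in hand, the call would look like hgOver <- run_go_test(selected, universe, go_annot), and summary(hgOver) lists the over-represented terms. Note that no expression values enter the test at any point, and the annotation="hgu95av2.db" argument disappears because the custom gene set collection takes its place.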