Guest User
@guest-user-4897
Hi all,
I was hoping to perform some ontological analysis using GOstats on a
list of differentially expressed genes; however, I'm not entirely sure
how to proceed.
To provide some background:
- Originally, I was working with data from a two-channel microarray
experiment.
- The data was produced using the ScanArray Express scanner.
- The organism of interest is Campylobacter jejuni; it is exposed to
two conditions (treatment and control).
- I've managed to derive a list of genes identified as differentially
expressed. As a result, I have two .txt files: one containing a column
of the original complete list of probes/genes involved in the
experiment and one containing a column of probes/genes identified as
differentially expressed.
Is it possible to apply GOstats procedures, the hyperGTest in
particular, to the above scenario?
I've read the PDF tutorial on the Bioconductor website
(http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf),
but the document is primarily concerned with Affymetrix data.
From what I've gathered, my .txt file containing the original
complete list of probes is analogous to the gene universe data
structure and my .txt file containing the list of probes identified as
differentially expressed is analogous to the selected gene data
structure.
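As a minimal sketch of that mapping, and assuming each file holds one identifier per line with no header row, the two lists could be read into plain character vectors using base R alone. The file contents and the names `read_ids`, `geneUniverse` and `selectedGene` below are hypothetical placeholders, not taken from the actual experiment:

```r
# Hypothetical stand-ins for the two .txt files, written to temporary
# files so that the sketch is self-contained.
universe_file <- tempfile(fileext = ".txt")
selected_file <- tempfile(fileext = ".txt")
writeLines(c("Cj0001", "Cj0002", "Cj0003", "Cj0004", "Cj0005"), universe_file)
writeLines(c("Cj0002", " Cj0004 ", ""), selected_file)

# Read a one-column file into a character vector of unique IDs,
# trimming surrounding whitespace and dropping empty lines.
read_ids <- function(path) {
  ids <- gsub("^\\s+|\\s+$", "", readLines(path))
  unique(ids[nzchar(ids)])
}

geneUniverse <- read_ids(universe_file)
selectedGene <- read_ids(selected_file)

# Sanity check: every selected ID should also appear in the universe.
stopifnot(all(selectedGene %in% geneUniverse))
```

Whether these identifiers can be used directly, or must first be mapped to another ID type, depends on the annotation source and is not settled by this sketch.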
I suppose I'm looking to implement something like the following:
> hgCutoff <- 0.001
> params <- new("GOHyperGParams",
+ geneIds=selectedGene.txt,
+ universeGeneIds=geneUniverse.txt,
+ annotation="hgu95av2.db",
+ ontology="BP",
+ pvalueCutoff=hgCutoff,
+ conditional=FALSE,
+ testDirection="over")
>
> hgOver <- hyperGTest(params)
In particular,
(1) I know I can't use .txt files as suggested in the above code. How
can I convert the selectedGene.txt and geneUniverse.txt into the
appropriate format to be used in the above code?
(2) Currently, the probe names used in my .txt files are simply the
probe (gene) names. Should these gene names be converted to Entrez IDs
or some other format?
(3) Should this file contain the expression values (normalized log2
fold changes)?
(4) In the above code, I have used annotation="hgu95av2.db" (as used
in the tutorial) simply because I'm not sure what this argument
requires. Is this appropriate for the data as described above?
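For context on questions (2) and (3), an over-representation test for a single GO term amounts to a hypergeometric tail probability. The sketch below uses only base R's `phyper`; the counts are purely illustrative assumptions (not from any real experiment), and it is not a substitute for what GOstats does across the whole ontology:

```r
# Illustrative counts only; none of these come from real data.
universe_size    <- 1600  # genes in the gene universe
selected_size    <- 80    # genes called differentially expressed
term_in_universe <- 40    # universe genes annotated to one GO term
term_in_selected <- 8     # selected genes annotated to that term

# P(X >= term_in_selected) when selected_size genes are drawn at random
# from the universe without replacement.
p_over <- phyper(term_in_selected - 1,
                 term_in_universe,
                 universe_size - term_in_universe,
                 selected_size,
                 lower.tail = FALSE)
print(p_over)
```

Only gene membership counts enter this calculation, which suggests that the fold-change values themselves would not be needed for a test of this kind. What is needed is a mapping from each gene ID to its GO terms.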
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
>
--
Sent via the guest posting facility at bioconductor.org.