I am working with data from ath1121501 (arabidopsis) arrays and I would
like to do the following:
1. Subset a list of genes based on GO terms. For example, how many
(and which) genes in a given list belong to MF=metabolism.
2. Create a pie chart of the distribution of GO terms for my list.
3. Find statistically over-represented GO terms.
4. Find pathway information for my list.
As simple as goal #1 appears to be, I am not sure how to subset a list by
GO term. I am not even sure what GO annotation is available for
ath1121501 in BioConductor.
In an attempt to accomplish goal #2, I tried using the function
ontoCompare from the goTools package, but got an error:
>length(DEGList)
[1] 1881
>res<-ontoCompare(DEGList,probeType="ath1121501",plot=TRUE)
[1] "Starting ontoCompare..."
Error in as.vector(x, mode) : invalid argument 'mode'
In an attempt to accomplish goal #3, I tried using the GOHyperG function
from the GOstats package, but the locus link ID information does not
appear to be available for ath1121501 (this has been addressed in previous
postings). Are there alternatives that can be used for ath1121501?
I was hoping to use the biomaRt package to get pathway information, but it
doesn't look like it contains annotation for arabidopsis.
Any suggestions would be sincerely appreciated!
Ann
Ann Hess <hess at ...> writes:
>
> I am working with data from ath1121501 (arabidopsis) arrays and I would
> like to do the following:
>
> 1. Subset a list of genes based on GO terms. For example, how many
> (and which) a given list belong to MF=metabolism.
## find the GO Identifier for "metabolism"
library(GO)
myGoTerm <- "metabolism"
myGoID <- unlist(eapply(GOTERM, function(g) if (g@Term == myGoTerm) TRUE else FALSE))
myGoID <- names(myGoID[myGoID])
print(myGoID)
## or if you want to find the GO terms containing "metabolism"
## x <- eapply(GOTERM, function(g) if (length(grep("metabolism", g@Term)) > 0)
##     cat(g@GOID, " ", g@Term, "\n"))
## get probeset IDs associated with myGoID
library(ath1121501)
myProbeID <- get(myGoID, ath1121501GO2PROBE)
myAllProbeID <- get(myGoID, ath1121501GO2ALLPROBES)
?ath1121501GO2PROBE
?ath1121501GO2ALLPROBES
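To answer the "how many (and which)" part once the probe set IDs for the GO term are in hand, a plain set intersection is probably enough. This is only a minimal sketch in base R; the probe IDs below are made-up placeholders standing in for DEGList and for the myAllProbeID vector obtained above.

```r
## Toy stand-ins (hypothetical IDs): in practice DEGList is your gene list and
## myAllProbeID comes from get(myGoID, ath1121501GO2ALLPROBES)
DEGList      <- c("probe_A", "probe_B", "probe_C", "probe_D")
myAllProbeID <- c("probe_B", "probe_D", "probe_E")

## which probe sets in the list are annotated with the GO term
## (intersect also drops duplicated IDs)
inTerm <- intersect(DEGList, myAllProbeID)
## how many
nInTerm <- length(inTerm)
print(inTerm)
print(nInTerm)
```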
> 2. Create a pie chart of the distribution of GO terms for my list.
> 3. Find statistically over-represented GO terms.
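For goal 2, one possible approach in base R is to tabulate how many probe sets in the list fall under each GO term and pass the counts to pie(). A minimal sketch, with made-up term labels in place of real annotation:

```r
## Hypothetical GO term labels, one per annotated probe set in the list
goTerms <- c("transport", "metabolism", "metabolism",
             "signal transduction", "transport", "metabolism")

## count probe sets per term, largest slice first
termCounts <- sort(table(goTerms), decreasing = TRUE)
pie(termCounts, main = "GO term distribution")
```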
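library(Category)
library(ath1121501)
library(GO)
set.seed(123)
## pick 100 random probe sets as an example gene list
probes <- ls(ath1121501ACCNUM)
probes <- sample(probes, 100)
## map probe sets to AGI locus identifiers
locusList <- unique(unlist(mget(probes, ath1121501ACCNUM)))
## there is no ath1121501LOCUSID environment, so alias ACCNUM (probe-to-AGI) to it
ath1121501LOCUSID <- ath1121501ACCNUM
ans <- geneGoHyperGeoTest(locusList, "ath1121501", "BP")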
?geneGoHyperGeoTest
class?GeneGoHyperGeoTestResult
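For intuition about what a hypergeometric over-representation test reports for each GO term, the p-value for a single term can be reproduced with phyper from base R. The counts below are invented for illustration and do not come from any real array.

```r
## Hypothetical counts for one GO term
nUniverse <- 20000  # genes on the array with GO annotation
nTerm     <- 400    # of those, genes annotated with the term
nList     <- 100    # genes in the list of interest
nOverlap  <- 8      # genes in the list annotated with the term

## P(X >= nOverlap) when drawing nList genes without replacement
pOver <- phyper(nOverlap - 1, nTerm, nUniverse - nTerm, nList,
                lower.tail = FALSE)
print(pOver)
```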
> 4. Find pathway information for my list.
The probe-to-AraCyc mapping is in ath1121501PATH.
The probe-to-gene mapping is in ath1121501ACCNUM.
If you want pathway information from KEGG, use AnnBuilder 1.11.8 to build your
own ath1121501, and check the environments ath1121501PATH and ath1121501ARACYC.
>
> In an attempt to accomplish goal #3, I tried using the GOHyperG function
> from the GOstats package, but the locus link ID information does not
> appear to be available for ath1121501 (this has been addressed in previous
> postings). Are there alternatives that can be used for ath1121501?
>
For Arabidopsis annotation packages, the AGI locus identifier is used to
retrieve annotations for genes, i.e. Entrez Gene IDs and GenBank accessions are
not used. Therefore, there is no xxxxLOCUSID environment. xxxxACCNUM gives the
probe-to-AGI locus mapping.
hope it helps
nianhua