Question

incorrect number of dimensions in probeSetSummary function from GOstats

ramonmassoni ▴ 10

@ramonmassoni-19642

Spain

Hi all,

I am running a gene ontology enrichment analysis with the GOstats package as follows:

```{r}
params <- new("GOHyperGParams",
              geneIds = targetentrez,
              universeGeneIds = universeentrez,
              annotation = "org.Hs.eg.db",
              ontology = "BP",
              pvalueCutoff = 1,
              conditional = TRUE,
              testDirection = "over")
hgOver_gt <- hyperGTest(params)
goresults <- summary(hgOver_gt)
```


This yields a nice data frame with the enriched GO terms. Now, I want to know which genes in my target list are found in each enriched term:

```{r}
probeSetSummary(hgOver_gt)
```

But I get the following error:

```
Error in `[.default`(tab, , 1) : incorrect number of dimensions
```

I'm not sure if this is a bug or if I am doing something wrong. I would be most grateful if you could shed some light on the matter.

Best

Ramon

GOstats GO R • 1.0k views

updated 4.7 years ago by James W. MacDonald 68k • written 4.7 years ago by ramonmassoni ▴ 10

score 0 · Answer 1 · 2020-06-10

The probeSetSummary function is intended (as its name sort of implies) to tell you which microarray probes contributed to a particular GO term being significant. You have NCBI Gene IDs, which are easy enough to map to GO terms directly. Something like this:

```r
> library(GOstats)
> library(org.Hs.eg.db)
## set seed for consistency
> set.seed(0xabeef)
> univ <- keys(org.Hs.eg.db)
> samp <- univ[sample(1:length(univ), 150)]
## no reason to use a p-value cutoff of 1!
> p <- new("GOHyperGParams", geneIds = samp, universeGeneIds = univ,
+          annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 0.05,
+          conditional = TRUE, testDirection = "over")
> hyp <- hyperGTest(p)
> z <- summary(hyp)
## don't trust overly small GO terms (too few genes in a term)!
> z <- z[z$Size >= 10,]
> head(z)
       GOBPID      Pvalue OddsRatio   ExpCount Count Size
1  GO:0090398 0.001745174  13.73185 0.23813605     3   78
2  GO:0000002 0.002034070  33.80545 0.06716658     2   22
3  GO:0014850 0.002420820  30.72893 0.07327263     2   24
10 GO:0006829 0.003060633  27.03709 0.08243171     2   27
11 GO:0006882 0.005397670  19.87059 0.10990894     2   36
31 GO:0060999 0.007979571  16.07879 0.13433315     2   44
                                                 Term
1                                 cellular senescence
2                    mitochondrial genome maintenance
3                         response to muscle activity
10                                 zinc ion transport
11                      cellular zinc ion homeostasis
31 positive regulation of dendritic spine development
> dim(z)
[1] 47  7
## map GO IDs to NCBI Gene IDs. Use GOALL, not GO, because we want indirect mappings as well.
> zlst <- mapIds(org.Hs.eg.db, z$GOBPID, "ENTREZID", "GOALL", multiVals = "list")
## convert to a list of data.frames with Gene ID in the first column and a
## logical flag for whether the gene is in the target list
> zdf <- lapply(zlst, function(x) data.frame(ENTREZID = x, SIG = x %in% samp))
## or you could just subset the mapped IDs to the target genes
> zsig <- lapply(zlst, function(x) x[x %in% samp])
```
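
If you want the genes sitting next to each term in the summary table, one option is to collapse the per-term hits into a single string and add that as a column. Here is a minimal base R sketch of the idea using made-up data: `go2genes`, `target`, and `res` are hypothetical stand-ins for the `mapIds()` output, `samp`, and `z` above, and the GO and Gene IDs are invented purely for illustration.

```r
# Toy stand-ins (hypothetical data, not real annotations):
#   go2genes mimics the list returned by mapIds(..., multiVals = "list")
#   target   mimics the vector of target Gene IDs
#   res      mimics the data frame returned by summary()
go2genes <- list(
  "GO:0000001" = c("101", "102", "103"),
  "GO:0000002" = c("102", "104"),
  "GO:0000003" = c("105")
)
target <- c("102", "103", "104")
res <- data.frame(GOBPID = names(go2genes), stringsAsFactors = FALSE)

# Keep only the target genes annotated to each term.
hits <- lapply(go2genes, function(ids) unique(ids[ids %in% target]))

# Collapse each term's hits into one comma-separated string and attach it
# to the results table, looking the terms up by GO ID so the rows line up.
res$Genes <- vapply(hits[res$GOBPID], paste, character(1),
                    collapse = ", ", USE.NAMES = FALSE)
res$NHits <- lengths(hits[res$GOBPID], use.names = FALSE)

print(res)
```

Indexing `hits` by `res$GOBPID` rather than relying on list order keeps the genes aligned with the right rows even if the table has been filtered or re-sorted in between. With real data you may also want to translate the Gene IDs to symbols before collapsing.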