Question on GeneSetCollection in GSEABase
siajunren • 0
@siajunren-12197

Hi,

I have gone through the vignette and parts of the reference manual but I am still stuck.

I have a vector of gene symbols, call this vector “Vec”. These are all the genes whose expression levels I have measured with RNA-Seq. I want to perform gene set enrichment analysis on the differentially expressed genes with GO biological process terms, using a custom script. To do so, I need a comprehensive list of the gene sets induced from Vec (i.e. a list of all the gene sets that are subsets of Vec, with each gene set classified according to a GO biological process term).

To build this list, I ran the following line:

Gs=GeneSetCollection(Vec, idType = SymbolIdentifier('org.Mm.eg.db'), setType=GOCollection(ontology='BP'))

But I got the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘GeneSetCollection’ for signature ‘"character", "SymbolIdentifier", "GOCollection"’

‘org.Mm.eg.db’ definitely contains mappings from gene symbols to Entrez gene IDs and vice versa.

Is my approach even somewhat correct?

Thanks,

Junren Sia, PhD

Research Fellow

Institute of Medical Biology

@martin-morgan-1513

Automated gene set construction is only possible for the 'primary' identifiers (ENTREZ, for the org packages), but gene sets based on other identifiers, such as symbols, are not too hard to construct 'by hand'. I'll use the following packages:

library(org.Hs.eg.db)
library(GSEABase)
library(magrittr)

Here's some data, for reproducibility

set.seed(123)
vec <- sample(keys(org.Hs.eg.db, "SYMBOL"), 1000)

I'll retrieve the GO identifiers associated with each symbol, then subset to a single ontology and the columns that I'm interested in, and remove duplicate rows (e.g., those arising from multiple evidence codes, which we are not concerned with)

ids <- select(org.Hs.eg.db, vec, "GO", "SYMBOL") %>%
    subset(ONTOLOGY=="BP", c("SYMBOL", "GO")) %>% unique
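
To see why the unique() step matters, here is a minimal base-R sketch on a made-up data frame. The symbols, GO identifiers and the name raw_toy are hypothetical stand-ins for what select() returns, not real annotation data.

```r
# Toy stand-in for the output of select(): one row per
# symbol / GO term / evidence code (all values are made up).
raw_toy <- data.frame(
    SYMBOL   = c("GeneA", "GeneA", "GeneA", "GeneB"),
    GO       = c("GO:0000001", "GO:0000001", "GO:0000003", "GO:0000001"),
    EVIDENCE = c("IDA", "IEA", "IDA", "TAS"),
    ONTOLOGY = c("BP", "BP", "MF", "BP"),
    stringsAsFactors = FALSE
)

# Keep BP rows and the two columns of interest. GeneA / GO:0000001
# appears twice (two evidence codes), so unique() collapses it.
ids_toy <- unique(subset(raw_toy, ONTOLOGY == "BP", c("SYMBOL", "GO")))
print(ids_toy)
```

Without unique(), GeneA would be listed twice in the same set after splitting.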

I'll create the sets by splitting the SYMBOL identifiers based on their GO identifier

sets <- split(ids$SYMBOL, ids$GO)
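
As a small illustration of what split() produces here, consider the following sketch on hypothetical data (ids_toy and its contents are made up for the example):

```r
# Toy stand-in for the `ids` data frame (hypothetical symbols and GO IDs).
ids_toy <- data.frame(
    SYMBOL = c("GeneA", "GeneB", "GeneA", "GeneC"),
    GO     = c("GO:0000001", "GO:0000001", "GO:0000002", "GO:0000002"),
    stringsAsFactors = FALSE
)

# One list element per GO identifier, each holding the symbols
# annotated to that term; the list names are the GO identifiers.
sets_toy <- split(ids_toy$SYMBOL, ids_toy$GO)
print(sets_toy)
```

A symbol annotated to several terms (GeneA in the toy data) ends up in several sets, which is the desired behaviour for gene sets.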

I'll then map each plain character vector to a GeneSet using Map(), and create a collection of gene sets

gsc <- GeneSetCollection(Map(
    GeneSet, sets, setName=names(sets),
    MoreArgs=list(
        geneIdType=SymbolIdentifier("org.Hs.eg.db"),
        collectionType=GOCollection(ontology="BP"))
))
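
The Map() / MoreArgs pattern may be easier to follow on a toy constructor. In this sketch make_set is a hypothetical stand-in for GeneSet (it just builds a plain list), and the data are made up; the point is only how the arguments are paired up.

```r
# Hypothetical stand-in for GeneSet(): records its arguments in a list.
make_set <- function(ids, setName, idType) {
    list(name = setName, ids = ids, idType = idType)
}

sets_toy <- list(
    "GO:0000001" = c("GeneA", "GeneB"),
    "GO:0000002" = "GeneC"
)

# sets_toy and setName are walked in parallel, element by element;
# everything in MoreArgs is passed unchanged to every call.
out <- Map(
    make_set, sets_toy, setName = names(sets_toy),
    MoreArgs = list(idType = "symbol")
)
str(out)
```

Arguments that differ per set go in the vectorised positions, while arguments shared by all sets go in MoreArgs.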

The result is a collection with 1133 gene sets containing a total of 287 genes.

> gsc
GeneSetCollection
  names: GO:0000082, GO:0000086, ..., GO:2001288 (1133 total)
  unique identifiers: RPA2, PPP6C, ..., GPHA2 (287 total)
  types in collection:
    geneIdType: SymbolIdentifier (1 total)
    collectionType: GOCollection (1 total)

 


Thank you for the demonstration of the "by hand" method. However, I would like to avoid that if possible.

Following your advice that automated gene set construction is only possible for 'primary' identifiers, I mapped my symbols to Entrez IDs and ran the same line, but I got the same error as before. Do you know what went wrong?
