Question

topGO gene names/ensembl IDs

krc3004 ▴ 10

Hi All,

I have performed a differential expression analysis using DESeq2, and would now like to analyze enriched GO terms using the topGO package. However, I am having difficulty formatting the gene names for input. Here's what I have so far.

## create named vector of p values
GO_genes = setNames(res$padj, row.names(res))

## create a gene selection function to select significant genes
sig_genes <- function(pval) {return (pval < 10^-5)}

## create topGO object
topGO = new("topGOdata", description="diff expr GO test", ontology= "BP",  allGenes = GO_genes, geneSel = sig_genes, nodeSize = 10, annot=annFUN.org, mapping="org.Mm.eg.db", ID = "GeneName")

However, I obtain the following error:

Building most specific GOs .....    ( 0 GO terms found. )

Build GO DAG topology ..........    ( 0 GO terms and 0 relations. )
Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ",  : 
  missing value where TRUE/FALSE needed

I noticed that other users have hit the same issue, and I am guessing it is because I'm passing gene names instead of Ensembl IDs (although it looks like topGO supports this?). So I tried this:

## get ensembl IDs for mouse
mart = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
results = getBM(attributes = c("ensembl_gene_id"), values = row.names(res), mart = mart)

This yields a vector of Ensembl IDs that is longer than my list of values (row.names(res)) and doesn't provide a mapping from my values to Ensembl IDs, so I'm not sure how to pass it to the topGO object, as topGO expects a named vector of p-values.

I know a few other users have asked about this, but I haven't been able to come up with a solution. Any advice would be much appreciated. Thanks!

topgo biomart deseq2 gene ontology • 5.0k views

Updated 8.0 years ago by Lluís Revilla Sancho • written 8.0 years ago by krc3004

score 1 · Answer 1 · 2017-05-10

I'm afraid I can't help with the topGO part, but to retain the mapping between your query and the returned values with biomaRt, you can normally list the same variable as both an attribute and a filter.

At the moment it looks like you aren't specifying which variable you want to filter on, so the values you provide are simply ignored and getBM returns every Ensembl ID in the dataset. You can check the available filters using listFilters(mart).

Assuming you're using mgi_symbol as your filter, you should be able to do something like this to get both the symbol and the Ensembl ID returned:

results <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"), 
                 filters = "mgi_symbol",
                 values = c("Cntnap1", "Luzp1"), 
                 mart = mart)

> results
  mgi_symbol    ensembl_gene_id
1    Cntnap1 ENSMUSG00000017167
2      Luzp1 ENSMUSG00000001089
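To connect this back to the named p-value vector that topGO expects, one option is to look up each symbol in the returned table and rename the vector. The following is a minimal base-R sketch under stated assumptions: `pvals` and `mapping` are toy stand-ins (hypothetical values) for `setNames(res$padj, row.names(res))` and for the data frame returned by getBM, and `NotAGene` is a made-up symbol used to show what happens to unmapped entries.

```r
# Toy stand-ins (hypothetical values) for the DESeq2 p-values and for the
# data frame returned by getBM().
pvals <- c(Cntnap1 = 1e-8, Luzp1 = 0.2, NotAGene = 0.01)
mapping <- data.frame(
  mgi_symbol       = c("Cntnap1", "Luzp1"),
  ensembl_gene_id  = c("ENSMUSG00000017167", "ENSMUSG00000001089"),
  stringsAsFactors = FALSE
)

# Look up the Ensembl ID for each symbol; symbols without a match become NA.
# match() uses the first hit if a symbol appears more than once in the table.
ens <- mapping$ensembl_gene_id[match(names(pvals), mapping$mgi_symbol)]

# Drop unmapped genes and any Ensembl ID that would appear twice,
# keeping the first occurrence.
keep <- !is.na(ens) & !duplicated(ens)
pvals_ens <- setNames(pvals[keep], ens[keep])
```

The resulting `pvals_ens` is named by Ensembl ID and can be used wherever the symbol-named vector was used. How to treat symbols that map to several Ensembl IDs is a judgment call; keeping the first match is just the simplest choice.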

score 1 · Answer 2 · 2017-05-11

Lluís Revilla Sancho ▴ 760

topGO accepts the following values for the ID argument:

c("entrez", "genbank", "alias", "ensembl", "symbol", "genename", "unigene")

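So you need to replace ID = "GeneName" with ID = "genename". The problem is not which kind of name topGO accepts but how you pass the argument. Note that "genename" refers to the full gene name; if your row names are gene symbols such as Cntnap1, ID = "symbol" is probably the value you want.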
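As a rough sketch of how the corrected call might look, the snippet below wraps the constructor from the question in a function. Several things here are assumptions rather than part of the original post: `GO_genes` is a toy stand-in with hypothetical values, the helper name `build_topgo` is made up, and the default `id_type = "symbol"` assumes the vector is named by gene symbols (pass a different value from the list above if it is not). It also drops NA adjusted p-values first, since DESeq2 can report NA in padj and the selection function should return only TRUE or FALSE.

```r
# Toy stand-in (hypothetical values) for setNames(res$padj, row.names(res)).
GO_genes <- c(Cntnap1 = 1e-8, Luzp1 = 0.2, Actb = NA)

# Drop NA adjusted p-values so the selection function never returns NA.
GO_genes <- GO_genes[!is.na(GO_genes)]

# Same threshold as in the question.
sig_genes <- function(pval) pval < 1e-5

# Hypothetical wrapper around the constructor call from the question.
# It is only defined here, not run; calling it requires the topGO and
# org.Mm.eg.db packages. `id_type` should be one of the lowercase names
# listed above; "symbol" is an assumption about how the vector is named.
build_topgo <- function(pvals, id_type = "symbol") {
  library(topGO)
  new("topGOdata", description = "diff expr GO test", ontology = "BP",
      allGenes = pvals, geneSel = sig_genes, nodeSize = 10,
      annot = annFUN.org, mapping = "org.Mm.eg.db", ID = id_type)
}
```

With a real result table, the call would be something like `build_topgo(GO_genes)`; the three-gene toy vector above is far too small for a meaningful enrichment test.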

Written 8.0 years ago by Lluís Revilla Sancho