Hi All,
I have performed a differential expression analysis using DESeq2, and would now like to analyze enriched GO terms using the topGO package. However, I am having difficulty formatting the gene names for input. Here's what I have so far.
    ## create named vector of p values
    GO_genes = setNames(res$padj, row.names(res))

    ## create a gene selection function to select significant genes
    sig_genes <- function(pval) {return (pval < 10^-5)}

    ## create topGO object
    topGO = new("topGOdata",
                description = "diff expr GO test",
                ontology = "BP",
                allGenes = GO_genes,
                geneSel = sig_genes,
                nodeSize = 10,
                annot = annFUN.org,
                mapping = "org.Mm.eg.db",
                ID = "GeneName")
However, I obtain the following error:
    Building most specific GOs .....
        ( 0 GO terms found. )
    Build GO DAG topology ..........
        ( 0 GO terms and 0 relations. )
    Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ", :
      missing value where TRUE/FALSE needed
I noticed that some other users have had the same issue, and I'm guessing it happens because I'm passing gene names instead of Ensembl IDs (although it looks like topGO supports this?). So I tried this:
    ## get ensembl IDs for mouse
    mart = useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
    results = getBM(attributes = c("ensembl_gene_id"),
                    values = row.names(res),
                    mart = mart)
This returns a vector of Ensembl IDs that is longer than my list of values (row.names(res)) and provides no mapping from my values to Ensembl IDs, so I'm not sure how to pass it to the topGO object, since topGO expects a named vector of p-values.
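To make the goal concrete, here is a rough sketch of the re-keying step I think is needed, not a tested solution. It assumes a two-column lookup table is available, for example from getBM() when both a symbol attribute and "ensembl_gene_id" are requested together with a matching filters argument (my assumption; the call above requests only one attribute and sets no filter). The toy p-values, the `map` table and the placeholder IDs below are made up for illustration.

```r
# Toy stand-ins (hypothetical values): `padj` plays the role of res$padj named
# by gene symbol; `map` plays the role of a symbol-to-Ensembl lookup table.
padj <- c(Trp53 = 1e-8, Actb = 0.20, Gapdh = NA, Myc = 1e-6, Unmapped1 = 0.03)
map <- data.frame(
  external_gene_name = c("Trp53", "Actb", "Myc", "Myc"),
  ensembl_gene_id    = c("ENS_A", "ENS_B", "ENS_C", "ENS_C_ALT"),  # placeholder IDs
  stringsAsFactors   = FALSE
)

# Drop genes without an adjusted p-value (DESeq2 can report NA for padj).
padj <- padj[!is.na(padj)]

# Keep one Ensembl ID per symbol (here simply the first one listed).
map <- map[!duplicated(map$external_gene_name), ]

# Look up each symbol and discard the ones with no Ensembl match.
idx  <- match(names(padj), map$external_gene_name)
keep <- !is.na(idx)

# Named vector of p-values keyed by Ensembl ID instead of symbol.
GO_genes <- setNames(padj[keep], map$ensembl_gene_id[idx[keep]])
print(GO_genes)
```

If this is the right direction, I assume the topGO call would then need ID = "ensembl" rather than "GeneName", but I have not confirmed that this resolves the error.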
I know a few other users have asked about this, but I haven't been able to come up with a solution. Any advice would be much appreciated. Thanks!