I am trying to use the topGO package, but I get the error "Error in .local(.Object, ...) : allGenes must be a named vector" when I run the new("topGOdata", ...) call at the end of the following code.
# Data preparation of reference dataset
selGenes <- genefilter(fitted, filterfun(pOverA(0.20, log2(100)), function(x) (IQR(x) > 0.25)))
eSet <- fitted[selGenes, ]
AllNames <- rownames(eSet)
head(AllNames)
as.factor(AllNames)
## My genes of interest
IntGenes <- read.csv("D1D0_genes2.csv", header = TRUE) # 2-fold or more
## Convert dataframe to matrix with row and column names
IntGenes2 <- IntGenes[,-1]
rownames(IntGenes2) <- IntGenes[,1]
GeNamen <- rownames(IntGenes2)
head(GeNamen)
as.factor(GeNamen)
## Set up connection to ensembl database
ensembl <- useMart(biomart = "plants_mart", dataset = "bnapus_eg_gene",
                   host = "plants.ensembl.org")
# list the available datasets (species)
listDatasets(ensembl) %>% filter(str_detect(description, "Brassica"))
# specify a data set to use
ensembl <- useDataset("bnapus_eg_gene", mart = ensembl)
# Get Ensembl gene IDs and GO terms
GTOGO <- getBM(attributes = c("external_gene_name", "go_id"),
               mart = ensembl)
head(GTOGO)
#Remove blank entries
GTOGO <- GTOGO[GTOGO$go_id != '',]
# convert from table format to list format
geneID2GO <- by(GTOGO$go_id,
                GTOGO$external_gene_name,
                function(x) as.character(x))
# examine result
head(geneID2GO)
GOdata <- new("topGOdata",
              description = "GO analysis of 1 dpi vs mock",
              ontology = "BP",
              allGenes = AllNames,
              geneSel = GeNamen,
              annot = geneID2GO,
              nodeSize = 5)
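For reference, the error message asks for a named vector. The sketch below shows what such an object looks like in base R, assuming the convention from the topGO vignette that allGenes is a factor whose names are the gene IDs and whose levels 0/1 mark the genes of interest. The IDs and the object names (toyUniverse, toyInteresting, toyGeneList) are made-up placeholders, not values from the data above.

```r
# Toy gene universe and genes of interest (hypothetical IDs, for illustration only)
toyUniverse    <- c("geneA", "geneB", "geneC", "geneD")
toyInteresting <- c("geneB", "geneD")

# 1 = gene of interest, 0 = background gene
toyGeneList <- factor(as.integer(toyUniverse %in% toyInteresting), levels = c(0, 1))

# The names attribute carries the gene IDs; this is what makes it a "named vector"
names(toyGeneList) <- toyUniverse

# Inspect the result
str(toyGeneList)
table(toyGeneList)
```

With the real data, the analogous vector would presumably be built from AllNames and GeNamen. The topGO vignette also passes a gene-to-GO list as annot = annFUN.gene2GO together with gene2GO = ..., rather than as annot directly. Whether these two changes resolve the error in this case is untested.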
I looked at the Ensembl annotations and noticed that the gene names commonly used in publications correspond to "external_gene_name", not "ensembl_gene_id". Is this why it is not working? Do I have to use "ensembl_gene_id" instead?
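One way to check whether the identifier type matters is to count how many genes in the universe also appear among the names of the annotation list. The sketch below uses made-up IDs and GO terms, and all object names in it are placeholders. With the real data, the comparison would be between AllNames and names(geneID2GO).

```r
# Hypothetical annotation table in the same two-column shape as a getBM() result
toyTable <- data.frame(
  gene_id = c("geneA", "geneA", "geneB", "geneX"),
  go_id   = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

# split() gives a plain named list: one character vector of GO IDs per gene
toyGene2GO <- split(toyTable$go_id, toyTable$gene_id)

# Hypothetical gene universe that only partly uses the same IDs
toyUniverse <- c("geneA", "geneB", "geneC")

# Which universe genes have an annotation under exactly the same ID?
annotated <- toyUniverse %in% names(toyGene2GO)
sum(annotated)           # number of genes with a matching annotation
toyUniverse[!annotated]  # genes with no match in the annotation list
```

If only a few of the real gene names match, that would point to an ID mismatch between the expression data and the annotation. It would not by itself explain the "named vector" message.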
Thank you,
Henrik