Hello everyone,
I am trying to do gene enrichment analysis using TopGO by giving predefined set of differentially expressed genes. so far my code looks like this:
> gene.go <- read.csv("GO_total_upregulated_genes.csv",header = T, stringsAsFactors = F, sep = ",")
> head(gene.go)
GeneID Goterm
1 FUN_004018 GO:0016491
2 FUN_004018 GO:0046872
3 FUN_004018 GO:0055114
4 FUN_003797 GO:0000287
5 FUN_003797 GO:0030976
6 FUN_003797 GO:0030976
> gene2GO <- tapply(gene.go$Goterm, gene.go$GeneID, function(x)x)
> head(gene2GO)
$`FUN_000109`
[1] "GO:0005515" "GO:0005515" "GO:0005515" "GO:0005515"
$FUN_000399
[1] "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374"
$FUN_000424
[1] "GO:0006950" "GO:0016021"
$FUN_000451
[1] "GO:0035556" "GO:0035556" "GO:0035556" "GO:0035556"
$FUN_000481
[1] "GO:0000166" "GO:0000166" "GO:0016021" "GO:0016021" "GO:0006812" "GO:0016021"
[7] "GO:0019829" "GO:0006812" "GO:0016021" "GO:0019829" "GO:0000166" "GO:0006812"
[13] "GO:0016021" "GO:0019829" "GO:0016021" "GO:0016021" "GO:0000166" "GO:0000166"
[19] "GO:0000166" "GO:0016021" "GO:0016021" "GO:0006812" "GO:0016021" "GO:0019829"
[25] "GO:0006812" "GO:0016021" "GO:0019829" "GO:0000166" "GO:0006812" "GO:0016021"
[31] "GO:0019829" "GO:0016021" "GO:0016021" "GO:0000166" "GO:0006812" "GO:0016021"
[37] "GO:0019829" "GO:0000166" "GO:0006812" "GO:0016021" "GO:0019829" "GO:0016021"
[43] "GO:0016021" "GO:0000166" "GO:0000166" "GO:0000166" "GO:0016021" "GO:0016021"
[49] "GO:0006812" "GO:0016021" "GO:0019829"
$FUN_000534
[1] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[7] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[13] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[19] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
> infile <- "Total_upregulated_genes_pvalue.csv"
> pcutoff <- 0.05 # cutoff for defining significant genes
> DE <- read.delim(infile, stringsAsFactors = F, header = T, sep=",")
> tmp <- (DE$padj < pcutoff)
> geneList <- tmp
> names(geneList) <- (DE$GeneID)
> head(geneList)
FUN_000233 FUN_000410 FUN_000424 FUN_000451 FUN_000511 FUN_000590
TRUE TRUE TRUE TRUE TRUE TRUE
> GOdata_UP_BP <- new("topGOdata",ontology = "BP",allGenes = geneList,
+ geneSelectionFun = function(x)(x == 1),
+ annot = annFUN.gene2GO, gene2GO = gene2GO)
Error in .local(.Object, ...) :
allGenes should be a factor or a numeric vector
>
I think I am doing something wrong while calling allGenes vector with p-values. Could you please help me out.
Thank you!!
James,
I don't think I followed you well, however I tried a couple of things to figure out, but still stuck.
I was able to get 1 for all of my significant p-values but was not able to get the exact p-values. Any suggestions please!!
What does the first line of the code you present do? Why are you doing that? If you want your
geneList
to be p-values why are you not just extracting the p-values?Its assigning all the p-values that are significant (<0.05) as 1 and rest as 0. I was not able to make a code that could return me the actual p-values that's why.
Can I do something like: if the values are less than pcutoff return all scores
OK, let's try this another way.
You need a vector of the actual p-values, and that vector has to have a certain set of names. What you are doing is extracting the actual p-values that you say you can't get and then testing to see if they larger or smaller than your p-cutoff.
What you need to do is extract the p-values, and then assign names to them. If you can extract the p-values to test if they are larger or smaller than some value, how is it possible that you can't return the p-values themselves?
Hi James, Sorry if I am not able to articulate the problem I am having. I was able to extract the p-values to see if they are larger or smaller than the cutoff. after this I was able make Godata object and run Fishers test to get the table of GO terms for my set of genes. However I don't know what went wrong, my results look like this:
My code:
My results:
If you see here all that I am getting in p-value column is the number 1 not the actual p-values. So I thought may be I did something wrong when I assigned the p-values as 1 and 0.