question on GO analysis using clusterProfiler
@giuseppe0525-14327
Last seen 6.1 years ago

Hello everyone,

I ran into a problem while doing GO analysis with clusterProfiler on differentially expressed genes (DEGs) from a microarray experiment. I supplied a list of DEGs, but the enrichGO() function failed to map any of them.

 

I used the following script to do the analysis:

 

data(geneList)
gene <- names(geneList)
gene
head(gene)
str(gene)

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Mm.eg.db,
                ont           = "BP",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.05,
                qvalueCutoff  = 0.1,
                minGSSize     = 3,
                maxGSSize     = 500)
head(ego)
head(summary(ego))

Unfortunately, the following message was returned:

--> No gene can be mapped....
--> Expected input gene ID: 442829,71950,100041897,71711,11535,19264
--> return NULL...

> head(ego)

 

The input to enrichGO() came from differentially expressed genes identified in microarray data with limma.

 

However, I did get GO enrichment results with PANTHER, an online GO analysis tool.

 

I even loaded the Entrez gene IDs from a text file (as shown in the attachment) but got the same error. The code to load the data was: genes <- read.csv("genes.txt", header = FALSE)

Could anyone help solve this problem?

Thanks in advance!

 


Hi,

It looks like there might be some issues with your input, but it is not possible to offer a solution without knowing the content of your geneList object.

In addition, in the code you show, the same content [names(geneList)] is passed to both the 'gene' and the 'universe' arguments of the enrichGO() call. I believe you should instead pass all genes on the array as 'universe' and only the differentially expressed genes as 'gene'.
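
As a rough sketch of that distinction, assuming a limma-style results table with one row per gene that has already been annotated with Entrez IDs (the data frame tt, its column names, and the 0.05 cutoff below are made up for illustration and are not taken from your data):

```r
# Hypothetical limma-style results table; IDs and p-values are illustrative only.
tt <- data.frame(
  ENTREZID  = c("54161", "11972", "57437", "100678", "60409", "13481"),
  adj.P.Val = c(0.001, 0.20, 0.03, 0.75, 0.04, 0.60),
  stringsAsFactors = FALSE
)

# Universe: every gene that was measured on the array (unique, no NAs).
universe <- unique(tt$ENTREZID[!is.na(tt$ENTREZID)])

# Gene: only the differentially expressed subset of that universe.
gene <- unique(tt$ENTREZID[!is.na(tt$ENTREZID) & tt$adj.P.Val < 0.05])

# Sanity checks: the gene list should be a proper subset of the universe.
all(gene %in% universe)
length(gene) < length(universe)
```

These two vectors would then be passed to the 'gene' and 'universe' arguments of enrichGO().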


Parts of the gene and universe inputs are listed below:

> gene
##
   [1] "4312"   "8318"   "10874"  "55143"  "55388"  "991"    "6280"   "2305"   "9493"   "1062"   "3868"   "4605"   "9833"   "9133"  
  [15] "6279"   "10403"  "8685"   "597"    "7153"   "23397"  "6278"   "79733"  "259266" "1381"   "3627"   "27074"  "6241"   "55165" 
  [29] "9787"   "7368"   "11065"  "55355"  "9582"   "220134" "55872"  "51203"  "3669"   "83461"  "22974"  "10460"  "10563"  "4751"  

 

> expst.id <- getEG(as.character(expst$NAME), "mouse430a2")
> head(expst.entzid)
##
      V1
1  54161
2  11972
3  57437
4 100678
5  60409
6  13481

 

Then I ran the following script but got the same error:

> ego <- enrichGO(gene          = gene,
+                 keyType = "ENTREZID",
+                 universe      = expst.entzid,
+                 OrgDb         = org.Mm.eg.db,
+                 ont           = "BP",
+                 pAdjustMethod = "BH",
+                 pvalueCutoff  = 0.05,
+                 qvalueCutoff  = 0.1,
+                 minGSSize = 5,
+                 maxGSSize = 500
+                 )
##
--> No gene can be mapped....
--> Expected input gene ID: 16590,20662,69286,229357,21808,22415
--> return NULL...

 

Is it due to NA values being passed to universe? Thanks!


Hello, 

I would bet that your problem is related to the coding of your gene IDs. Are they Entrez IDs? If not, try mapping your IDs to Entrez IDs and repeating the analysis. You could use bitr() for that, or do it externally through DAVID.
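
Before mapping, it may also be worth confirming that what you pass in is a plain character vector of IDs rather than a data frame. A minimal sketch, assuming the IDs were read with read.csv() into a one-column data frame (the data frame genes below is a made-up stand-in for such a file, not your actual data):

```r
# Stand-in for read.csv("genes.txt", header = FALSE): one integer column V1,
# possibly containing NAs and duplicates.
genes <- data.frame(V1 = c(54161L, 11972L, NA, 57437L, 11972L))

# Pull out the column, drop NAs and duplicates, and convert to character.
ids <- genes$V1
ids <- unique(ids[!is.na(ids)])
ids <- as.character(ids)

str(ids)

# Entrez IDs consist of digits only; anything else suggests another ID type.
all(grepl("^[0-9]+$", ids))
```

A vector like ids, rather than the data frame itself, is what I would expect the 'gene' and 'universe' arguments to need.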


 

I have checked the gene IDs, and they are Entrez IDs.

I used the bitr() function to convert the gene IDs, and it worked well:

gene.df <- bitr(gene, fromType = "ENTREZID",
                toType = c("ENSEMBL", "SYMBOL"),
                OrgDb = org.Mm.eg.db)
head(gene.df)

##

> head(gene.df)
  ENTREZID            ENSEMBL SYMBOL
1    16878 ENSMUSG00000034394    Lif
2    20310 ENSMUSG00000058427  Cxcl2
3    67951 ENSMUSG00000001473  Tubb6
4    14579 ENSMUSG00000028214    Gem
5    17392 ENSMUSG00000043613   Mmp3
7   104027 ENSMUSG00000043079  Synpo
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 23 days ago
China/Guangzhou/Southern Medical Univer…

data(geneList)

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Mm.eg.db,
...

If geneList was obtained via data(geneList), then you are using a vector of human genes as the background while testing against the mouse annotation (OrgDb = org.Mm.eg.db).

If this is the case, then of course none of the genes can be mapped.
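
One quick way to spot this kind of mismatch is to check how many input IDs occur among the IDs the annotation knows about. A minimal sketch in base R, where known_ids is a tiny hand-made stand-in built from the IDs quoted in the messages in this thread; with the real annotation package one would presumably take it from keys(org.Mm.eg.db, keytype = "ENTREZID") instead:

```r
# Stand-in for the mouse Entrez IDs known to the annotation package
# (copied from the "Expected input gene ID" messages in this thread).
known_ids <- c("442829", "71950", "100041897", "71711", "11535", "19264",
               "16590", "20662", "69286", "229357", "21808", "22415")

# First few IDs of the gene vector posted above (human Entrez IDs).
gene <- c("4312", "8318", "10874", "55143", "55388", "991")

# Fraction of input IDs that the reference set recognises.
overlap <- mean(gene %in% known_ids)
overlap

if (overlap == 0) {
  message("No input ID found in the reference set; check species and ID type.")
}
```

An overlap at or near zero against the full mouse ID set would point to a species or ID-type problem rather than to the enrichGO() settings.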

thokall ▴ 160
@thokall-14310
Last seen 6 weeks ago
Swedish Museum of Natural History

Hi,

Please clarify exactly what you have done. As Guangchuang Yu stated, your initial gene list does indeed contain human gene IDs, so they will not map to mouse. The code you supplied works with mouse gene IDs. The example below uses your code and part of your example data, with a mouse Entrez ID (54611) added and minGSSize changed to 1:

> dput(gene)
## c("54611", "4312", "8318", "10874")

> dput(uni)
## c("54611", "54161", "11972", "8312", "4312", "8318", "10874")

> ego <- enrichGO(gene          = gene,
+                  keytype = "ENTREZID",
+                  universe      = uni,
+                  OrgDb         = org.Mm.eg.db,
+                  ont           = "BP",
+                  pAdjustMethod = "BH",
+                  pvalueCutoff  = 0.05,
+                  qvalueCutoff  = 0.1,
+                  minGSSize = 1,
+                  maxGSSize = 500
+                  )

> ego
#
# over-representation test
#
#...@organism      Mus musculus 
#...@ontology      BP 
#...@keytype      ENTREZID 
#...@gene      chr [1:4] "54611" "4312" "8318" "10874"
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...0 enriched terms found
'data.frame':    0 obs. of  9 variables:
 $ ID         : chr 
 $ Description: chr 
 $ GeneRatio  : chr 
 $ BgRatio    : chr 
 $ pvalue     : num 
 $ p.adjust   : num 
 $ qvalue     : num 
 $ geneID     : chr 
 $ Count      : int 
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287 
