Question

Cluster profiler - KEGG analysis

0

Entering edit mode

sunnykevin97 • 0

@sunnykevin97-23726

Last seen 2.7 years ago

Norway

HI

I some how, figure out the manual to reproduce the same plots for my data. I'm working with non-model organism it doesn't have a annotation information or package for GO/KEGG analysis. I had ~4 experimental setup, after edgeR analysis based on the padjustvalue lessthan 0.05 I chosen significantly expressed differential genes include both Up/Down regulated genes.
I chosen, zebrafish annotation for my data, performed Gene Set enrichment analysis works very well. While KEGG enrichment step I'm getting "no term enriched under specific pvalueCutoff..." any sugesstions ? In this case, how do I get KEGG annotation?

>     library(org.Dr.eg.db)
>     kegg_organism = "dre"
>     kk2 <- gseKEGG(geneList = KGgenelist, organism = "dre", nPerm = 10000, minGSSize = 3,maxGSSize = 800,pAdjustMethod = "none",keyType =
> "ncbi-geneid")
>     preparing geneSet collections...
>     GSEA analysis...
>     no term enriched under specific pvalueCutoff...
>     Warning messages:
>     1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize,  :
>       We do not recommend using nPerm parameter incurrent and future releases
>     2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize,  :
>       You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm
> argument in the fgsea function call.

clusterprofiler • 10k views

ADD COMMENT • link 4.8 years ago sunnykevin97 • 0

Kevin Blighe · Answer 1 · 2020-07-08

2

Entering edit mode

Guido Hooiveld ★ 4.1k

@guido-hooiveld-2020

Last seen 25 days ago

Wageningen University, Wageningen, the …

It is unclear to me what you are asking... What about adding the argument pvalueCutoff = 1 to your call? [Default value = 0.05]. By doing so you don't filter on significance, and all tested gene sets should thus be returned.

ADD COMMENT • link 4.8 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

When I run with gene-list(pvalueCutoff 0.05), this how I'm getting output.

> kk2
#
# Gene Set Enrichment Analysis
#
#...@organism    dre 
#...@setType     KEGG 
#...@geneList    Named num [1:157] 0.508 0.439 0.407 0.364 0.359 ...
 - attr(*, "names")= chr [1:157] "100003547" "404621" "568653" "450072" ...
#...nPerm    10000 
#...pvalues adjusted by 'none' with cutoff <0.05 
#...0 enriched terms found
'data.frame':   0 obs. of  8 variables:
 $ ID             : chr 
 $ Description    : chr 
 $ setSize        : int 
 $ enrichmentScore: num 
 $ NES            : num 
 $ pvalue         : num 
 $ p.adjust       : num 
 $ qvalues        : num 
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

ADD REPLY • link updated 4.8 years ago by Kevin Blighe ★ 4.0k • written 4.8 years ago by sunnykevin97 • 0

1

Entering edit mode

And what happens when using pvalueCutoff = 1 (thus what I suggested above)?

In addition, likely the culprit ; your input list seems NOT to be suitable for a GSEA analysis! For that you need to use as input a metric (e.g. t-value of signed log(p-value) for ALL genes analyzed, not a subset! You used as input only 157 genes!

An over-representation analysis should rather be used when selecting a subsets of analyzed genes!

ADD REPLY • link 4.8 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

It works with pvalueCutoff = 1. Thanks. Could you explain me clearly, I'm new to R, I could'nt understand what you're saying, Please.

ADD REPLY • link 4.8 years ago sunnykevin97 • 0

2

Entering edit mode

If it works without filtering on significance (thus by setting pvalueCutoff = 1) this basically tells you that from a coding point of view you did things OK.

Next question is of course whether the results make sense. You are responsible for both things. Apparently you are struggling with the basics of the various methodologies. I would recommend you dive into the literature the fill that knowledge gap. Some suggestions: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002375 (the differences between ORA and FCS (=GSEA)) are highlighted). https://www.pathwaycommons.org/guide/primers/data_analysis/gsea/ (nice primer into GSEA)

Also: the people at the Broad Institute (who were the first to publish on GSEA) implicitly recommend very limiting filtering of a transcriptomics dataset before running GSEA, hence my comment.

ADD REPLY • link 4.8 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

For the First sample, I generated some good plots make sense. For the second sample, I tried the same steps for Gene set enrichment analysis(gseGO) works fine.
For gseKEGG analysis It shows "no term enriched under specific pvalueCutoff" (with pvalueCutoff 1) and I'm not getting any results. How do I plot this data ?

#KGgenelist1 is the gene_list consists of 30 genes.
**100034471     553631  100141358     561451     557842     324205     568683     556251     394248
0.5299130  0.2826889 -0.1629156 -0.1736552 -0.1901334 -0.2018316 -0.2053550 -0.2157412 -0.2718013
555341    553162  100093707    571208    406820    560561    564044    790916
-0.3002093 -0.3009267 -0.3416203 -0.3452166 -0.3576778 -0.3686069 -0.3747898 -0.4056645 
493615  100002938     324332     751761     323089     561411     192340     797322     323269     561754     360144     798876     494532 
-0.4105849 -0.4197271 -0.4220734 -0.4244880 -0.4512894 -0.4614845 -0.4832835 -0.4883691 -0.5112109 -0.5234817 -0.5391687 -0.5416503 -0.5645655**

Suggestions please.

> library(clusterProfiler)      
> library(org.Dr.eg.db)
> kegg_organism = "dre"
> kkg <- gseKEGG(geneList = KGgenelist1, organism = kegg_organism,minGSSize = 3,maxGSSize = 800,pAdjustMethod = "none",keyType = "ncbi-geneid",pvalueCutoff = 1)
preparing geneSet collections...
GSEA analysis...
no term enriched under specific pvalueCutoff...
> head(kkg)
[1] ID              Description     setSize         enrichmentScore NES             pvalue          p.adjust        qvalues        
<0 rows> (or 0-length row.names)

ADD REPLY • link updated 4.8 years ago by Kevin Blighe ★ 4.0k • written 4.8 years ago by sunnykevin97 • 0

1

Entering edit mode

Well, there is nothing to plot...!

Reason for this is that you only used 30 genes as input!! First of all, and as said before, you should not use a subset of genes for GSEA analysis!

Also note that of these 30 input genes, only 9 are annotated to a KEGG pathway!! And max 2 of these 9 genes are in the same pathway. Since you set minGSSize = 3, no gene set (pathway) was included in the analysis... and since there are no pathways to analyze, there won't be any result!

If you set minGSSize = 1 you will get results...

The one and only positive aspect of such an analysis is that its shows your code is working, but other than that.... ????

ADD REPLY • link 4.8 years ago Guido Hooiveld ★ 4.1k

0

Entering edit mode

Thanks for the nice explanation, I'm doing analysis only for significantly selected expressed genes below the 0.05 cutoff value. To understand genes role in pathways.

ADD REPLY • link 4.8 years ago sunnykevin97 • 0

0

Entering edit mode

HI, I had around 100 Genes start with LOC* prefix, they don't have any orthologs. I searched in NCBI to find out the gene-id's information and further I try to convert them to ensemble using bitr function and further I perform groupGO & gseKEGG shows error. Do I have to neglect these genes ?

g1
117748635 117732005 117745460 117743866 117751129 117745510 117733589  117728330 117741824 117747932

ggo <- groupGO(gene = g1, OrgDb = org.Dr.eg.db, ont = "CC", level = 3, readable = TRUE)
Error in validObject(.Object) : 
  invalid class “groupGOResult” object: invalid object for slot "gene" in class "groupGOResult": got class "integer", should be or extend class "character"

Ugenelist
 117748635  117732005  117745460  117743866  117751129  117745510  117733589  117728330  117741824  117747932 
 0.6696894  0.5031845  0.3885949 -0.2227779 -0.2740705 -0.3453581 -0.4627259 -0.5009327 -0.5406563 -0.5890710

kk4 <- gseKEGG(geneList = Ugenelist, organism = 'dre',nPerm = 1000, minGSSize = 120,verbose = FALSE)
Reading KEGG annotation online:

Reading KEGG annotation online:

--> Expected input gene ID: 100002993,282672,568448,445505,402939,445175
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

ADD REPLY • link 4.8 years ago sunnykevin97 • 0

0

Entering edit mode

Kevin, please, you have already asked this question on Biostars: https://www.biostars.org/p/448859/

Always mention the other web-sites where you have asked questions.

ADD REPLY • link 4.8 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

Thanks for the nice explanation, I'm doing analysis only for significantly selected expressed genes below the 0.05 cutoff value. To understand genes role in pathways.

Nothing passes this threshold, so, nothing can be plot. You will have to go back a few steps in your analysis to determine why there may be nothing passing the 0.05 threshold. Thanks! Guido has already answered the main question.

ADD REPLY • link 4.8 years ago Kevin Blighe ★ 4.0k