GSEA using clusterProfiler
1
0
Entering edit mode
@giuseppe0525-14327
Last seen 6.1 years ago

Hello, everyone

I 'm working on GSEA using clusterProfiler​ and came across two problems.

1. How should I prepare geneList​

2. What does TERM2GENE mean​?

I've worked out the differentially expressed genes from mouse microarray data.

Then I tried the following script but failed to obtain the results.

"

library(clusterProfiler)

geneList <- results$gene.id
geneList <- levels(geneList)
geneList = sort(geneList, decreasing = TRUE)
head(geneList)


term <- rep("ischemia24h",length.out = 844)
head(term)

TERM2GENE <- cbind(term, geneList)
head(TERM2GENE)

GSEA(geneList,
     exponent = 1,
     nPerm = 1000,
     minGSSize = 2,
     maxGSSize = 5000,
     pvalueCutoff = 0.05,
     pAdjustMethod = "BH",
     TERM2GENE,
     TERM2NAME = NA,
     verbose = TRUE,
     seed = FALSE,
     by = "fgsea")

"

##

preparing geneSet collections...
--> Expected input gene ID: 94094,223650,17926,12831,29875,68178
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

 

Thank you!

 

 

GSEA clusterProfiler • 13k views
ADD COMMENT
1
Entering edit mode

Hi,

It seems that you have the same issue as in your previous question regarding clusterProfiler. Have you checked that your supplied gene list of the right type and from the right species? It would be helpful if you also showed the output of the function:

>head(geneList)

Regarding the TERM2GENE I found this section in the vignette to the package:

"TERM2GENE is a data.frame with first column of term ID and second column of corresponding mapped gene and TERM2NAME is a data.frame with first column of term ID and second column of corresponding term name. TERM2NAME is optional."

ADD REPLY
0
Entering edit mode
@giuseppe0525-14327
Last seen 6.1 years ago

It seems that the problem resulted from the format of geneList, but I don't know how to create a data collection in the same format.

I checked the example provided by the developer of the package and found that the geneList was a numeric vector whose element contained two numbers. One is the ID of gene, and the other seems to indicate the difference of expression among groups.

The description of the geneList is shown as followed:

> data(geneList, package="DOSE")
> class(geneList)
# [1] "numeric"
> head(geneList)
#    4312     8318    10874    55143    55388      991 
#4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 
> length(geneList)
[1] 12495

Would you tell me how to generate data in such format? I'd really appreciate your help.

 

ADD COMMENT
0
Entering edit mode

Please specify how you prepared the data you would like to use.

ADD REPLY
0
Entering edit mode

Thanks for your reply.

In brief, I obtained the differentially expressed genes from microarray data based on Limma.

The results are organized in a data frame with columns like gene ID, logFC, p value, adj. p value etc. In addition, I have the expression set from which I got the DEGs.

The following are parts of the results and expression set:

> head(results)
                logFC  AveExpr        t      P.Value    adj.P.Val        B           ID gene.symbols gene.id change
1421207_at   4.818821 4.522985 44.58554 2.695409e-11 6.069522e-07 13.89607   1421207_at          Lif   16878     UP
1449984_at   7.304190 6.107650 28.30740 1.205558e-09 8.434865e-06 11.91560   1449984_at        Cxcl2   20310     UP
1416431_at   6.592971 5.841120 27.52500 1.522888e-09 8.434865e-06 11.76300   1416431_at        Tubb6   67951     UP
1426063_a_at 3.948953 4.134480 27.31784 1.621853e-09 8.434865e-06 11.72128 1426063_a_at          Gem   14579     UP
1418945_at   7.182747 6.057138 26.72206 1.948932e-09 8.434865e-06 11.59811   1418945_at         Mmp3   17392     UP
1457644_s_at 7.437478 6.163691 26.26859 2.247499e-09 8.434865e-06 11.50108 1457644_s_at         <NA>    <NA>     UP

 

head(eset)
             GSM805714.CEL GSM805715.CEL GSM805716.CEL GSM805717.CEL GSM805741.CEL GSM805742.CEL GSM805743.CEL GSM805744.CEL
1415670_at        11.02038      11.16112      10.97783      11.08388      10.83234      10.94515      10.81384      10.92230
1415671_at        12.74677      12.72211      12.62761      12.64847      12.67121      12.51231      12.52843      12.66846
1415672_at        11.72221      11.68903      11.75651      11.78890      12.03472      11.91181      12.04743      11.95583
1415673_at        10.17084      10.15074      10.22884      10.11947      10.42166      10.68337      10.17406      10.55455
1415674_a_at      10.71065      10.70821      10.75942      10.60990      10.36245      10.51380      10.63063      10.61658
1415675_at        11.45855      11.48796      11.40529      11.41186      11.19422      11.13612      11.30387      11.20107

 

Although I could not use the gseGO() and GSEA() functions correctly, I managed to use the enrichGO() and enrichKEGG() functions, which require gene as input. So I speculate the reason why I failed to operate gseGO() and GSEA() functions is that I cannot create a geneList in the same format as the example of clusterprofiler.

Looking forward to your reply.

ADD REPLY
1
Entering edit mode

I always use the limma output (which is of class MArrayLM) directly by subsetting/extracting the contrast of interest. It should work very similar when having your data in a data frame.

 

#check comparisons/contrasts that were made (and are stored in the fit object)
colnames(fit2$contrasts)

#[1] "A7"    "M19"    "MvsA.delta"

# Extract contrast of interest.
# I like to use the moderated t-value, but you can also use logFC or log(pvalue).
# In case of latter, be sure to use 'signed-ranked' list, so up/down is clear.
# Gene symbols are also stored in limma object, and used as rownames. This is done
# because in the list of gene sets also symbols are used. 

A7.t <- fit2$t[,"A7"]
names(A7.t) <- fit2$genes$ID
A7.t<- sort(A7.t, decreasing = TRUE)


> head(A7.t)
       ARNTL        NFIL3      FAM124B LOC105369382         SGK1         NRG2
   14.261763     9.442942     8.845451     8.795412     8.523587     8.460763
>


class(fit2)
#[1] "MArrayLM"
#attr(,"package")
#[1] "limma"

 

Run GSEA:
# I use a subset of information from the KEGG databases (as *.gmt; 239 gene sets) that I obtained outside of clusterProfiler.

gsea.A <- GSEA(geneList=A7.t, nPerm=10000, minGSSize = 15, maxGSSize = 500, pvalueCutoff = 0.05, pAdjustMethod = "BH", TERM2GENE=pathways, TERM2NAME=GS2name.hs)

#> dim(pathways)
[1] 19836     2
# content pathways
> head(pathways)
                                        ont   gene
1 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG  FARSB
2 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG  WARS2
3 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG  FARS2
4 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG   PSTK
5 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG  MTFMT
6 hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG TARSL2
>

# conversion of KEGG descriptions to 'nicer' descriptions
> dim(GS2name.hs)
[1] 239   2
> head(GS2name.hs)
     names.set.hs                                      descr.set.hs                      
[1,] "hsa00970.Aminoacyl.tRNA.biosynthesis.KEGG"       "Aminoacyl-tRNA biosynthesis"     
[2,] "hsa02010.ABC.transporters.KEGG"                  "ABC transporters"                
[3,] "hsa03008.Ribosome.biogenesis.in.eukaryotes.KEGG" "Ribosome biogenesis in eukaryotes"
[4,] "hsa03010.Ribosome.KEGG"                          "Ribosome"                        
[5,] "hsa03013.RNA.transport.KEGG"                     "RNA transport"                   
[6,] "hsa03015.mRNA.surveillance.pathway.KEGG"         "mRNA surveillance pathway"       
>
> gsea.A <- GSEA(geneList=A7.t, nPerm=10000, minGSSize = 15, maxGSSize = 500, pvalueCutoff = 0.05, pAdjustMethod = "BH", TERM2GENE=pathways, TERM2NAME=GS2name.hs)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
ADD REPLY
0
Entering edit mode

Thank you very much for your help!

 

ADD REPLY

Login before adding your answer.

Traffic: 646 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6