Question

geneList error in clusterProfiler

Ana • 0

Hi all,

I am trying to use the gseGO function in clusterProfiler to run GSEA on my list of differentially expressed genes. The data are structured as follows:

            X    baseMean log2FoldChange     lfcSE      stat      pvalue        padj
1      MAGEA4   85.001287      10.406827 2.3304902  4.465510 7.99000e-06 0.010425219
2       CSAG3   27.223289       8.775162 2.2716481  3.862905 1.12046e-04 0.036408482
3        BEX1 1511.094123       6.927048 1.3661457  5.070505 3.97000e-07 0.002191714
4        NFE4   22.793449       5.939335 1.4924478  3.979593 6.90000e-05 0.028889367
5      NKX2-2   12.320182       5.747034 1.4758915  3.893941 9.86000e-05 0.034103595
6   LINC01876   83.826936       5.322948 1.1084075  4.802339 1.57000e-06 0.005414319
7        ZIC5   25.734088       5.320435 1.1046813  4.816262 1.46000e-06 0.005414319

I perform the following transformations to extract the gene list (I have tried this with the Ensembl IDs as well, but the error is the same):

gene_list <- df_GSEA$X  # Assuming 'X' contains Ensembl gene IDs

gene_list = sort(gene_list, decreasing = TRUE)
gene_list<-na.omit(gene_list)


gse <- gseGO(geneList=gene_list, 
             ont ="ALL", 
             keyType = "SYMBOL", 
             nPerm = 10000, 
             minGSSize = 3, 
             maxGSSize = 800, 
             pvalueCutoff = 0.05, 
             verbose = TRUE, 
             OrgDb = org.Hs.eg.db, 
             pAdjustMethod = "none")

But I keep getting this kind of error:

preparing geneSet collections...
--> Expected input gene ID: VCX,PPARD,EDDM3A,MAMLD1,RAD50,SUPT6H
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

When I check the gene list, it contains the correct format of gene names, so I am not sure why it's not recognizing them.

> structure(gene_list)
 [1] "ZPLD1"      "ZIC5"       "UGT2B11"    "TUBA4B"     "TTLL11"     "TRHDE"      "TMEM164"    "TDRD1"      "SYT5"      
[10] "SYT1"       "STAC2"      "SOWAHA"     "SLCO4C1"    "SLC1A6"     "SCGN"       "SCGB1D2"    "RPL4P6"     "RNU4-2"    
[19] "RNF183"     "RHOXF1P3"   "RBM20"      "RAB38"      "PYDC1"      "PXDNL"      "PRKCA"      "PCDH10"     "NRXN1"     
[28] "NRCAM"      "NKX2-2"     "NFE4"       "NELL2"      "MT3"        "MSLN"       "MIR3150BHG" "MGAT5B"     "MEGF10"    
[37] "MALRD1"     "MAGEA4"     "LRP1B"      "LIX1"       "LINC01876"  "LINC01833"  "LINC01694"  "LINC01287"  "LINC00654" 
[46] "KRT222"     "KCP"        "KCNH6"      "KCNC3"      "KCNC2"      "IGHG4"      "HMGCS2"     "HEPACAM2"   "GPC2"      
[55] "GATA2-AS1"  "GALNT17"    "GABRQ"      "GABBR2"     "FLG"        "FAXDC2"     "FAM135B"    "EPO"        "ELFN2"     
[64] "DSG1"       "DPYSL5"     "DLX6"       "DIRC3"      "CYP27C1"    "CXCL17"     "CWH43"      "CSAG3"      "CNTNAP2"   
[73] "CNR1"       "CHRNA4"     "CDRT15P1"   "CDK5R2"     "CAPN6"      "CABP7"      "C1orf220"   "C11orf70"   "BEX1"

Does anyone have a suggestion on how to fix this?

Thanks a lot, Ana

clusterProfiler GeneSetEnrichment • 1.8k views

score 1 · Answer 1 · 2024-04-16

Basti ▴ 780

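Have a look at the documentation: https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4/topics/gseGO

The geneList argument must be an ordered, ranked gene list. In your code, gene_list is only a character vector of gene names (df_GSEA$X), so there are no numeric values to rank by and no names for gseGO to map. The authors describe how to format a ranked geneList: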

"GSEA analysis requires a ranked gene list, which contains three features:

numeric vector: fold change or other type of numerical variable

named vector: every number has a name, the corresponding gene ID

sorted vector: number should be sorted in decreasing order"
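
As a minimal sketch of those three requirements, assuming the column names from the table in the question (X for gene symbols, log2FoldChange for the values): the choice of log2FoldChange as the ranking metric is my assumption, and another numeric column such as stat would be built the same way. The small data frame below is only a stand-in, with values copied from the post, so that the snippet runs on its own.

```r
# Stand-in for df_GSEA, using the rows shown in the question
df_GSEA <- data.frame(
  X = c("MAGEA4", "CSAG3", "BEX1", "NFE4", "NKX2-2", "LINC01876", "ZIC5"),
  log2FoldChange = c(10.406827, 8.775162, 6.927048, 5.939335,
                     5.747034, 5.322948, 5.320435),
  stringsAsFactors = FALSE
)

# 1. Numeric vector: the values used for ranking (assumed: log2FoldChange)
gene_list <- df_GSEA$log2FoldChange

# 2. Named vector: each value is named with its gene ID
names(gene_list) <- df_GSEA$X

# Drop entries with a missing value or missing name, and duplicated gene IDs
gene_list <- gene_list[!is.na(gene_list) & !is.na(names(gene_list))]
gene_list <- gene_list[!duplicated(names(gene_list))]

# 3. Sorted vector: decreasing order
gene_list <- sort(gene_list, decreasing = TRUE)

str(gene_list)

# The vector can then be passed to gseGO with the arguments from the question, e.g.:
# gse <- gseGO(geneList = gene_list, ont = "ALL", keyType = "SYMBOL",
#              OrgDb = org.Hs.eg.db)
```

Because the names here are gene symbols, keyType = "SYMBOL" matches; if you name the vector with Ensembl IDs instead, keyType would need to be changed accordingly.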