geneList error in clusterProfiler
Ana • 0

Hi all,

I am trying to run the gseGO function in clusterProfiler to perform GSEA on my list of differentially expressed genes. The data are structured as follows:

          X    baseMean log2FoldChange     lfcSE     stat      pvalue        padj
1    MAGEA4   85.001287      10.406827 2.3304902 4.465510 7.99000e-06 0.010425219
2     CSAG3   27.223289       8.775162 2.2716481 3.862905 1.12046e-04 0.036408482
3      BEX1 1511.094123       6.927048 1.3661457 5.070505 3.97000e-07 0.002191714
4      NFE4   22.793449       5.939335 1.4924478 3.979593 6.90000e-05 0.028889367
5    NKX2-2   12.320182       5.747034 1.4758915 3.893941 9.86000e-05 0.034103595
6 LINC01876   83.826936       5.322948 1.1084075 4.802339 1.57000e-06 0.005414319
7      ZIC5   25.734088       5.320435 1.1046813 4.816262 1.46000e-06 0.005414319

I extract the gene list as follows (I have also tried this with Ensembl IDs, but the error is the same):

gene_list <- df_GSEA$X  # Assuming 'X' contains Ensembl gene IDs

gene_list = sort(gene_list, decreasing = TRUE)
gene_list<-na.omit(gene_list)


gse <- gseGO(geneList=gene_list, 
             ont ="ALL", 
             keyType = "SYMBOL", 
             nPerm = 10000, 
             minGSSize = 3, 
             maxGSSize = 800, 
             pvalueCutoff = 0.05, 
             verbose = TRUE, 
             OrgDb = org.Hs.eg.db, 
             pAdjustMethod = "none")

But I keep getting this kind of error:

preparing geneSet collections...
--> Expected input gene ID: VCX,PPARD,EDDM3A,MAMLD1,RAD50,SUPT6H
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

When I check the gene list, it contains the correct format of gene names, so I am not sure why it's not recognizing them.

> structure(gene_list)
 [1] "ZPLD1"      "ZIC5"       "UGT2B11"    "TUBA4B"     "TTLL11"     "TRHDE"      "TMEM164"    "TDRD1"      "SYT5"      
[10] "SYT1"       "STAC2"      "SOWAHA"     "SLCO4C1"    "SLC1A6"     "SCGN"       "SCGB1D2"    "RPL4P6"     "RNU4-2"    
[19] "RNF183"     "RHOXF1P3"   "RBM20"      "RAB38"      "PYDC1"      "PXDNL"      "PRKCA"      "PCDH10"     "NRXN1"     
[28] "NRCAM"      "NKX2-2"     "NFE4"       "NELL2"      "MT3"        "MSLN"       "MIR3150BHG" "MGAT5B"     "MEGF10"    
[37] "MALRD1"     "MAGEA4"     "LRP1B"      "LIX1"       "LINC01876"  "LINC01833"  "LINC01694"  "LINC01287"  "LINC00654" 
[46] "KRT222"     "KCP"        "KCNH6"      "KCNC3"      "KCNC2"      "IGHG4"      "HMGCS2"     "HEPACAM2"   "GPC2"      
[55] "GATA2-AS1"  "GALNT17"    "GABRQ"      "GABBR2"     "FLG"        "FAXDC2"     "FAM135B"    "EPO"        "ELFN2"     
[64] "DSG1"       "DPYSL5"     "DLX6"       "DIRC3"      "CYP27C1"    "CXCL17"     "CWH43"      "CSAG3"      "CNTNAP2"   
[73] "CNR1"       "CHRNA4"     "CDRT15P1"   "CDK5R2"     "CAPN6"      "CABP7"      "C1orf220"   "C11orf70"   "BEX1"

Does anyone have a suggestion on how to fix this?

Thanks a lot, Ana

clusterProfiler GeneSetEnrichment • 1.2k views
Basti ▴ 780

Have a look at the documentation: https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4/topics/gseGO

The geneList argument should be an order-ranked gene list, not a plain vector of gene names. The authors describe how to format a ranked geneList:

"GSEA analysis requires a ranked gene list, which contains three features:

numeric vector: fold change or other type of numerical variable

named vector: every number has a name, the corresponding gene ID

sorted vector: number should be sorted in decreasing order"
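
To make that concrete, here is a minimal sketch of building such a vector. It assumes the column names shown in the question (X for the gene symbol, log2FoldChange for the ranking value) and uses a small toy data frame with four rows copied from the question's table. Ranking by log2FoldChange is only one option; the stat column would work the same way. The gene_list in the question is a character vector of symbols with no names and no numeric values, which is most likely why no gene could be mapped.

```r
# Toy stand-in for df_GSEA, using four rows from the table in the question.
# Rows are deliberately out of order so the sort step does something.
df_GSEA <- data.frame(
  X = c("BEX1", "MAGEA4", "NFE4", "CSAG3"),
  log2FoldChange = c(6.927048, 10.406827, 5.939335, 8.775162),
  stringsAsFactors = FALSE
)

# 1. Numeric vector: the value used for ranking (assumed here: log2FoldChange).
gene_list <- df_GSEA$log2FoldChange

# 2. Named vector: each value is named with its gene ID (symbols here).
names(gene_list) <- df_GSEA$X

# Drop entries with a missing value or missing name, and duplicated gene IDs.
gene_list <- gene_list[!is.na(gene_list) & !is.na(names(gene_list))]
gene_list <- gene_list[!duplicated(names(gene_list))]

# 3. Sorted vector: decreasing order of the ranking value.
gene_list <- sort(gene_list, decreasing = TRUE)

print(gene_list)
```

A vector built this way can then be passed as geneList to the gseGO call from the question, with keyType = "SYMBOL" because the names are gene symbols. GSEA is normally run on the full ranked list of genes from the analysis rather than only the significant ones, so the filtering to differentially expressed genes may be worth revisiting too.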


Ahh thank you, that fixed it!
