Hi, I'm trying to test for over-representation of a given gene set using the kegga function from limma.
However, I must be doing something wrong, since I get different results when I run the same test with the KEGG pathways downloaded from MSigDB. My code is below.
EDIT
Sorry, never mind:
it works now. I realized that the genes in each pathway were not the same in the two sources. I think the version that limma downloads from Japan is more comprehensive. When I artificially added genes to the MSigDB sets so that they matched limma's, the p-values came out the same.
I also realized that my original data frame contained factors, which I needed to remove.
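A minimal sketch of the two checks described in this edit, using made-up Entrez IDs. The data frames `limma_sets` and `msig_sets`, the helper `drop_factors`, and all values are hypothetical stand-ins, not real KEGG or MSigDB data. It converts factor columns to character and then lists the genes that one source has and the other lacks.

```r
# Toy stand-ins for the two pathway tables (IDs are invented for illustration)
limma_sets <- data.frame(GeneID = c("1", "2", "3", "4"),
                         PathwayID = "pathA",
                         stringsAsFactors = TRUE)   # factors on purpose
msig_sets  <- data.frame(GeneID = c("1", "2", "5"),
                         PathwayID = "pathA",
                         stringsAsFactors = TRUE)

# Convert every factor column of a data frame to character
drop_factors <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}
limma_sets <- drop_factors(limma_sets)
msig_sets  <- drop_factors(msig_sets)

# Genes present in one source but not in the other
only_limma <- setdiff(limma_sets$GeneID, msig_sets$GeneID)
only_msig  <- setdiff(msig_sets$GeneID, limma_sets$GeneID)
print(list(only_limma = only_limma, only_msig = only_msig))
```

Any pathway where either difference is non-empty will have a different N, and so can give a different p-value, in the two runs.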
library(limma)    # provides kegga()
library(msigdbr)
# get the KEGG gene sets from MSigDB
m_df = msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG")
# reshape into the two-column layout (GeneID, PathwayID) passed to kegga as gene.pathway
m_df = data.frame(m_df, stringsAsFactors = FALSE)
entrez.hallmark = m_df[, c("gs_name", "entrez_gene")]
colnames(entrez.hallmark) = c("PathwayID", "GeneID")
kegg_db = entrez.hallmark[, c("GeneID", "PathwayID")]
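The custom table carries only pathway IDs, with no human-readable descriptions, which is presumably why the Pathway column shows <NA> in the k2 output further down. As far as I know, kegga also accepts a `pathway.names` data frame (pathway IDs in the first column, names in the second). The sketch below builds one from a toy `gs_name` vector; `pathway_names` and the toy vector are illustrative names, and the kegga call is left commented out because it needs limma and the objects from this post.

```r
# Toy stand-in for the gs_name column returned by msigdbr
gs_name <- c("KEGG_TIGHT_JUNCTION", "KEGG_TIGHT_JUNCTION",
             "KEGG_BASAL_CELL_CARCINOMA")

# One row per pathway: the ID plus a readable description derived from it
ids <- unique(gs_name)
pathway_names <- data.frame(
  PathwayID   = ids,
  Description = gsub("_", " ", sub("^KEGG_", "", ids)),
  stringsAsFactors = FALSE
)
print(pathway_names)

# Assumed usage (not run here):
# k2 <- kegga(list(Up = up, Down = down), gene.pathway = kegg_db,
#             pathway.names = pathway_names, species = species,
#             universe = bg, FDR = 0.05)
```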
OK, so now when I run kegga on my gene lists with and without specifying the custom db, I get different results.
k1 = kegga(list(Up = up, Down = down),
           species = species,
           universe = bg, FDR = 0.05)
k2 = kegga(list(Up = up, Down = down),
           gene.pathway = kegg_db,
           species = species,
           universe = bg, FDR = 0.05)
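For context, and as an assumption about the internals rather than something shown in this post: without a bias covariate, my understanding is that kegga's over-representation p-value is a one-sided hypergeometric test. The sketch below uses invented universe and DE-list sizes (`n_universe`, `n_de`, and the helper `ora_p` are hypothetical) to illustrate why the same number of hits gives a different p-value when the pathway size N differs between the two sources.

```r
# One-sided hypergeometric p-value:
# P(at least `hits` DE genes fall in a pathway of size `set_size`)
ora_p <- function(hits, set_size, n_de, n_universe) {
  phyper(hits - 0.5, m = n_de, n = n_universe - n_de, k = set_size,
         lower.tail = FALSE)
}

n_universe <- 15000  # hypothetical background size
n_de       <- 100    # hypothetical number of up-regulated genes

# Same 3 hits, but the pathway holds 85 genes in one source and 69 in the other
p_85 <- ora_p(3, 85, n_de, n_universe)
p_69 <- ora_p(3, 69, n_de, n_universe)
print(c(N85 = p_85, N69 = p_69))
```

With the hit count fixed, the smaller set yields the smaller p-value, so differing gene membership alone is enough to reorder the tables.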
The outputs for the two calls are below. I can see that the pathway name is not picked up with the custom db; however, even when I compare by row name, the two outputs don't match up.
k1
              Pathway                                         N  Up  Down  P.Up  P.Down  p
path:hsa04976 Bile secretion                                 72   3     0  *     ns      0.003960504
path:hsa04950 Maturity onset diabetes of the young           26   2     0  *     ns      0.005814014
path:hsa04610 Complement and coagulation cascades            85   3     0  *     ns      0.006294251
path:hsa04960 Aldosterone-regulated sodium reabsorption      37   2     0  *     ns      0.011545824
path:hsa05205 Proteoglycans in cancer                       204   4     0  *     ns      0.012680463
path:hsa04977 Vitamin digestion and absorption               24   0     1  ns    *       0.016832191
path:hsa04672 Intestinal immune network for IgA production   48   2     0  *     ns      0.018952627
path:hsa00250 Alanine, aspartate and glutamate metabolism    36   0     1  ns    *       0.025147597
path:hsa04514 Cell adhesion molecules (CAMs)                148   3     0  *     ns      0.027673157
path:hsa04216 Ferroptosis                                    40   0     1  ns    *       0.027904606
path:hsa05033 Nicotine addiction                             40   0     1  ns    *       0.027904606
path:hsa04934 Cushing syndrome                              154   3     0  *     ns      0.030623021
path:hsa05217 Basal cell carcinoma                           63   2     0  *     ns      0.031449531
path:hsa04530 Tight junction                                169   3     0  *     ns      0.038691464
path:hsa04978 Mineral absorption                             58   0     1  ns    *       0.040220273
path:hsa00430 Taurine and hypotaurine metabolism             11   1     0  *     ns      0.047331938
k2
                                                  Pathway    N  Up  Down  P.Up  P.Down  p
KEGG_COMPLEMENT_AND_COAGULATION_CASCADES          <NA>      69   3     0  *     ns      0.003512613
KEGG_MATURITY_ONSET_DIABETES_OF_THE_YOUNG         <NA>      25   2     0  *     ns      0.005382161
KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION    <NA>      42   2     0  *     ns      0.014715329
KEGG_TYROSINE_METABOLISM                          <NA>      42   2     0  *     ns      0.014715329
KEGG_INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION <NA>      47   2     0  *     ns      0.018214589
KEGG_TIGHT_JUNCTION                               <NA>     132   3     0  *     ns      0.020590128
KEGG_BASAL_CELL_CARCINOMA                         <NA>      55   2     0  *     ns      0.024460505
KEGG_HEDGEHOG_SIGNALING_PATHWAY                   <NA>      56   2     0  *     ns      0.025294715
KEGG_TASTE_TRANSDUCTION                           <NA>      52   1     1  ns    *       0.036131518
KEGG_LIMONENE_AND_PINENE_DEGRADATION              <NA>      10   1     0  *     ns      0.043122413
KEGG_TAURINE_AND_HYPOTAURINE_METABOLISM           <NA>      10   1     0  *     ns      0.043122413
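One way to line the two tables up is to convert the KEGG descriptions in k1 into MSigDB-style names and match on those. This is only a heuristic sketch: the helper `to_msigdb_style` is hypothetical, and the conversion rule (upper-case, non-alphanumerics to underscores, "KEGG_" prefix) is an assumption that happens to fit the names shown above, not a documented mapping. The vectors hold a few names copied from the two outputs.

```r
# Heuristic: upper-case, collapse runs of non-alphanumerics to "_", add "KEGG_"
to_msigdb_style <- function(kegg_name) {
  x <- toupper(kegg_name)
  x <- gsub("[^A-Z0-9]+", "_", x)
  x <- gsub("^_+|_+$", "", x)   # drop leading/trailing underscores
  paste0("KEGG_", x)
}

# A few pathway names copied from the k1 and k2 outputs
k1_names <- c("Bile secretion",
              "Complement and coagulation cascades",
              "Aldosterone-regulated sodium reabsorption",
              "Tight junction")
k2_ids <- c("KEGG_COMPLEMENT_AND_COAGULATION_CASCADES",
            "KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION",
            "KEGG_TIGHT_JUNCTION",
            "KEGG_TYROSINE_METABOLISM")

converted <- to_msigdb_style(k1_names)
shared    <- intersect(converted, k2_ids)        # significant in both runs
only_k1   <- k1_names[!converted %in% k2_ids]    # significant only in k1
print(shared)
print(only_k1)
```

Pathways that end up in `only_k1` may be missing from the MSigDB collection altogether or may simply fall above the significance cutoff there, so they are worth checking against the full custom table.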