Compute fgsea with my own database with R version 4.0.2
@a287dbc4

I am trying to run an fgsea analysis with my own pathway database, which I built like this:

df_db <- read.csv('pathway_ecnumber.csv',sep=",")

df_db
path:map00010  3.2.1.86
path:map00500  3.2.1.86
path:map00010   4.1.1.1
path:map01100   4.1.1.1
path:map01110   4.1.1.1
path:map01130   4.1.1.1
path:map00010  4.1.1.32
path:map00020  4.1.1.32
path:map00620  4.1.1.32
path:map01100  4.1.1.32
path:map01110  4.1.1.32

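library(plyr)   # dlply
library(dplyr)  # %>%
df_db$enzyme <- gsub("ec:", "", df_db$enzyme)
# One character vector of EC numbers per pathway; c() drops the extra plyr attributes
db_final <- df_db %>% dlply("pathway", `[[`, "enzyme") %>% c
database_pathway <- db_final[!duplicated(names(db_final))]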

database_pathway

$`path:map05410`
[1] "2.7.11.11" "3.4.15.1"  "2.7.11.1" 

$`path:map05414`
[1] "4.6.1.1"   "2.7.11.11" "2.7.11.1" 

$`path:map05416`
[1] "3.4.22.56" "3.4.22.61" "3.4.22.62" "2.7.10.2" 

I build my ranked list like this:

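library(dplyr)  # select, mutate, filter, %>%
library(tidyr)  # drop_na, unnest

# Keep the EC annotation and log2FC, then put each comma-separated EC number on its own row
df_select <- df_data %>% dplyr::select(ECNUMBER, log2FoldChange)
df_na <- df_select %>% drop_na()
df_split <- df_na %>% mutate(ECNUMBER = strsplit(as.character(ECNUMBER), ",")) %>% unnest(ECNUMBER)
df_split <- as.data.frame(df_split)
df_unique <- unique(df_split)
df_na <- na.omit(df_unique)
df1 <- dplyr::filter(df_na, log2FoldChange != 0)
# Named numeric vector: names are EC numbers, values are log2FC, sorted in decreasing order
geneList <- df1[, 2]
names(geneList) <- as.character(df1[, 1])
geneList2 <- sort(geneList, decreasing = TRUE)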

geneList2
3.4.21.102   2.7.1.221    2.7.7.13     1.1.1.3    1.1.1.42   1.14.19.9 
      3.217       3.217       3.217       3.217       3.217       3.217

When I run fgsea I get two warnings, and I don't know how to deal with them:

Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are ties in the preranked stats (81.54% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are duplicate gene names, fgsea may produce unexpected results.

I tried to resolve the second warning with this line, but it does not work:

df_database <- database_pathway[!duplicated(names(database_pathway))]
alserg ▴ 280
@assaron

I believe you should rethink your design, as these warnings point to problems in it.

It is very suspicious that multiple EC entries have exactly the same log2FC. This is either an error or a flaw in the design: the same gene can have multiple enzyme functions, so a single gene's logFC is assigned to multiple EC numbers. However, GSEA assumes that the gene ranks are independent, as it tests whether a gene set looks randomly selected or not.

The second warning is similar: I expect that a single enzyme can be represented by multiple genes, so you have multiple entries with the same EC number, which triggers it.
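
Both conditions can be checked on the ranked vector before calling fgsea. The following is a minimal base-R sketch. The `stats` vector is a made-up toy example, not the poster's data, and the tie percentage is computed here as the share of entries whose value occurs more than once, which is an assumption about how to summarize ties and not necessarily fgsea's exact formula.

```r
# Toy ranked vector (hypothetical values): names are EC numbers, values are log2FC.
stats <- c("4.1.1.1" = 3.217, "4.1.1.32" = 3.217, "3.2.1.86" = 1.5,
           "4.1.1.1" = -0.8, "2.7.11.1" = -2.1)

# Names that appear more than once in the ranked vector.
dup_names <- unique(names(stats)[duplicated(names(stats))])

# Share of entries whose value is shared with at least one other entry.
tied <- duplicated(stats) | duplicated(stats, fromLast = TRUE)
pct_tied <- 100 * mean(tied)

print(dup_names)
print(pct_tied)
```

If `dup_names` is non-empty on the real vector, deduplicating `names(database_pathway)` as in the question would not be expected to help, because those list names are pathway IDs and not the names of the ranked vector.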


Thank you so much for your clear explanation!

I used an ortholog database (eggNOG), which explains my results: one logFC corresponds to multiple EC numbers.

Following your explanation, I will use only the first EC number listed in the annotation results, so that each logFC corresponds to a single EC number.
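
A minimal base-R sketch of keeping only the first EC number, assuming the annotation column holds comma-separated EC numbers as in the ranking code above. The `ec_raw` values are hypothetical. It also illustrates that different genes can still end up with the same EC number, so duplicate names may remain.

```r
# Hypothetical annotation column: comma-separated EC numbers, one entry per gene.
ec_raw <- c("4.1.1.1,4.1.1.32", "3.2.1.86", "4.1.1.1")

# Drop everything from the first comma onward, keeping only the first EC number.
ec_first <- sub(",.*$", "", ec_raw)
print(ec_first)

# Check whether two genes now share the same EC number.
has_dups <- any(duplicated(ec_first))
print(has_dups)
```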

For the construction of the database, is it possible for the same EC number to appear in several different pathways?


I'm not sure what the best course of action would be in your case, if there is one at all, but I'd suggest keeping the ranking at the gene level and constructing pathways that consist of genes. Genes can be associated with multiple pathways; there is no problem with that.
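
One way this suggestion could be sketched in base R, assuming a hypothetical annotation table `gene2ec` (one row per gene/EC pair) and a pathway table shaped like `df_db` in the question. All gene IDs and log2FC values below are made up for illustration.

```r
# Hypothetical annotation: one row per (gene, EC number) pair.
gene2ec <- data.frame(
  gene   = c("g1", "g1", "g2", "g3"),
  enzyme = c("4.1.1.1", "4.1.1.32", "4.1.1.1", "3.2.1.86"),
  stringsAsFactors = FALSE
)
# Pathway-to-EC table, in the same shape as df_db in the question.
ec2path <- data.frame(
  pathway = c("path:map00010", "path:map00010", "path:map00500", "path:map00020"),
  enzyme  = c("3.2.1.86", "4.1.1.1", "3.2.1.86", "4.1.1.32"),
  stringsAsFactors = FALSE
)

# Join on EC number, then collect the unique genes of each pathway.
gene2path <- merge(gene2ec, ec2path, by = "enzyme")
gene_pathways <- lapply(split(gene2path$gene, gene2path$pathway), unique)

# Gene-level ranking: one log2FC per gene, so names are unique by construction.
gene_stats <- sort(c(g1 = 3.2, g2 = -1.1, g3 = 0.7), decreasing = TRUE)

print(gene_pathways)
```

`gene_pathways` and `gene_stats` would then be passed as the `pathways` and `stats` arguments of `fgsea`. A gene such as `g1` appears in more than one pathway here, which is fine.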
