FELLA package - How to obtain the KEGG IDs that mapped to a pathway
1
0
Entering edit mode
@ceciliachiriac-20265
Last seen 5.8 years ago

Is there a way to find out which KEGG IDs mapped to a pathway? Specifically, in the output of generateResultsTable(method="hypergeom"...) function there are 2 columns called "CompoundHits" and "CompoundsInPathway". Can I find out to which KEGG IDs the numbers in these columns correspond?

fella • 2.1k views
ADD COMMENT
0
Entering edit mode

Edit: provided answer instead of comment

ADD REPLY
0
Entering edit mode

This a wrapper function of generateResultsTable

Compound IDs and names matched in pathway are included in the table. Thanks to sergi.picart!

library(tibble)
library(dplyr)
library(magrittr)
library(FELLA)
library(purrr)

## Get compound names from IDs
id_to_name <- function(list.hit) {
  ids_names <- getName(data, list.hit)
  ids_names <- map2(ids_names, names(ids_names), ~ paste(.y, paste(.x, collapse = ","), sep = ": ")) %>% unlist(.)
  return (list(Compound = paste(ids_names, collapse = " | ")))
}

## Add compound names and IDs as a new column in the table
generateResultsTableName <- function(data = data,
                                     object = object,
                                     method="hypergeom",
                                     threshold = 0.1) {

  tab.res <- generateResultsTable(
    method = method,
    threshold = threshold, 
    object = object, 
    data = data)

  if (is.null(tab.res)) return (NA)

  mat.hypergeom <- FELLA:::getMatrix(data, "hypergeom")
  list.cpd <- getInput(object)

  # all the metabolites (path) and those within the list (hits)
  list.path <- apply(mat.hypergeom, 2, function(x) names(which(x)))

  target.path <- purrr::map(tab.res$KEGG.id, ~ list.path[[.x]])
  names(target.path) <- tab.res$KEGG.id

  list.hits <- lapply(target.path, function(x) intersect(x, list.cpd))

  ids_names <- map(list.hits, ~ id_to_name(.x))

  df <- do.call(rbind, ids_names) %>% data.frame() %>% 
    rownames_to_column("KEGG.id") %>% left_join(tab.res, .)
  return (df)
}

Example

## First generate a toy enrichment

data(FELLA.sample)
data(input.sample)
## Enrich input
object <- enrich(
  compounds = input.sample, 
  data = FELLA.sample)

## GenerateResultsTable with compound name
df <- generateResultsTableName(data = FELLA.sample,
                                object = object,
                                method="hypergeom",
                                threshold = 1)
df
ADD REPLY
0
Entering edit mode
sergi.picart ▴ 20
@sergipicart-11865
Last seen 3.1 years ago
Germany

Dear Cecilia,

There is no "nice" built-in way to solve your query, but it is indeed possible to get around it. I have attached a script - hope it helps.

# example on how to extract individual metabolite hits in hypergeom
library(igraph)
library(FELLA)
library(plyr)
library(dplyr)

data("FELLA.sample")
data("input.sample")

myAnalysis <- enrich(
  compounds = input.sample, 
  method = "hypergeom", 
  data = FELLA.sample)

df.hypergeom <- generateResultsTable(
  method = "hypergeom", 
  object = myAnalysis, 
  data = FELLA.sample, 
  threshold = 1) 

# there is currently no nice function for this 
# you can obtain the internal metabolite-pathway matrix
# remember these are not the original KEGG pathways but the 
# pathways that can be reached from the metabolites in the 
# hierarchical, upwards-directed network
mat.hypergeom <- FELLA:::getMatrix(FELLA.sample, "hypergeom")
list.cpd <- intersect(FELLA::getInput(myAnalysis), rownames(mat.hypergeom))

# all the metabolites (path) and those within the list (hits)
list.path <- apply(mat.hypergeom, 2, function(x) names(which(x)))
list.hits <- lapply(list.path, function(x) intersect(x, list.cpd))

lapply(list.hits, head)

# check that the numbers make sense - compare to those in tab.hypergeom
df.hits <- plyr::ldply(names(list.path), function(path) {
  data.frameKEGG.id = path, 
             CompoundHits = length(list.hits[[path]]), 
             CompoundsInPathway = length(list.path[[path]]), 
             stringsAsFactors = FALSE)
})

df.hits

# all the rows should be repeated
# otherwise rise exception
stopifnot(nrow(dplyr::anti_join(df.hypergeom, df.hits)) == 0)
ADD COMMENT
1
Entering edit mode

Thank you for sharing this code. Is it possible to do something similar with the diffusion matrix as opposed to the hypergeometric matrix? The structure of the diffusion matrix appears to be different to the hypergeometric matrix and I cannot get the above code to work. I did convert the diffusion matrix to class 'lgCMatrix' (same class as the hypergeometric matrix) but I cannot work out how to map compounds to individual pathways. I appreciate that the generateResultsTable function lists which compounds results in enrichment of pathways but it would be useful to know which compounds actually map to each individual KEGG pathway and/or module. Thank you for your help.

ADD REPLY
0
Entering edit mode

Dear liampvmartin1992, please excuse the late reply. I did not notice the comment.

If by "doing someting similar" you mean...

  • Finding out what pathways each compound belongs to, originally (right before building the network): you should store the result of KEGGREST::keggLink("cpd", "pathway") when building the network. You can do it a posteriori, but there might be small changes with each KEGG release.

  • Finding out what pathways can be directly reached from each compound, right after building the network (e.g. compound C does not belong to pathway P originally, but is related to enzyme E which belongs to P, so P can be reached from C via C-E-P): the answer is the one to the original question above.

  • Finding out the contribution of each compound to each pathway: then you can grab the diffusion matrix with the kernel values and work on those directly. See code below.

# example on how to extract individual metabolite contributions in diffusion
# to know which metabolites were in which pathways see 03_hypergeom_getpathways
# https://support.bioconductor.org/p/119264/
library(igraph)
library(FELLA)
library(plyr)
library(dplyr)

data("FELLA.sample")
data("input.sample")

# rebuild diffusion matrix (since matrix is not 
# bundled in the default sample object)
g <- FELLA::getGraph(FELLA.sample)
db.dir <- paste0(tempdir(), "/fellasampledb")
FELLA::buildDataFromGraph(
  g, databaseDir = db.dir, internalDir = FALSE, niter = 10)

# reload db with diffusion matrix
FELLA.samplemat <- FELLA::loadKEGGdata(
  db.dir, internalDir = FALSE, loadMatrix = "diffusion")

# toy analysis
myAnalysis <- enrich(
  compounds = input.sample, 
  method = "diffusion", 
  data = FELLA.samplemat)

df.diffusion <- generateResultsTable(
  method = "diffusion", 
  object = myAnalysis, 
  data = FELLA.samplemat) 

# get diffusion matrix with kernel values
mat.diffusion <- FELLA:::getMatrix(FELLA.samplemat, "diffusion")
# input cpds
list.cpdin <- intersect(FELLA::getInput(myAnalysis), rownames(mat.diffusion))
# all cpds
list.cpdall <- FELLA::getCom(FELLA.samplemat, "compound")

# pathway hits
list.pathhits <- subset(df.diffusion, Entry.type == "pathway")$KEGG.id
# all pathways
list.pathall <- FELLA::getCom(FELLA.samplemat, "pathway")

# columns: input metabolites
# rows: pathways
# values: contribution of compound to pathway score 
mat.contr <- mat.diffusion[list.pathall, list.cpdin]
# The sum of each compound contributions to all pathway is 1
colSums(mat.contr)

# hsa00640 always gets more contribution than hsa00010 for the input compounds
rownames(mat.contr)[apply(mat.contr, 2, which.max)] %>% table

# in general, hsa00640 gets larger contributions than 
# hsa00010 from the compounds in the graph
# (24 compounds with larger contrib from hsa00010, 
# versus 301 compounds from hsa00640)
list.pathall[apply(mat.diffusion[list.pathall, list.cpdall], 2, which.max)] %>% table
ADD REPLY

Login before adding your answer.

Traffic: 892 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6