Function simplify fails when argument ont is set to "ALL" in enrichGO function in package clusterProfiler
2
1
Entering edit mode
@huang-lei-bsd-cri-6332
Last seen 7.0 years ago

Dear Bioconductor users and package maintainer Guangchuang,

I was running a GO enrichment analysis on a set of DEGs using enrichGO function and simplify function from  the R/Bioconductor package clusterProfiler and got an error message when the argument "ALL" was used in enrichGO. The simplify function works fine if the individual GO subcategory ("BP", "MF", or "CC") was used . I tried to reproduce the error with a builtin data set (gsSample) and got the same error (please see below). I was wondering if anyone has the same experience or I did anything wrong. Thanks for the help!

Best,

Lei

 

> library(clusterProfiler)
Loading required package: DOSE

DOSE v3.4.0  For help: https://guangchuangyu.github.io/DOSE

If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609

clusterProfiler v3.6.0  For help: https://guangchuangyu.github.io/clusterProfiler

If you use clusterProfiler in published research, please cite:
Guangchuang Yu., Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
> library(org.Hs.eg.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid


>
> data(gcSample)
>
> # use only BP
> x <- enrichGO(gcSample[[2]], ont = 'BP', OrgDb = 'org.Hs.eg.db')
> y <- simplify(x, measure = 'Wang', semData = NULL)
>
> # use ALL
> x <- enrichGO(gcSample[[2]], ont = 'ALL', OrgDb = 'org.Hs.eg.db')
> y <- simplify(x, measure = 'Wang', semData = NULL)
Error in .local(x, ...) :
  simplify only applied to output from enrichGO...

>
> # session info
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Hs.eg.db_3.5.0    AnnotationDbi_1.40.0  IRanges_2.12.0
[4] S4Vectors_0.16.0      Biobase_2.38.0        BiocGenerics_0.24.0
[7] clusterProfiler_3.6.0 DOSE_3.4.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14        plyr_1.8.4          compiler_3.4.3
 [4] tools_3.4.3         digest_0.6.13       bit_1.1-12
 [7] lubridate_1.7.1     RSQLite_2.0         memoise_1.1.0
[10] tibble_1.3.4        gtable_0.2.0        pkgconfig_2.0.1
[13] rlang_0.1.4         igraph_1.1.2        fastmatch_1.1-0
[16] DBI_0.7             rvcheck_0.0.9       gridExtra_2.3
[19] fgsea_1.4.0         stringr_1.2.0       tidyselect_0.2.3
[22] autoinst_0.0.0.9000 bit64_0.9-7         grid_3.4.3
[25] glue_1.2.0          qvalue_2.10.0       data.table_1.10.4-3
[28] BiocParallel_1.12.0 GOSemSim_2.4.0      rematch2_2.0.1
[31] purrr_0.2.4         tidyr_0.7.2         reshape2_1.4.3
[34] GO.db_3.5.0         DO.db_2.9           ggplot2_2.2.1
[37] blob_1.1.0          magrittr_1.5        splines_3.4.3
[40] scales_0.5.0        colorspace_1.3-2    stringi_1.1.6
[43] lazyeval_0.2.1      munsell_0.4.3
>

clusterProfiler GO GO enrichment genesetenrichment • 4.3k views
ADD COMMENT
3
Entering edit mode
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
Last seen 27 days ago
China/Guangzhou/Southern Medical Univer…

It did check whether res@ont %in% c("MF", "BP", "CC") and not support 'ALL' currently.

Maybe will support it in future release.

ADD COMMENT
0
Entering edit mode

FWIW this problem similarly "stumped" me for a little while.... I do hope it is resolved in the future... tx 

ADD REPLY
0
Entering edit mode

sorry, I post my question here, maybe not related to the topic. Now I'm doing enrichment using clusterprofiler and WebGestalR. Here, x <- unique(unlist(as.list(org.Bt.egGO2ALLEGS))), x is 5586. And clusterprofiler is based on GO.db. The following is using WebGestalR: enrichD_BP <- loadGeneSet(organism = "btaurus",enrichDatabase = "geneontology_Biological_Process_noRedundant") geneSet_BP <- enrichD_BP$geneSet length(unique(geneSet_BP$gene))#9011

enrichD_CC <- loadGeneSet(organism = "btaurus",enrichDatabase = "geneontology_Cellular_Component_noRedundant") geneSet_CC <- enrichD_CC$geneSet length(unique(geneSet_CC$gene))#6224

enrichD_MF <- loadGeneSet(organism = "btaurus",enrichDatabase = "geneontology_Molecular_Function_noRedundant") geneSet_MF <- enrichD_MF$geneSet length(unique(geneSet_MF$gene))#7960

we can see WebGestalR has more gene set than clusterprofiler. So which one is the most up to date and same with online GO database? But how to get the gene set from online GO database. This question has puzzled me these day. Becasue I think they two are powerful and should have the same results

ADD REPLY
0
Entering edit mode
jandrade • 0
@79c8f005
Last seen 8 months ago
United States

I doubt this will come in handy to you all at this point, but I came up with a custom function based on clusterProfiler's simplify(). I had to go into the source code (https://rdrr.io/bioc/clusterProfiler/src/R/simplify.R) to see what it was doing, but I was able to add support for the 'ALL' option by changing this:

if (!x@ontology %in% c("BP", "MF", "CC")) stop("simplify only applied to output from gsegO and enrichGO...")

to this:

if (!x@ontology %in% c("BP", "MF", "CC","GOALL")) stop("simplify only applied to output from gsegO and enrichGO...")

and then adding a for loop to run simplify_interal over each ontology term present in the output of enrichGO and then just binding the resulting simplified data frames together. I threw it together pretty sloppily, so hopefully it works for anyone. You'll just need to define both the custom function called my_simplify as well as the clusterProfiler's function called simplify_internal which for some reason isn't imported with the other functions. simplify_internal also needs dplyr and GoSemSim. If you run this code, it should take care of all that for you and you can use "ALL" in your enrichGo runs. :)

library(tidyverse)
library(GoSemSim)

my_simplify <- function (x, ...) 
{
.local <- function (x, cutoff = 0.7, by = "p.adjust", 
    select_fun = min, measure = "Wang", semData = NULL) 
{
    if (!x@ontology %in% c("BP", "MF", "CC", "GOALL")) 
        stop("simplify only applied to output from gsegO and enrichGO...")
    if (x@ontology == "GOALL") {
      res_list <- list()
      for (i in unique(as.data.frame(x)$ONTOLOGY)) {
        res_list[[i]] <- simplify_internal(
          as.data.frame(x)[as.data.frame(x)$ONTOLOGY==i,],
          cutoff, 
          by,
          select_fun,
          measure,
          i,
          NULL
          )
        x@result <- bind_rows(res_list)
      }
      return(x)
    } else {
      x@result <- simplify_internal(
        as.data.frame(x),
        cutoff, 
        by,
        select_fun,
        measure,
        x@ontology,
        NULL)
      return(x)          
    }
  }
  .local(x, ...)
}

simplify_internal <- function(
  res,
  cutoff=0.7, 
  by="p.adjust",
  select_fun=min,
  measure="Rel",
  ontology,
  semData)
  {
    if (missing(semData) || is.null(semData)) {
        if (measure == "Wang") {
        semData <- godata(ont = ontology)
    } else {
        stop("godata should be provided for IC-based methods...")
    }
    } else {
    if (ontology != semData@ont) {
        msg <- paste(
          "semData is for",
          semData@ont,
          "ontology, while enrichment result is for",
          ontology
          )
        stop(msg)
    }
    }

    sim <- mgoSim(res$ID, res$ID,
              semData = semData,
              measure=measure,
              combine=NULL)

## to satisfy codetools for calling gather
go1 <- go2 <- similarity <- NULL

sim.df <- as.data.frame(sim)
sim.df$go1 <- row.names(sim.df)
sim.df <- gather(sim.df, go2, similarity, -go1)

sim.df <- sim.df[!is.na(sim.df$similarity),]

## feature 'by' is attached to 'go1'
sim.df <- merge(sim.df, res[, c("ID", by)], by.x="go1", by.y="ID")
sim.df$go2 <- as.character(sim.df$go2)

ID <- res$ID

GO_to_remove <- character()
for (i in seq_along(ID)) {
    ii <- which(sim.df$go2 == ID[i] & sim.df$similarity > cutoff)
    ## if length(ii) == 1, then go1 == go2
    if (length(ii) < 2)
        next

    sim_subset <- sim.df[ii,]

    jj <- which(sim_subset[, by] == select_fun(sim_subset[, by]))

    ## sim.df <- sim.df[-ii[-jj]]
    GO_to_remove <- c(GO_to_remove, sim_subset$go1[-jj]) %>% unique
        }

res[!res$ID %in% GO_to_remove, ]
}   
ADD COMMENT

Login before adding your answer.

Traffic: 799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6