Hi all!!
My name is Aritz Irizar and I am a postdoctoral fellow at the Icahn School of Medicine at Mount Sinai.
I'm trying to use EGSEA to generate pathway/gene-set enrichment results from a RNA seq dataset that contains two experimental groups (disease vs control) sampled at 4 timepoints. I perform 4 disease vs control contrasts, one for each timepoint and I feed the corresponding voom object and the contrast matrix to egsea in a for loop where, in each iteration, only one of the GSA methods is used so I can see which of them work and which give an error.
For some reason, 9 of the 12 base GSA methods give an error (only CAMERA, ROAST and FRY seem to work).
See the code and the errors below please:
#generate gene-level voom object
dge_gene <- DGEList(counts = gcounts, genes = gannots)
dge_gene <- dge_gene[rowSums(cpm(dge_gene) > 0.2) > 5, keep.lib.sizes = F]
dge_gene <- calcNormFactors(dge_gene)
#build model for differential expression analysis
diffexpr_model <- model.matrix(~ ctrl_disease + day + RIN + total_RNA_ug + ctrl_disease:day, data = phenotypes)
colnames(diffexpr_model) <- make.names(colnames(diffexpr_model))
#perform voom transformation
pdf("diffexpr_voom_plot.pdf")
gene_voom <- voom(dge_gene, design = diffexpr_model, plot = TRUE, span = 0.5)
dev.off()
#subset voom object to contain only genes with an Entrez Gene ID
entrez_voom <- gene_voom[which(!is.na(gene_voom$genes$EntrezID)),]
entrez_voom <- entrez_voom[-c(which(duplicated(entrez_voom$genes$EntrezID))),]
rownames(entrez_voom) <- entrez_voom$genes$EntrezID
rownames(entrez_voom$genes) <- entrez_voom$genes$EntrezID
#build gene-set collections to compute enrichment for
murine_gs_annots <- buildIdx(entrezIDs = entrez_voom$genes$EntrezID, species = "mouse", msigdb.gsets = c("c2", "c5", "c7"), gsdb.gsets = "none", kegg.exclude = c("Metabolism"), min.size = 10)
#generate a symbolsMap
symbols_map <- entrez_voom$genes[, c(2,8)]
colnames(symbols_map) <- c("FeatureID", "Symbols")
#establish the list of independent gene-set enrichment methods for EGSEA to use
egsea_methods <- egsea.base()
#build contrasts for EGSEA
gsea_contrasts <- makeContrasts(
day5 = ctrl_diseaseD,
day12 = ctrl_diseaseD + ctrl_diseaseD.day12,
day17 = ctrl_diseaseD + ctrl_diseaseD.day17,
day36 = ctrl_diseaseD + ctrl_diseaseD.day36,
levels = diffexpr_model
)
#run EGSEA
#first pass to see which base GSA methods work in the dataset
egsea_dir_1 <- "./single_gsa_results"
unlink(egsea_dir_1, recursive = TRUE, force = TRUE)
dir.create(egsea_dir_1)
gsa_results_list <- list()
for(i in 1:length(egsea_methods)){
tryCatch({
print(i)
gsa_results <- egsea(voom.results = entrez_voom, contrasts = gsea_contrasts, gs.annots = murine_gs_annots, symbolsMap = symbols_map, baseGSEAs = egsea_methods[i], report = FALSE, print.base = FALSE, verbose = TRUE, keep.limma = FALSE, keep.set.score = TRUE, egsea.dir = egsea_dir_1, num.threads = 7)
gsa_results_list[[i]] <- gsa_results
}, error = function(e){cat("ERROR:", conditionMessage(e), "\n")})
}
1] 1
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running CAMERA for day5"
[1] " Running CAMERA for day12"
[1] " Running CAMERA for day17"
[1] " Running CAMERA for day36"
Running CAMERA on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c5 gene sets"
[1] " Running CAMERA for day5"
[1] " Running CAMERA for day12"
[1] " Running CAMERA for day17"
[1] " Running CAMERA for day36"
Running CAMERA on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c7 gene sets"
[1] " Running CAMERA for day5"
[1] " Running CAMERA for day12"
[1] " Running CAMERA for day17"
[1] " Running CAMERA for day36"
Running CAMERA on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] " Running CAMERA for day5"
[1] " Running CAMERA for day12"
[1] " Running CAMERA for day17"
[1] " Running CAMERA for day36"
Running CAMERA on all contrasts ... COMPLETED
[1] "EGSEA analysis has completed"
[1] 2
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running ROAST for day5"
[1] " Running ROAST for day12"
[1] " Running ROAST for day17"
[1] " Running ROAST for day36"
Running ROAST on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c5 gene sets"
[1] " Running ROAST for day5"
[1] " Running ROAST for day12"
[1] " Running ROAST for day17"
[1] " Running ROAST for day36"
Running ROAST on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c7 gene sets"
[1] " Running ROAST for day5"
[1] " Running ROAST for day12"
[1] " Running ROAST for day17"
[1] " Running ROAST for day36"
Running ROAST on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] " Running ROAST for day5"
[1] " Running ROAST for day12"
[1] " Running ROAST for day17"
[1] " Running ROAST for day36"
Running ROAST on all contrasts ... COMPLETED
[1] "EGSEA analysis has completed"
[1] 3
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running SAFE for day5"
[1] "ERROR: SAFE encountered an \n\t\t\t\t\t\t\terror Error in runsafe(voom.results = voom.results, contrast = contrast, gs.annot = gs.annot, : Invalid contrasts selected.\n"
ERROR: ERROR: One of the GSE methods failed on this
dataset (safe).
Remove it and try again.
See error messages for
more information.
[1] 4
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running GAGE for day5"
[1] "ERROR: GAGE encountered an \n\t\t\t\t\t\t\terror Error in rungage(voom.results = voom.results, contrast = contrast, gs.annot = gs.annot, : Invalid contrasts selected.\n"
ERROR: ERROR: One of the GSE methods failed on this
dataset (gage).
Remove it and try again.
See error messages for
more information.
[1] 5
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running PADOG for day5"
[1] "ERROR: PADOG encountered an \n\t\t\t\t\t\t\terror Error in runpadog(voom.results = voom.results, contrast = contrast, gs.annot = gs.annot, : Invalid contrasts selected.\n"
ERROR: ERROR: One of the GSE methods failed on this
dataset (padog).
Remove it and try again.
See error messages for
more information.
[1] 6
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
Running PLAGE on all contrasts ... COMPLETED
ERROR: incorrect number of dimensions
[1] 7
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
Running ZSCORE on all contrasts ... COMPLETED
ERROR: incorrect number of dimensions
[1] 8
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
Running GSVA on all contrasts ... COMPLETED
ERROR: incorrect number of dimensions
[1] 9
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
Running SSGSEA on all contrasts ... COMPLETED
ERROR: incorrect number of dimensions
[1] 10
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running GLOBALTEST for day5"
[1] "ERROR: GLOBALTEST encountered an \n\t\t\t\t\t\t\terror Error in runglobaltest(voom.results = voom.results, contrast = contrast, : Invalid contrasts selected.\n"
ERROR: ERROR: One of the GSE methods failed on this
dataset (globaltest).
Remove it and try again.
See error messages for
more information.
[1] 11
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running ORA for day5"
[1] " Running ORA for day12"
[1] " Running ORA for day17"
[1] " Running ORA for day36"
Running ORA on all contrasts ... COMPLETED
ERROR: attempt to select less than one element in get1index
[1] 12
[1] "The ensemble mode was disabled."
[1] "EGSEA analysis has started"
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and c2 gene sets"
[1] " Running FRY for day5"
[1] " Running FRY for day12"
[1] " Running FRY for day17"
[1] " Running FRY for day36"
Running FRY on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c5 gene sets"
[1] " Running FRY for day5"
[1] " Running FRY for day12"
[1] " Running FRY for day17"
[1] " Running FRY for day36"
Running FRY on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and c7 gene sets"
[1] " Running FRY for day5"
[1] " Running FRY for day12"
[1] " Running FRY for day17"
[1] " Running FRY for day36"
Running FRY on all contrasts ... COMPLETED
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] " Running FRY for day5"
[1] " Running FRY for day12"
[1] " Running FRY for day17"
[1] " Running FRY for day36"
Running FRY on all contrasts ... COMPLETED
[1] "EGSEA analysis has completed"
Warning messages:
1: In mclapply(args.all, rungsva.contrast, mc.cores = num.workers) :
all scheduled cores encountered errors in user code
2: In mclapply(args.all, rungsva.contrast, mc.cores = num.workers) :
all scheduled cores encountered errors in user code
3: In mclapply(args.all, rungsva.contrast, mc.cores = num.workers) :
all scheduled cores encountered errors in user code
4: In mclapply(args.all, rungsva.contrast, mc.cores = num.workers) :
all scheduled cores encountered errors in user code
From the errors, I cannot really figure out what went wrong in each case. Besides, I could really use some advice on what criteria to use when selecting the base GSA methods to include in the egsea run.
Any suggestions?
Thanks a lot,
Aritz