Hi, I have some data from deseq2 analysis I am trying to represent in a volcano plot using EnhancedVolcano. I have a list of genes I want to label, and then a subset of these same genes that I want to highlight by changing the label color to magenta. The gene list and label color list match correctly when called, but when encorporated into the plot they are not correct. I was able to recreate the problem using the airway dataset, I included this code below.
Since it happens with multiple datasets I am assuming there is something in the order in which the label is matched with the color or something similar that I am not accounting for. Any help would be appreciated!
Here is the resulting plot; https://ibb.co/0nGfycN
library(airway)
library(magrittr)
library(org.Hs.eg.db)
library('DESeq2')
library("EnhancedVolcano")
# deseq analysis done same as in the EnhancedVolcano vignette
data('airway')
airway$dex %<>% relevel('untrt')
ens <- rownames(airway)
symbols <- mapIds(org.Hs.eg.db, keys = ens, column = c('SYMBOL'), keytype = 'ENSEMBL')
symbols <- symbols[!is.na(symbols)]
symbols <- symbols[match(rownames(airway), names(symbols))]
rownames(airway) <- symbols
keep <- !is.na(rownames(airway))
airway <- airway[keep,]
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds, betaPrior=FALSE)
res <- results(dds, contrast = c('dex','trt','untrt'))
res <- lfcShrink(dds, contrast = c('dex','trt','untrt'), res=res, type = 'normal')
# creating gene list to label and sublist to highlight
list = c("SPARCL1", "SAMHD1", "MAOA", "GPX3", "STEAP2", "NEXN", "MT2A", "ADAMTS1", "SORT1", "KLF15", "CCDC69", "ACSS1", "FSTL3", "CYTH3", "PRODH", "REV3L", "C7", "GAS7", "SLC7A14")
sublist = c("SAMHD1", "MAOA", "GPX3", "STEAP2", "MT2A", "ADAMTS1", "SORT1", "CCDC69", "ACSS1", "FSTL3", "PRODH", "REV3L", "C7", "GAS7")
# using ifelse to create a list of label colors
keyval.col = ifelse(list %in% sublist, 'magenta1', 'black')
# These genes, for example, are correct here but incorrect in the plot
list[6]
[1] "NEXN"
keyval.col[6]
[1] "black"
list[18]
[1] "GAS7"
keyval.col[18]
[1] "magenta1"
EnhancedVolcano(res, lab = rownames(res), x = 'log2FoldChange', y = 'padj', selectLab = (list), xlim = c(-5, 5), ylim = c(0, 150), xlab = bquote(~Log[2]~ 'fold change'), ylab = bquote(~-Log[10]~adjusted~italic(P)), pCutoff = 0.05, FCcutoff = 1.5, pointSize = 4.0, labSize = 6.0, labCol = keyval.col, labFace = 'bold', boxedLabels = TRUE, colAlpha = 4/5, legendPosition = 'none', drawConnectors = TRUE, widthConnectors = 1.0, colConnectors = 'black')
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.14.0 AnnotationDbi_1.56.2 magrittr_2.0.1
[4] airway_1.14.0 BiocManager_1.30.16 EnhancedVolcano_1.12.0
[7] ggrepel_0.9.1 ggplot2_3.3.5 DESeq2_1.34.0
[10] SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0
[13] matrixStats_0.61.0 GenomicRanges_1.46.0 GenomeInfoDb_1.30.0
[16] IRanges_2.28.0 S4Vectors_0.32.2 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] httr_1.4.2 maps_3.4.0 bit64_4.0.5 splines_4.1.2
[5] assertthat_0.2.1 BiocFileCache_2.2.0 blob_1.2.2 vipor_0.4.5
[9] GenomeInfoDbData_1.2.7 Rttf2pt1_1.3.9 pillar_1.6.4 RSQLite_2.2.8
[13] lattice_0.20-45 glue_1.4.2 digest_0.6.28 extrafontdb_1.0
[17] RColorBrewer_1.1-2 XVector_0.34.0 colorspace_2.0-2 Matrix_1.3-4
[21] XML_3.99-0.8 pkgconfig_2.0.3 genefilter_1.76.0 zlibbioc_1.40.0
[25] purrr_0.3.4 xtable_1.8-4 scales_1.1.1 ggrastr_0.2.3
[29] BiocParallel_1.28.0 tibble_3.1.5 annotate_1.72.0 KEGGREST_1.34.0
[33] farver_2.1.0 generics_0.1.1 ellipsis_0.3.2 cachem_1.0.6
[37] withr_2.4.2 survival_3.2-13 crayon_1.4.2 memoise_2.0.0
[41] ash_1.0-15 fansi_0.5.0 MASS_7.3-54 beeswarm_0.4.0
[45] tools_4.1.2 lifecycle_1.0.1 munsell_0.5.0 locfit_1.5-9.4
[49] DelayedArray_0.20.0 Biostrings_2.62.0 compiler_4.1.2 ggalt_0.4.0
[53] rlang_0.4.12 grid_4.1.2 RCurl_1.98-1.5 rappdirs_0.3.3
[57] labeling_0.4.2 bitops_1.0-7 proj4_1.0-10.1 gtable_0.3.0
[61] curl_4.3.2 DBI_1.1.1 R6_2.5.1 dplyr_1.0.7
[65] fastmap_1.1.0 bit_4.0.4 extrafont_0.17 utf8_1.2.2
[69] filelock_1.0.2 KernSmooth_2.23-20 ggbeeswarm_0.6.0 parallel_4.1.2
[73] Rcpp_1.0.7 vctrs_0.3.8 geneplotter_1.72.0 png_0.1-7
[77] dbplyr_2.1.1 tidyselect_1.1.1