I am having trouble getting pathview to map RefSeq systematic IDs (locus_tag) to the correct gene symbol. For example, pathview maps YLR174W to IDP1 when it should be IDP2, and YOL126C to MDH3 when it should be MDH2. Any suggestions?
> x[[1]]$plot.data.gene %>% filter(kegg.names %in% c('YLR174W', 'YOL126C'))
   kegg.names labels all.mapped type   x   y width height log2FoldChange mol.col
39    YLR174W   IDP1    YLR174W gene 718 510    46     17       1.161110 #FF0000
41    YLR174W   IDP1    YLR174W gene 718 405    46     17       1.161110 #FF0000
47    YOL126C   MDH3    YOL126C gene 253 349    46     17       1.898154 #FF0000
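
For reference, here is a minimal base-R sketch of how the mismatches in the output above can be flagged. Two things in it are assumptions rather than real pathview output: `plot_data` is a hand-typed stand-in for the relevant columns of `x[[1]]$plot.data.gene`, and `expected` is a hand-entered vector of the symbols I believe are correct, not the result of any database lookup.

```r
# Stand-in for x[[1]]$plot.data.gene, reduced to the two relevant columns
# (typed in by hand from the output above)
plot_data <- data.frame(
  kegg.names = c("YLR174W", "YLR174W", "YOL126C"),
  labels     = c("IDP1", "IDP1", "MDH3"),
  stringsAsFactors = FALSE
)

# Symbols I expect for each systematic ID (hand-entered assumption)
expected <- c(YLR174W = "IDP2", YOL126C = "MDH2")

# Attach the expected symbol to each row and flag disagreements
plot_data$expected <- unname(expected[plot_data$kegg.names])
plot_data$mismatch <- plot_data$labels != plot_data$expected

# One row per systematic ID whose label differs from the expected symbol
mismatches <- unique(
  plot_data[plot_data$mismatch, c("kegg.names", "labels", "expected")]
)
print(mismatches)
```

This only reports which labels disagree; it does not change what pathview draws on the pathway image.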
This is how I am calling pathview:
mapKEGGpathway = function(name, res, pathway_id, lfc_thres, padj_thres = .05, species = 'sce'){

  fltr_res = res %>%
    as.data.frame() %>%
    filter(abs(log2FoldChange) > lfc_thres &
             padj < padj_thres) %>%
    select(log2FoldChange)

  pathview(
    gene.data = fltr_res,
    gene.idtype = 'kegg', # per the documentation
    kegg.native = FALSE,
    map.symbol = TRUE,
    expand.node = TRUE,
    pathway.id = pathway_id,
    species = species,
    out.suffix = paste0(name, "_", names(pathways[pathways == pathway_id]))
  )
}

x = map(names(shrunken_res_lists$minus_lys),
        ~mapKEGGpathway(., shrunken_res_lists$minus_lys[[.]],
                        pathway_id = pathways$tca_cycle, lfc_thres = 1))
shrunken_res_lists$minus_lys is a list of DESeq2 results tables that look like this:
> head(shrunken_res_lists$minus_lys$EDS1)
log2 fold change (MMSE): aminoAcid_HisMetLeuUra_vs_LysHisMetLeuUra vs genotypeEDS1.aminoAcidHisMetLeuUra
Wald test p-value: aminoAcid_HisMetLeuUra_vs_LysHisMetLeuUra vs genotypeEDS1.aminoAcidHisMetLeuUra
DataFrame with 6 rows and 5 columns
       baseMean log2FoldChange     lfcSE      pvalue       padj
      <numeric>      <numeric> <numeric>   <numeric>  <numeric>
Q0020   2717.37      -2.132892  1.267952 0.000346399 0.00453428
Q0045   5007.13      -0.335857  0.534820 0.158333978 0.37525617
Q0050   1229.19      -1.111804  0.877374 0.005074406 0.03406794
Q0055   2514.75      -1.081450  0.843239 0.005293675 0.03513808
Q0060    435.58      -0.987048  0.835864 0.007845219 0.04631870
Q0065   1044.41      -0.709558  0.750734 0.024276562 0.10358936
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] pathview_1.34.0 ggVennDiagram_1.2.0 here_1.0.1 pheatmap_1.0.12 DT_0.20 forcats_0.5.1
[7] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
[13] ggplot2_3.3.5 tidyverse_1.3.1 patchwork_1.1.1 gprofiler2_0.2.1 DESeq2_1.34.0 SummarizedExperiment_1.24.0
[19] Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0
[25] S4Vectors_0.32.3 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-2 rjson_0.2.21 class_7.3-20 ellipsis_0.3.2 rprojroot_2.0.2 XVector_0.34.0
[7] fs_1.5.2 proxy_0.4-26 rstudioapi_0.13 farver_2.1.0 bit64_4.0.5 AnnotationDbi_1.56.2
[13] fansi_1.0.2 lubridate_1.8.0 xml2_1.3.3 splines_4.1.2 cachem_1.0.6 geneplotter_1.72.0
[19] knitr_1.37 jsonlite_1.7.3 Rsamtools_2.10.0 broom_0.7.11 annotate_1.72.0 dbplyr_2.1.1
[25] png_0.1-7 graph_1.72.0 compiler_4.1.2 httr_1.4.2 backports_1.4.1 assertthat_0.2.1
[31] Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2 cli_3.1.1 htmltools_0.5.2 tools_4.1.2
[37] gtable_0.3.0 glue_1.6.1 GenomeInfoDbData_1.2.7 Rcpp_1.0.8 cellranger_1.1.0 vctrs_0.3.8
[43] Biostrings_2.62.0 rtracklayer_1.54.0 xfun_0.29 rvest_1.0.2 lifecycle_1.0.1 restfulr_0.0.13
[49] XML_3.99-0.8 org.Hs.eg.db_3.14.0 zlibbioc_1.40.0 scales_1.1.1 org.Sc.sgd.db_3.14.0 hms_1.1.1
[55] KEGGgraph_1.54.0 parallel_4.1.2 RColorBrewer_1.1-2 yaml_2.2.1 memoise_2.0.1 stringi_1.7.6
[61] RSQLite_2.2.9 genefilter_1.76.0 BiocIO_1.4.0 e1071_1.7-9 BiocParallel_1.28.3 rlang_0.4.12
[67] pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.14 lattice_0.20-45 sf_1.0-5 labeling_0.4.2
[73] GenomicAlignments_1.30.0 htmlwidgets_1.5.4 bit_4.0.4 tidyselect_1.1.1 magrittr_2.0.1 R6_2.5.1
[79] generics_0.1.1 DelayedArray_0.20.0 DBI_1.1.2 pillar_1.6.4 haven_2.4.3 withr_2.4.3
[85] units_0.7-2 survival_3.2-13 KEGGREST_1.34.0 RCurl_1.98-1.5 modelr_0.1.8 crayon_1.4.2
[91] KernSmooth_2.23-20 utf8_1.2.2 plotly_4.10.0 RVenn_1.1.0 tzdb_0.2.0 rmarkdown_2.11
[97] locfit_1.5-9.4 grid_4.1.2 readxl_1.3.1 data.table_1.14.2 Rgraphviz_2.38.0 blob_1.2.2
[103] classInt_0.4-3 reprex_2.0.1 digest_0.6.29 xtable_1.8-4 munsell_0.5.0 viridisLite_0.4.0
For the sake of completeness, that was the TCA cycle pathway, sce00020. Thank you for checking that. For what it is worth, I did write to KEGG.