I performed gene set analysis (GSA) using gsameth from the missMethyl package on EPICv2 array data. I noticed that probes annotated to more than one gene are completely excluded from the analysis. I found the cause and corrected it, but I am curious whether this is an isolated case. The problem lies in one of the internal helper functions called by gsameth.
The call chain is gsameth => getMappedEntrezIDs => .getFlatAnnotation
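# This is how I tested it
library(minfi)
library(missMethyl)
library(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
Anno <- getAnnotation(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
# .getFlatAnnotation is internal to missMethyl, so it is reached with ':::'
flat_test <- missMethyl:::.getFlatAnnotation(array.type = "EPIC_V2", anno = Anno)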
> head(rownames(flat_test))
[1] "cg25324105_BC111" "cg25383568_TC111" "cg25623721_TC111" "cg25898577_BC11" "cg25908985_BC11" "cg25910443_TC111"
# And this is the line within .getFlatAnnotation where the problem is located
flat <- data.frame(symbol = unlist(geneslist), group = unlist(grouplist))
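This converts the list into a data frame incorrectly for probes annotated to multiple genes. When unlist() flattens a named list, it appends a sequence number (1, 2, ...) to the name of each element of a multi-gene probe, and data.frame() then uses those names as row names. So cg25324105_BC11 becomes cg25324105_BC111, cg25324105_BC112, and so on. These row names no longer match any probe ID, which is why those probes drop out of the analysis.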
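The renaming can be reproduced without the annotation package. The sketch below is a minimal, self-contained example; the probe IDs, gene symbols and group labels are hypothetical stand-ins for geneslist and grouplist inside .getFlatAnnotation, and only the data.frame() line mirrors the original code.

```r
# Hypothetical toy stand-ins for geneslist / grouplist
geneslist <- list(cg25324105_BC11 = c("GENE_A", "GENE_B"),
                  cg25898577_BC11 = "GENE_C")
grouplist <- list(cg25324105_BC11 = c("TSS200", "Body"),
                  cg25898577_BC11 = "5'UTR")

# Same construction as the original line in .getFlatAnnotation
flat <- data.frame(symbol = unlist(geneslist), group = unlist(grouplist))

# The multi-gene probe gets "1" and "2" appended; the single-gene probe keeps its ID
rownames(flat)
# [1] "cg25324105_BC111" "cg25324105_BC112" "cg25898577_BC11"

# Row names that no longer match any original probe ID
lost <- setdiff(rownames(flat), names(geneslist))
lost
# [1] "cg25324105_BC111" "cg25324105_BC112"
```

With real data, comparing rownames(flat_test) against rownames(Anno) in the same way should give a rough count of how many probes are affected.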
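Because data frame row names must be unique, a probe mapped to several genes cannot keep its ID as a repeated row name, so I stored the probe ID in its own column instead: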
flat <- data.frame(
  # repeat each probe ID once per gene it is annotated to
  rowname = rep(names(geneslist), lengths(geneslist)),
  symbol = unlist(geneslist),
  group = unlist(grouplist))
> head(flat$rowname)
[1] "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg00381604_BC11" "cg21870274_BC21"
# In the subsequent lines of .getFlatAnnotation, I replaced rownames(flat) with flat$rowname
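Running the same hypothetical toy input through the corrected construction shows every gene staying paired with its original probe ID. This is a sketch under the same assumptions (made-up IDs, genes and groups). unname() is added here only so the symbol and group columns carry no leftover suffixed names, and the final subset illustrates filtering on flat$rowname instead of rownames(flat).

```r
# Hypothetical toy stand-ins for geneslist / grouplist
geneslist <- list(cg25324105_BC11 = c("GENE_A", "GENE_B"),
                  cg25898577_BC11 = "GENE_C")
grouplist <- list(cg25324105_BC11 = c("TSS200", "Body"),
                  cg25898577_BC11 = "5'UTR")

# Corrected construction: repeat each probe ID once per annotated gene
flat <- data.frame(
  rowname = rep(names(geneslist), lengths(geneslist)),
  symbol  = unname(unlist(geneslist)),
  group   = unname(unlist(grouplist)))

flat$rowname
# [1] "cg25324105_BC11" "cg25324105_BC11" "cg25898577_BC11"

# Every flattened row now maps back to a real probe ID
stopifnot(all(flat$rowname %in% names(geneslist)))

# Subset on the probe ID column rather than on row names
keep <- flat[flat$rowname %in% "cg25324105_BC11", ]
keep$symbol
# [1] "GENE_A" "GENE_B"
```

Like the original line, this relies on each element of geneslist having the same length as the matching element of grouplist, since the two unlist() results are paired by position.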
# sessionInfo()
R Under development (unstable) (2024-12-20 r87452 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.2.1 ggplot2_3.5.1
[3] stringr_1.5.1 rtracklayer_1.67.0
[5] org.Hs.eg.db_3.20.0 AnnotationDbi_1.69.0
[7] qusage_2.41.0 missMethyl_1.41.0
[9] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
[11] DMRcatedata_2.25.0 ExperimentHub_2.15.0
[13] AnnotationHub_3.15.0 BiocFileCache_2.15.0
[15] dbplyr_2.5.0 DMRcate_3.3.1
[17] limma_3.63.2 readxl_1.4.3
[19] readr_2.1.5 dplyr_1.1.4
[21] data.table_1.16.4 IlluminaHumanMethylationEPICv2anno.20a1.hg38_1.0.0
[23] IlluminaHumanMethylationEPICv2manifest_1.0.0 minfi_1.53.1
[25] bumphunter_1.49.0 locfit_1.5-9.10
[27] iterators_1.0.14 foreach_1.5.2
[29] Biostrings_2.75.3 XVector_0.47.0
[31] SummarizedExperiment_1.37.0 Biobase_2.67.0
[33] MatrixGenerics_1.19.0 matrixStats_1.4.1
[35] GenomicRanges_1.59.1 GenomeInfoDb_1.43.2
[37] IRanges_2.41.2 S4Vectors_0.45.2
[39] BiocGenerics_0.53.3 generics_0.1.3
loaded via a namespace (and not attached):
[1] ProtGenerics_1.39.1 bitops_1.0-9 httr_1.4.7 RColorBrewer_1.1-3 tools_4.5.0
[6] doRNG_1.8.6 backports_1.5.0 R6_2.5.1 HDF5Array_1.35.2 lazyeval_0.2.2
[11] Gviz_1.51.0 rhdf5filters_1.19.0 permute_0.9-7 withr_3.0.2 prettyunits_1.2.0
[16] gridExtra_2.3 base64_2.0.2 preprocessCore_1.69.0 cli_3.6.3 labeling_0.4.3
[21] mvtnorm_1.3-2 genefilter_1.89.0 tidytable_0.11.2 askpass_1.2.1 Rsamtools_2.23.1
[26] foreign_0.8-87 siggenes_1.81.0 illuminaio_0.49.0 R.utils_2.12.3 rentrez_1.2.3
[31] dichromat_2.0-0.1 scrime_1.3.5 BSgenome_1.75.0 rstudioapi_0.17.1 RSQLite_2.3.9
[36] BiocIO_1.17.1 gtools_3.9.5 Matrix_1.7-1 interp_1.1-6 abind_1.4-8
[41] R.methodsS3_1.8.2 lifecycle_1.0.4 yaml_2.3.10 edgeR_4.5.1 rhdf5_2.51.1
[46] SparseArray_1.7.2 grid_4.5.0 blob_1.2.4 crayon_1.5.3 lattice_0.22-6
[51] beachmat_2.23.5 GenomicFeatures_1.59.1 annotate_1.85.0 KEGGREST_1.47.0 pillar_1.10.0
[56] knitr_1.49 beanplot_1.3.1 rjson_0.2.23 fftw_1.0-9 estimability_1.5.1
[61] codetools_0.2-20 glue_1.8.0 remotes_2.5.0 vctrs_0.6.5 png_0.1-8
[66] cellranger_1.1.0 gtable_0.3.6 cachem_1.1.0 xfun_0.49 S4Arrays_1.7.1
[71] mime_0.12 survival_3.8-3 statmod_1.5.0 nlme_3.1-166 bit64_4.5.2
[76] bsseq_1.43.1 progress_1.2.3 filelock_1.0.3 nor1mix_1.3-3 rpart_4.1.23
[81] colorspace_2.1-1 DBI_1.2.3 Hmisc_5.2-1 nnet_7.3-19 tidyselect_1.2.1
[86] emmeans_1.10.6 bit_4.5.0.1 compiler_4.5.0 curl_6.0.1 httr2_1.0.7
[91] htmlTable_2.4.3 BiasedUrn_2.0.12 xml2_1.3.6 DelayedArray_0.33.3 checkmate_2.3.2
[96] scales_1.3.0 quadprog_1.5-8 rappdirs_0.3.3 digest_0.6.37 rmarkdown_2.29
[101] GEOquery_2.75.0 htmltools_0.5.8.1 pkgconfig_2.0.3 jpeg_0.1-10 base64enc_0.1-3
[106] sparseMatrixStats_1.19.0 fastmap_1.2.0 ensembldb_2.31.0 rlang_1.1.4 htmlwidgets_1.6.4
[111] UCSC.utils_1.3.0 DelayedMatrixStats_1.29.0 farver_2.1.2 jsonlite_1.8.9 BiocParallel_1.41.0
[116] mclust_6.1.1 R.oo_1.27.0 VariantAnnotation_1.53.0 RCurl_1.98-1.16 magrittr_2.0.3
[121] Formula_1.2-5 GenomeInfoDbData_1.2.13 Rhdf5lib_1.29.0 munsell_0.5.1 Rcpp_1.0.13-1
[126] stringi_1.8.4 zlibbioc_1.53.0 MASS_7.3-61 plyr_1.8.9 deldir_2.0-4
[131] splines_4.5.0 multtest_2.63.0 hms_1.1.3 rngtools_1.5.2 biomaRt_2.63.0
[136] BiocVersion_3.21.1 XML_3.99-0.17 evaluate_1.0.1 latticeExtra_0.6-30 biovizBase_1.55.0
[141] BiocManager_1.30.25 tzdb_0.4.0 tidyr_1.3.1 openssl_2.3.0 purrr_1.0.2
[146] reshape_0.8.9 xtable_1.8-4 restfulr_0.0.15 AnnotationFilter_1.31.0 memoise_2.0.1
[151] GenomicAlignments_1.43.0 cluster_2.1.8