CombineArrays for EPIC and EPIC V2
Kim • 0

Hi all, I'm wondering if it's possible to meaningfully combine data from EPIC and EPICv2 arrays. I have an MSet from each array; however, when I use combineArrays I end up with 0 probes.

When I compared the two arrays for shared probes by stripping the suffix from the EPICv2 probe IDs (e.g. cg25324105_BC11 -> cg25324105), I found that I should have about 726,890 shared probes. Is this issue arising from the different probe names on the two arrays? Or perhaps from the different annotations, with my EPICv1 data annotated with ilm10b4.hg19 and my EPICv2 data with 20a1.hg38?
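For readers following along, the suffix-stripping comparison described above might look roughly like the sketch below. It uses toy ID vectors in place of rownames(MSet_V1) and rownames(MSet_V2), and it assumes the EPICv2 suffix is always an underscore, two letters and two digits, as in the IDs shown in this post.

```r
# Toy probe IDs (made up for illustration); real ones would come from
# rownames(MSet_V1) and rownames(MSet_V2).
v1_ids <- c("cg25324105", "cg25383568", "cg00000029")
v2_ids <- c("cg25324105_BC11", "cg25383568_TC11", "cg25383568_TC12",
            "ch.12.78471492F_BC21")

# Drop the trailing suffix (assumed form: "_" + two letters + two digits).
v2_base <- sub("_[A-Z]{2}[0-9]{2}$", "", v2_ids)

# Unique base IDs present on both arrays.
shared <- intersect(v1_ids, v2_base)
length(shared)

# Base IDs that occur more than once on EPICv2 (replicate probes).
replicated <- unique(v2_base[duplicated(v2_base)])
replicated
```

Because several EPICv2 probes can share one base ID, the number of shared base IDs is not necessarily the number of EPICv2 rows that match, so the replicates need handling before any join.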

I have ~50 samples that have been run on both arrays. If these pairs cluster closely after joining and normalization, I hope to combine data from both arrays for an EWAS.

Any help or insight in moving forward would be greatly appreciated!

My combineArrays output:

> MSet3 <- combineArrays(MSet_V2, MSet_V1,
+                   outType = "IlluminaHumanMethylationEPIC",
+                   verbose = TRUE)
[convertArray] Casting as IlluminaHumanMethylationEPIC

> MSet3
class: MethylSet 
dim: 0 337 
metadata(0):
assays(2): Meth Unmeth
rownames(0):
rowData names(0):
colnames(337): 207925050049_R07C01 207882990112_R05C01 ...
  200526590027_R07C01 200526590047_R07C01
colData names(120): Sample_Name Sample_Plate ... gPC5 ArrayTypes
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b4.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.44.0
  Manifest version: 0.99.1

My MSets:

MSet_V1
class: MethylSet 
dim: 866238 113 
metadata(0):
assays(2): Meth Unmeth
rownames(866238): cg18478105 cg09835024 ... cg10633746 cg12623625
rowData names(0):
colnames(113): 201364900064_R04C01 201364910121_R08C01 ...
  200526590027_R07C01 200526590047_R07C01
colData names(59): Sample_Name barcode ... Basename filenames
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b4.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.44.0
  Manifest version: 0.3.0


MSet_V2
class: MethylSet 
dim: 936990 224 
metadata(0):
assays(2): Meth Unmeth
rownames(936990): cg25324105_BC11 cg25383568_TC11 ...
  ch.12.78471492F_BC21 ch.21.43742285F_BC21
rowData names(0):
colnames(224): 207925050049_R07C01 207882990112_R05C01 ...
  207882990011_R08C01 207865030077_R04C01
colData names(78): Sample_Name Sample_Plate ... Basename filenames
Annotation
  array: IlluminaHumanMethylationEPICv2
  annotation: 20a1.hg38
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.44.0
  Manifest version: 0.99.1

Obligatory Session Info:

> sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /u/local/compilers/intel/oneapi/2022.1.1/mkl/2022.0.1/lib/intel64/libmkl_gf_lp64.so.2;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] IlluminaHumanMethylationEPICv2manifest_0.99.2      
 [2] IlluminaHumanMethylationEPICmanifest_0.3.0         
 [3] IlluminaHumanMethylationEPICv2anno.20a1.hg38_0.99.1
 [4] methylCC_1.16.0                                    
 [5] FlowSorted.Blood.450k_1.40.0                       
 [6] remotes_2.5.0                                      
 [7] RColorBrewer_1.1-3                                 
 [8] DMRcatedata_2.20.3                                 
 [9] ExperimentHub_2.10.0                               
[10] AnnotationHub_3.10.0                               
[11] BiocFileCache_2.10.2                               
[12] dbplyr_2.5.0                                       
[13] mCSEA_1.22.0                                       
[14] Homo.sapiens_1.3.1                                 
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            
[16] org.Hs.eg.db_3.18.0                                
[17] GO.db_3.18.0                                       
[18] OrganismDbi_1.44.0                                 
[19] GenomicFeatures_1.54.4                             
[20] AnnotationDbi_1.64.1                               
[21] mCSEAdata_1.22.0                                   
[22] DMRcate_2.16.1                                     
[23] Gviz_1.46.1                                        
[24] minfiData_0.48.0                                   
[25] IlluminaHumanMethylation450kmanifest_0.4.0         
[26] missMethyl_1.36.0                                  
[27] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[28] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
[29] minfi_1.48.0                                       
[30] bumphunter_1.44.0                                  
[31] locfit_1.5-9.9                                     
[32] iterators_1.0.14                                   
[33] foreach_1.5.2                                      
[34] Biostrings_2.70.3                                  
[35] XVector_0.42.0                                     
[36] SummarizedExperiment_1.32.0                        
[37] Biobase_2.62.0                                     
[38] MatrixGenerics_1.14.0                              
[39] matrixStats_1.2.0                                  
[40] GenomicRanges_1.54.1                               
[41] GenomeInfoDb_1.38.8                                
[42] IRanges_2.36.0                                     
[43] S4Vectors_0.40.2                                   
[44] BiocGenerics_0.48.1                                
[45] limma_3.58.1                                       
[46] lubridate_1.9.3                                    
[47] forcats_1.0.0                                      
[48] stringr_1.5.1                                      
[49] dplyr_1.1.4                                        
[50] purrr_1.0.2                                        
[51] readr_2.1.5                                        
[52] tidyr_1.3.1                                        
[53] tibble_3.2.1                                       
[54] ggplot2_3.5.0                                      
[55] tidyverse_2.0.0                                    
[56] BiocManager_1.30.22                                

loaded via a namespace (and not attached):
  [1] ProtGenerics_1.34.0           bitops_1.0-7                 
  [3] httr_1.4.7                    tools_4.3.0                  
  [5] doRNG_1.8.6                   backports_1.4.1              
  [7] utf8_1.2.4                    R6_2.5.1                     
  [9] HDF5Array_1.30.1              lazyeval_0.2.2               
 [11] rhdf5filters_1.14.1           permute_0.9-7                
 [13] withr_3.0.0                   prettyunits_1.2.0            
 [15] gridExtra_2.3                 base64_2.0.1                 
 [17] preprocessCore_1.64.0         cli_3.6.2                    
 [19] genefilter_1.84.0             askpass_1.2.0                
 [21] Rsamtools_2.18.0              foreign_0.8-84               
 [23] siggenes_1.76.0               illuminaio_0.44.0            
 [25] R.utils_2.12.3                dichromat_2.0-0.1            
 [27] scrime_1.3.5                  BSgenome_1.70.2              
 [29] readxl_1.4.3                  rstudioapi_0.16.0            
 [31] RSQLite_2.3.6                 generics_0.1.3               
 [33] BiocIO_1.12.0                 gtools_3.9.5                 
 [35] Matrix_1.6-5                  interp_1.1-6                 
 [37] fansi_1.0.6                   abind_1.4-5                  
 [39] R.methodsS3_1.8.2             lifecycle_1.0.4              
 [41] yaml_2.3.8                    edgeR_4.0.16                 
 [43] rhdf5_2.46.1                  SparseArray_1.2.4            
 [45] blob_1.2.4                    promises_1.2.1               
 [47] crayon_1.5.2                  lattice_0.21-8               
 [49] annotate_1.80.0               KEGGREST_1.42.0              
 [51] pillar_1.9.0                  knitr_1.45                   
 [53] beanplot_1.3.1                rjson_0.2.21                 
 [55] codetools_0.2-19              glue_1.7.0                   
 [57] data.table_1.15.4             vctrs_0.6.5                  
 [59] png_0.1-8                     cellranger_1.1.0             
 [61] gtable_0.3.4                  cachem_1.0.8                 
 [63] xfun_0.43                     S4Arrays_1.2.1               
 [65] mime_0.12                     survival_3.5-5               
 [67] statmod_1.5.0                 interactiveDisplayBase_1.40.0
 [69] ellipsis_0.3.2                nlme_3.1-162                 
 [71] bit64_4.0.5                   bsseq_1.38.0                 
 [73] progress_1.2.3                filelock_1.0.3               
 [75] nor1mix_1.3-2                 rpart_4.1.19                 
 [77] colorspace_2.1-0              DBI_1.2.2                    
 [79] Hmisc_5.1-2                   nnet_7.3-18                  
 [81] tidyselect_1.2.1              bit_4.0.5                    
 [83] compiler_4.3.0                curl_5.2.1                   
 [85] graph_1.80.0                  htmlTable_2.4.2              
 [87] xml2_1.3.6                    DelayedArray_0.28.0          
 [89] rtracklayer_1.62.0            checkmate_2.3.1              
 [91] scales_1.3.0                  quadprog_1.5-8               
 [93] RBGL_1.78.0                   rappdirs_0.3.3               
 [95] digest_0.6.35                 rmarkdown_2.26               
 [97] GEOquery_2.70.0               htmltools_0.5.7              
 [99] pkgconfig_2.0.3               jpeg_0.1-10                  
[101] base64enc_0.1-3               sparseMatrixStats_1.14.0     
[103] fastmap_1.1.1                 ensembldb_2.26.0             
[105] rlang_1.1.3                   htmlwidgets_1.6.4            
[107] shiny_1.8.0                   DelayedMatrixStats_1.24.0    
[109] BiocParallel_1.36.0           mclust_6.1                   
[111] R.oo_1.26.0                   VariantAnnotation_1.48.1     
[113] RCurl_1.98-1.14               magrittr_2.0.3               
[115] Formula_1.2-5                 GenomeInfoDbData_1.2.11      
[117] Rhdf5lib_1.24.2               munsell_0.5.0                
[119] Rcpp_1.0.12                   stringi_1.8.3                
[121] zlibbioc_1.48.2               MASS_7.3-58.4                
[123] plyr_1.8.9                    deldir_2.0-2                 
[125] splines_4.3.0                 multtest_2.58.0              
[127] hms_1.1.3                     rngtools_1.5.2               
[129] biomaRt_2.58.2                BiocVersion_3.18.1           
[131] XML_3.99-0.16.1               evaluate_0.23                
[133] latticeExtra_0.6-30           biovizBase_1.50.0            
[135] tzdb_0.4.0                    httpuv_1.6.14                
[137] openssl_2.1.1                 reshape_0.8.9                
[139] xtable_1.8-4                  restfulr_0.0.15              
[141] AnnotationFilter_1.26.0       later_1.3.2                  
[143] plyranges_1.22.0              memoise_2.0.1                
[145] GenomicAlignments_1.38.2      cluster_2.1.4                
[147] timechange_0.3.0
Tim Peters ▴ 200

Hi Kim,

I don't maintain the minfi package, so I can't speak to the usage of combineArrays(), but you can infer the mapping of EPICv1 probes to EPICv2 via the EPICv2manifest package and its accompanying AnnotationHub object (currently in devel in Bioconductor, but due to be incorporated into the main release on May 1st). A complicating factor is that not all EPICv1 probes match EPICv2 probes one-to-one. For example, many EPICv2 probes are "replicates" that interrogate the same CpG locus but may have different probe sequences. Hence there are multiple ways to map your EPICv1 data onto EPICv2, depending on which strategy you feel is most appropriate. The EPICv2manifest data object has three columns that can help with this:

EPICv1probeID: Probe ID mapping, with the 5-character EPICv2 suffix removed

EPICv1seqmatch: Sequence identity mapping. EPICv1 probes with identical sequence.

EPICv1locmatch: CpG coordinate mapping. EPICv1 probes that map to the same reference CpG locus in hg38.

These three categories overlap heavily, but they are not identical. For any minfi-centric devs out there (@Kasper?), an updated version of combineArrays() that incorporates this information in the annotation merge may be helpful.
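As a rough illustration of one such strategy (probe ID mapping, with replicates averaged), here is a minimal base R sketch on toy data. Only the column name EPICv1probeID comes from the description above; the v2_id column, the probe IDs, the beta values, and the use of NA for unmapped probes are assumptions made for the example, and averaging replicates is just one of several defensible choices.

```r
# Hypothetical toy mapping table. Only the column name EPICv1probeID follows
# the description above; IDs, values and NA-for-unmapped are assumptions.
map <- data.frame(
  v2_id = c("cg00000001_TC11", "cg00000001_TC12",
            "cg00000002_BC21", "cg00000003_TC21"),
  EPICv1probeID = c("cg00000001", "cg00000001", "cg00000002", NA),
  stringsAsFactors = FALSE
)

# Toy beta matrices (probes x samples), filled column by column.
beta_v2 <- matrix(c(0.10, 0.30, 0.50, 0.90,
                    0.20, 0.40, 0.60, 0.80),
                  nrow = 4,
                  dimnames = list(map$v2_id, c("v2_s1", "v2_s2")))
beta_v1 <- matrix(c(0.25, 0.55, 0.70),
                  nrow = 3,
                  dimnames = list(c("cg00000001", "cg00000002", "cg00000009"),
                                  "v1_s1"))

# Keep EPICv2 probes that have an EPICv1 counterpart under this strategy.
keep <- !is.na(map$EPICv1probeID)
grp <- map$EPICv1probeID[keep]

# Average replicate probes: rowsum() adds rows per group, then each row is
# divided by the number of probes in its group.
sums <- rowsum(beta_v2[keep, , drop = FALSE], group = grp)
collapsed <- sums / as.vector(table(grp)[rownames(sums)])

# Restrict both matrices to the shared IDs and bind samples column-wise.
shared <- intersect(rownames(beta_v1), rownames(collapsed))
combined <- cbind(beta_v1[shared, , drop = FALSE],
                  collapsed[shared, , drop = FALSE])
combined
```

Swapping EPICv1probeID for EPICv1seqmatch or EPICv1locmatch in the grouping step would give the other two mappings, so it may be worth checking how much the resulting probe sets differ before settling on one.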

Cheers, Tim


Thank you Tim, this is a great help in getting me started!


Hi, I am stuck at the same point. Could you advise how you managed it?
