My questions are as follows

Question

Significant Probes and Genes differ in Probe and Region level Differential Methylation Analyses

0

Entering edit mode

jamiecatalano2 • 0

@49e19fb2

Last seen 1 day ago

United States

I have a question regarding underlying theory for significant probes and genes identified by a.) probe level differential methylation analysis using limma and b.) region level differential methylation analysis with regions assigned by DMRcate.

DMP analysis resulted in 805 probes and 95 genes that meet FDR significance.
DMR analysis resulted in 803 probes and 163 genes that meet FDR significance.

My questions are as follows

Why is there only around 12 percent overlap between significant CpGs identified at probe and region level
(Likely will be answered by Q1) Why are there only 31 significant genes overlapping at probe and region level

Approach

For reference, my approach closely follows what is outlined by Maksimovic in (https://doi.org/10.12688/f1000research.8839.3).

For following analysis, I test 814,549 EPICv2 array probes that passed my ENmix QC pipeline. I have 124 samples (93 cases, phen.1 and 31 controls, phen.0).

Inputs for differential methylation analysis include

EPICv2 Annotation

annoEPICv2 <- getAnnotation(IlluminaHumanMethylationEPICv2anno.20a1.hg38)

M Value matrix

mval <- logit2(beta)

mval <- as.matrix(mval)

mval <- rmSNPandCH(mval, dist = 2, mafcut = 0.05, and = TRUE, rmcrosshyb = TRUE, rmXY = TRUE)

Phenotypes restricted to samples used in analysis with samples ordered to what is in M Matrix

keep <- intersect(colnames(mval), phen$id)

mval <- mval[,colnames(mval) %in% keep]

phen <- phen[phen$id %in% keep,]

phen <- data.frame(phen)

rownames(phen) <- phen$id

phen <- phen[match(colnames(mval), rownames(phen)), ]

Design and Contrast matrix for both Probe and Region level analyses

sex_cat <- as.factor(phen$Sex)

phen_cat <- as.factor(phen$Pheno)

design <-model.matrix(~0+phen_cat+sex_cat+Age+CD8T+CD4T+NK+Bcell+Mono, data=phen)

colnames(design) <- c('phen.0',  'phen.1', 'sex_cat', 'Age', 'CD8T', 'CD4T', 'NK', 'Bcell', 'Mono')

cont.matrix <- makeContrasts(phen.effect=phen.1-phen.0, levels=design)

Probe level Differential Methylation Analysis (Result is 805 probes and 95 genes FDR significant)

fit <- lmFit(mval, design)
fit2 <- contrasts.fit(fit,cont.matrix)
fit2 <- eBayes(fit2)
summary(decideTests(fit2)) 

annoEPICv2Sub <- annoEPICv2[match(rownames(mval), annoEPICv2$Name),
                      c(1:4,12:19,24:ncol(annoEPICv2))]

annoEPICv2Sub <- as.data.frame(annoEPICv2Sub)

DMPs <- limma::topTable(fit2, num=Inf, coef=1, adjust='fdr', genelist=annoEPICv2Sub, confint=TRUE) 

DMPs$sig_flag <- ifelse(DMPs$adj.P.Val <=0.05,'pass', 'fail')
table(DMPs$sig_flag)

Region level Differential Methylation Analysis (Result is 803 probes, 156 regions, and 163 genes with FDR significance).

Annotation of the methylation dataset (mval) using EPICv2 data type and differential analysis design

myAnnotation <- cpg.annotate(
  datatype = 'array',  # Type of data (array format)

  object = mval,  # Methylation values matrix

  what = 'M',  # Methylation data (mean methylation)

  arraytype = 'EPICv2',  # Specifies that the dataset is from the EPICv2 array

  epicv2Filter = 'mean',  # Filtering method for EPICv2 probes

  epicv2Remap = TRUE,  # Reassign cross hybridizing probes to correct locations

  analysis.type = 'differential',  # Type of analysis: differential methylation

  design = design,  # Design matrix from differential methylation analysis (DMPs)

  contrasts = TRUE,  # Use contrasts for analysis

  cont.matrix = cont.matrix,  # Contrast matrix

  coef = 'phen.effect',  # Coefficient for the phenotype effect

  fdr = 0.05,  # Set false discovery rate (FDR) threshold to 0.05

  annotation = 'ilmn20a1.hg38'  # Genome annotation (hg38)
)

EPICv2 specified. Loading manifest... snapshotDate() 2024 10 28 loading from cache Your contrast returned 803 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

Identify DMRs using the parameters defined above

DMRs <- dmrcate(

  myAnnotation,  # Annotated data from the previous step

  lambda = 1000,  # Smoothing window size of 1000 base pairs

  C = 2,  # Scaling factor for kernel smoothing

  pcutoff = 'fdr',  # Significance cutoff: false discovery rate (FDR)

  consec = FALSE,  # Dont restrict to consecutive probes (include any probes within the window)

  conseclambda = 10,  # Lambda for consecutive probes, irrelevant when consec=FALSE

  betacutoff = NULL,  # No specific methylation beta cutoff (default behavior)

  min.cpgs = 1  # Minimum number of CpGs per region to define a DMR (can also be set to 2)

)

### Extract ranges (genomic locations) of significant DMRs from the results of dmrcate
results.ranges <- DMRcate::extractRanges(DMRs, genome = 'hg38')

Get CpG probes for Differentially Methylated Regions. Resource extract CpGs from DMRcate results

### Load the necessary package and data for IlluminaHumanMethylationEPICv2anno.20a1.hg38 annotations
locs <- IlluminaHumanMethylationEPICv2anno.20a1.hg38::Locations
### Create GRanges object from the chr and pos columns in locs.
locs.ranges <- GRanges(locs$chr, IRanges(locs$pos, locs$pos))
names(locs.ranges) <- rownames(locs)
### Use sapply to apply a function over the range of results.ranges to get overlapping CpG sites for each range

allDMRcpgs <- sapply(1:length(results.ranges), function (x) {
  ### Get the names of the CpGs that overlap with the current range in results.ranges
  overlap_names <- names(subsetByOverlaps(locs.ranges, results.ranges[x]))
  ### Return the CpG names for the current range
  return(overlap_names)
})

### Flatten the list of CpGs into a single string per DMR
allDMRcpgs <- sapply(allDMRcpgs, function(cpgs) paste(cpgs, collapse = ','))

### DMR results as a table
DMRdata <- data.frame(
  Overlapping_genes = results.ranges@elementMetadata@listData[['overlapping.genes']],

  Coordinate = DMRs@coord,  # Coordinates of the DMRs

  Num_CPGs = results.ranges@elementMetadata@listData[['no.cpgs']],

  Min_Smoothed_FDR = results.ranges@elementMetadata@listData[['min_smoothed_fdr']],

  Stouffer = results.ranges@elementMetadata@listData[['Stouffer']],

  HMFDR = results.ranges@elementMetadata@listData[['HMFDR']],

  Fisher = results.ranges@elementMetadata@listData[['Fisher']],

  Max_Diff = results.ranges@elementMetadata@listData[['maxdiff']],

  Mean_Diff = results.ranges@elementMetadata@listData[['meandiff']],

  Overlapping_CPGs = allDMRcpgs)

Session Info

sessionInfo()

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Red Hat Enterprise Linux 9.1 (Plow)

Matrix products: default
BLAS:   /cm/shared/R/4.4.0/lib64/R/lib/libRblas.so 
LAPACK: /cm/shared/R/4.4.0/lib64/R/lib/libRlapack.so  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DMRcatedata_2.24.0                                  ENmix_1.42.0                                       
 [3] doParallel_1.0.17                                   methylGSA_1.24.0                                   
 [5] jsonlite_1.9.0                                      missMethyl_1.40.0                                  
 [7] IlluminaHumanMethylationEPICv2anno.20a1.hg38_1.0.0  IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
 [9] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1  qqman_0.1.9                                        
[11] minfi_1.52.1                                        bumphunter_1.48.0                                  
[13] locfit_1.5-9.11                                     iterators_1.0.14                                   
[15] foreach_1.5.2                                       Biostrings_2.74.1                                  
[17] XVector_0.46.0                                      SummarizedExperiment_1.36.0                        
[19] MatrixGenerics_1.18.1                               matrixStats_1.5.0                                  
[21] RColorBrewer_1.1-3                                  Gviz_1.50.0                                        
[23] patchwork_1.3.0                                     ggrepel_0.9.6                                      
[25] data.table_1.17.0                                   plyr_1.8.9                                         
[27] DMRcate_3.2.1                                       ggthemes_5.1.0                                     
[29] GEOquery_2.74.0                                     Biobase_2.66.0                                     
[31] limma_3.62.2                                        lubridate_1.9.4                                    
[33] forcats_1.0.0                                       stringr_1.5.1                                      
[35] dplyr_1.1.4                                         purrr_1.0.4                                        
[37] readr_2.1.5                                         tidyr_1.3.1                                        
[39] tibble_3.2.1                                        ggplot2_3.5.1                                      
[41] tidyverse_2.0.0                                     sesame_1.24.0                                      
[43] sesameData_1.24.0                                   ExperimentHub_2.14.0                               
[45] AnnotationHub_3.14.0                                BiocFileCache_2.14.0                               
[47] dbplyr_2.5.0                                        GenomicRanges_1.58.0                               
[49] GenomeInfoDb_1.42.3                                 IRanges_2.40.1                                     
[51] S4Vectors_0.44.0                                    BiocGenerics_0.52.0

limma probe DMRcate GenomicRanges DifferentialMethylation • 74 views

ADD COMMENT • link 1 day ago jamiecatalano2 • 0