Hi Everyone!
Im currently working on a basic DMP/DMR analysis using EPICV2 array data, and was running into an interesting issue in regards to specifying my model design. I am getting VERY different results between models with and without an intercept. Since my variable of interest is a factor, the results should be the same, right?
My variable of interest is a factor with two levels ("AltSHP_LA" and "AltSHP_T_HA") and I have several covariates that are both factors and continuous (ex. Sex, PC1 (Blood Cell types), and Age.
Ive read some similar questions that have been previously asked and answered, but I cant seem to find a solution from them that solve this exact issue. Any help would be greatly appreciated!
# Define the factor of interest and covariates
Alt <- factor(targets_Sherpa$Group_Code)
Sex <- factor(targets_Sherpa$Sex)
Age <- as.numeric(targets_Sherpa$Age)
PC1 <- as.numeric(targets_Sherpa$PC1)
# Define Intercept Model
design_intercept <- model.matrix(~Alt+Sex+Age+PC1, data=targets_Sherpa)
colnames(design_intercept) <- c("AltSHP_LA","AltSHP_T_HA", "Sex", "Age", "PC1")
contMatrix_intercept <- makeContrasts(AltSHP_LA-AltSHP_T_HA,
levels=design_intercept)
# Define no-Intercept Model
design_Nointercept <- model.matrix(~0+Alt+Sex+Age+PC1, data=targets_Sherpa)
contMatrix_Nointercept <- makeContrasts(AltSHP_LA-AltSHP_T_HA,
levels=design_Nointercept)
# Function for retreiving DMPs to ensure Im running things consistently between models:
getDMP <- function(mSet, design, contMatrix, coef_num){
mVals2 <- getM(mSet)
fit <- lmFit(mVals2, design)
fit <- contrasts.fit(fit, contMatrix)
fit <- eBayes(fit, trend=TRUE)
topTable(fit,coef=coef_num)
DMPs <- topTable(fit,coef=coef_num)
DMPs}
# Intercept Model Results:
Intercept <- getDMP(mSetSqFlt_Sherpa2, design_intercept, contMatrix_intercept, 1)
logFC AveExpr t P.Value. adj.P.Val
cg21393171_TC21 6.694837 1.858226 36.37520 6.392776e-25 5.770445e-19
cg11049634_BC11 8.315347 2.423556 35.38914 1.347332e-24 6.080854e-19
cg08560117_BC21 -7.994720 -1.825273 -34.69716 2.300907e-24 6.923052e-19
cg27604818_BC21 -7.719519 -1.969742 -32.76409 1.085070e-23 2.448599e-18
cg26207017_TC21 -7.219652 -2.106884 -32.07883 1.920206e-23 3.461483e-18
# No Intercept Model Results:
NoIntercept <- getDMP(mSetSqFlt_Sherpa2, design_Nointercept, contMatrix_Nointercept, 1)
logFC AveExpr t P.Value. adj.P.Val
cg02971555_BC21 -0.5815881 2.6954319 -5.921211 2.361404e-06 0.9758965
cg21532288_TC21 0.3920665 3.2796548 5.634846 5.100760e-06 0.9758965
cg03311185_BC21 0.3433009 2.9640623 5.602717 8.200960e-06 0.9758965
cg04490945_TC21 0.4260654 0.5504043 5.459177 8.707104e-06 0.9758965
As you can see, the intercept and non-intercept models really change results drastically. Any idea whats going on?
Thanks in advance, any help would be appreciated!
sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /u/local/compilers/intel/oneapi/2022.1.1/mkl/2022.0.1/lib/intel64/libmkl_gf_lp64.so.2; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] ewastools_1.7.2
[2] IlluminaHumanMethylationEPICv2manifest_0.99.2
[3] IlluminaHumanMethylationEPICv2anno.20a1.hg38_0.99.1
[4] methylCC_1.16.0
[5] FlowSorted.Blood.450k_1.40.0
[6] remotes_2.5.0
[7] RColorBrewer_1.1-3
[8] DMRcatedata_2.20.3
[9] ExperimentHub_2.10.0
[10] AnnotationHub_3.10.1
[11] BiocFileCache_2.10.2
[12] dbplyr_2.5.0
[13] mCSEA_1.22.0
[14] Homo.sapiens_1.3.1
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[16] org.Hs.eg.db_3.18.0
[17] GO.db_3.18.0
[18] OrganismDbi_1.44.0
[19] GenomicFeatures_1.54.4
[20] AnnotationDbi_1.64.1
[21] mCSEAdata_1.22.0
[22] DMRcate_2.99.6
[23] Gviz_1.46.1
[24] minfiData_0.48.0
[25] IlluminaHumanMethylation450kmanifest_0.4.0
[26] missMethyl_1.36.0
[27] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[28] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
[29] minfi_1.48.0
[30] bumphunter_1.44.0
[31] locfit_1.5-9.9
[32] iterators_1.0.14
[33] foreach_1.5.2
[34] Biostrings_2.70.3
[35] XVector_0.42.0
[36] SummarizedExperiment_1.32.0
[37] Biobase_2.62.0
[38] MatrixGenerics_1.14.0
[39] matrixStats_1.2.0
[40] GenomicRanges_1.54.1
[41] GenomeInfoDb_1.38.8
[42] IRanges_2.36.0
[43] S4Vectors_0.40.2
[44] BiocGenerics_0.48.1
[45] limma_3.58.1
[46] lubridate_1.9.3
[47] forcats_1.0.0
[48] stringr_1.5.1
[49] dplyr_1.1.4
[50] purrr_1.0.2
[51] readr_2.1.5
[52] tidyr_1.3.1
[53] tibble_3.2.1
[54] ggplot2_3.5.1
[55] tidyverse_2.0.0
[56] BiocManager_1.30.22
loaded via a namespace (and not attached):
[1] ProtGenerics_1.34.0 bitops_1.0-7
[3] httr_1.4.7 tools_4.3.0
[5] doRNG_1.8.6 backports_1.4.1
[7] utf8_1.2.4 R6_2.5.1
[9] HDF5Array_1.30.1 lazyeval_0.2.2
[11] rhdf5filters_1.14.1 permute_0.9-7
[13] withr_3.0.0 prettyunits_1.2.0
[15] gridExtra_2.3 base64_2.0.1
[17] preprocessCore_1.64.0 cli_3.6.2
[19] genefilter_1.84.0 askpass_1.2.0
[21] Rsamtools_2.18.0 foreign_0.8-84
[23] siggenes_1.76.0 illuminaio_0.44.0
[25] R.utils_2.12.3 dichromat_2.0-0.1
[27] scrime_1.3.5 BSgenome_1.70.2
[29] readxl_1.4.3 rstudioapi_0.16.0
[31] RSQLite_2.3.6 generics_0.1.3
[33] BiocIO_1.12.0 gtools_3.9.5
[35] Matrix_1.6-5 interp_1.1-6
[37] fansi_1.0.6 abind_1.4-5
[39] R.methodsS3_1.8.2 lifecycle_1.0.4
[41] yaml_2.3.8 edgeR_4.0.16
[43] rhdf5_2.46.1 SparseArray_1.2.4
[45] blob_1.2.4 promises_1.2.1
[47] crayon_1.5.2 lattice_0.21-8
[49] annotate_1.80.0 KEGGREST_1.42.0
[51] pillar_1.9.0 knitr_1.45
[53] beanplot_1.3.1 rjson_0.2.21
[55] codetools_0.2-19 glue_1.7.0
[57] data.table_1.15.4 vctrs_0.6.5
[59] png_0.1-8 cellranger_1.1.0
[61] gtable_0.3.4 cachem_1.0.8
[63] xfun_0.43 S4Arrays_1.2.1
[65] mime_0.12 survival_3.5-5
[67] statmod_1.5.0 interactiveDisplayBase_1.40.0
[69] ellipsis_0.3.2 nlme_3.1-162
[71] bit64_4.0.5 bsseq_1.38.0
[73] progress_1.2.3 filelock_1.0.3
[75] nor1mix_1.3-3 rpart_4.1.19
[77] colorspace_2.1-0 DBI_1.2.2
[79] Hmisc_5.1-2 nnet_7.3-18
[81] tidyselect_1.2.1 bit_4.0.5
[83] compiler_4.3.0 curl_5.2.1
[85] graph_1.80.0 htmlTable_2.4.2
[87] xml2_1.3.6 DelayedArray_0.28.0
[89] rtracklayer_1.62.0 checkmate_2.3.1
[91] scales_1.3.0 quadprog_1.5-8
[93] RBGL_1.78.0 rappdirs_0.3.3
[95] digest_0.6.35 rmarkdown_2.26
[97] GEOquery_2.70.0 htmltools_0.5.7
[99] pkgconfig_2.0.3 jpeg_0.1-10
[101] base64enc_0.1-3 sparseMatrixStats_1.14.0
[103] fastmap_1.1.1 ensembldb_2.26.0
[105] rlang_1.1.3 htmlwidgets_1.6.4
[107] shiny_1.8.0 DelayedMatrixStats_1.24.0
[109] BiocParallel_1.36.0 mclust_6.1.1
[111] R.oo_1.26.0 VariantAnnotation_1.48.1
[113] RCurl_1.98-1.14 magrittr_2.0.3
[115] Formula_1.2-5 GenomeInfoDbData_1.2.11
[117] Rhdf5lib_1.24.2 munsell_0.5.0
[119] Rcpp_1.0.12 stringi_1.8.3
[121] zlibbioc_1.48.2 MASS_7.3-58.4
[123] plyr_1.8.9 deldir_2.0-2
[125] splines_4.3.0 multtest_2.58.0
[127] hms_1.1.3 rngtools_1.5.2
[129] biomaRt_2.58.2 BiocVersion_3.18.1
[131] XML_3.99-0.16.1 evaluate_0.23
[133] latticeExtra_0.6-30 biovizBase_1.50.0
[135] tzdb_0.4.0 httpuv_1.6.14
[137] openssl_2.1.1 reshape_0.8.9
[139] xtable_1.8-4 restfulr_0.0.15
[141] AnnotationFilter_1.26.0 later_1.3.2
[143] plyranges_1.22.0 memoise_2.0.1
[145] GenomicAlignments_1.38.2 cluster_2.1.4
[147] timechange_0.3.0