Hello,
I have a question about the results of the drmseq package. According to the guide (https://bioconductor.org/packages/devel/bioc/vignettes/dmrseq/inst/doc/dmrseq.html), a region is hypo or hypermethylated according to the sign of its corresponding test statistic and the alphabetical order of the covariate of interest. I reproduced the example results of the guide with the following code:
library(dmrseq) library(bsseq) bs <- BS.chr21 testCovariate <- "CellType" regions <- dmrseq(bs=bs[240001:260000,], cutoff = 0.05, testCovariate=testCovariate) sigRegions <- regions[regions$qval < 0.05,]
​However, the direction of the effect does not seem to be in line with the methylation levels calculated per region per sample. If one significant region is taken as an example:
> sigRegions[1,] GRanges object with 1 range and 7 metadata columns: seqnames ranges strand | L area <Rle> <IRanges> <Rle> | <integer> <numeric> 171 chr21 43605625-43606688 * | 24 12.7309412794983 beta stat pval qval <numeric> <numeric> <numeric> <numeric> 171 -1.2434551058172 -18.5650464920496 0.00307692307692308 0.025982905982906 index <IRanges> 171 845-868
The test statistic is negative so the condition that comes first in alphabetical order is hypomethylated compared to the other condition. The conditions are "h1" and "imr90" so h1 is hypomethylated compared to imr90.
> pData(bs) DataFrame with 4 rows and 2 columns CellType Rep <character> <character> imr90.r1 imr90 replicate1 imr90.r2 imr90 replicate2 h1.r1 h1 replicate1 h1.r2 h1 replicate2
Yet, when the methylation estimates are calculated, the h1 samples show a higher methylation estimate than the imr90 samples, indicating hypermethylation.
> getMeth(bs,regions = sigRegions,what = "perRegion",type = "raw")[1,] <4> DelayedArray object of type "double": imr90.r1 imr90.r2 h1.r1 h1.r2 0.3817489 0.4050773 0.9252300 0.8773829
Is there something i misunderstood about these apparent conflicting results?
regards,
Bruno
sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.5 LTS Matrix products: default BLAS: /home/bsverstr/R-3.5.0/lib/libRblas.so LAPACK: /home/bsverstr/R-3.5.0/lib/libRlapack.so locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] dmrseq_1.0.0 bsseq_1.16.0 [3] SummarizedExperiment_1.10.0 DelayedArray_0.6.0 [5] BiocParallel_1.14.0 matrixStats_0.53.1 [7] Biobase_2.40.0 GenomicRanges_1.32.0 [9] GenomeInfoDb_1.16.0 IRanges_2.14.1 [11] S4Vectors_0.18.1 BiocGenerics_0.26.0 loaded via a namespace (and not attached): [1] nlme_3.1-137 bitops_1.0-6 [3] bit64_0.9-7 RColorBrewer_1.1-2 [5] progress_1.1.2 httr_1.3.1 [7] doRNG_1.6.6 tools_3.5.0 [9] R6_2.2.2 HDF5Array_1.8.0 [11] DBI_1.0.0 lazyeval_0.2.1 [13] colorspace_1.3-2 permute_0.9-4 [15] prettyunits_1.0.2 bit_1.1-12 [17] compiler_3.5.0 pkgmaker_0.22 [19] rtracklayer_1.40.0 scales_0.5.0 [21] readr_1.1.1 stringr_1.3.0 [23] digest_0.6.15 Rsamtools_1.32.0 [25] R.utils_2.6.0 XVector_0.20.0 [27] pkgconfig_2.0.1 htmltools_0.3.6 [29] limma_3.36.0 BSgenome_1.48.0 [31] regioneR_1.12.0 rlang_0.2.0 [33] RSQLite_2.1.0 BiocInstaller_1.30.0 [35] shiny_1.0.5 DelayedMatrixStats_1.2.0 [37] bindr_0.1.1 gtools_3.5.0 [39] dplyr_0.7.4 R.oo_1.22.0 [41] RCurl_1.95-4.10 magrittr_1.5 [43] GenomeInfoDbData_1.1.0 Matrix_1.2-14 [45] Rcpp_0.12.16 munsell_0.4.3 [47] Rhdf5lib_1.2.0 R.methodsS3_1.7.1 [49] stringi_1.2.2 yaml_2.1.19 [51] zlibbioc_1.26.0 rhdf5_2.24.0 [53] plyr_1.8.4 bumphunter_1.22.0 [55] AnnotationHub_2.12.0 grid_3.5.0 [57] blob_1.1.1 promises_1.0.1 [59] lattice_0.20-35 splines_3.5.0 [61] Biostrings_2.48.0 GenomicFeatures_1.32.0 [63] hms_0.4.2 locfit_1.5-9.1 [65] pillar_1.2.2 rngtools_1.2.4 [67] codetools_0.2-15 reshape2_1.4.3 [69] biomaRt_2.36.0 XML_3.98-1.11 [71] glue_1.2.0 outliers_0.14 [73] annotatr_1.6.0 data.table_1.11.0 [75] foreach_1.4.4 httpuv_1.4.2 [77] gtable_0.2.0 assertthat_0.2.0 [79] ggplot2_2.2.1 mime_0.5 [81] xtable_1.8-2 later_0.7.2 [83] tibble_1.4.2 iterators_1.0.9 [85] registry_0.5 GenomicAlignments_1.16.0 [87] AnnotationDbi_1.42.0 memoise_1.1.0 [89] bindrcpp_0.2.2 interactiveDisplayBase_1.18.0
Thank you for your answer.
Sorry for the late reply - I see your confusion, Bruno. The dmrseq vignette correctly stated that the reference category was determined alphabetically, but there was a typo in the explanation of "hypo" and "hyper" methylation in the example. Yes, as you observed and as James demonstrates above, the condition label with higher alphabetical rank will become the reference category. In this example, "h1" is the reference, so a negative effect size means that "imr90" is hypomethylated relative to "h1".
I have fixed the typo in the vignette; thanks for catching the error!