Hello everyone,
I'm trying to use the tool dmrseq to find differentially methylated regions (DMRs) from bisulfite-seq data. My experimental design looks like this:
## celltype treatment
## <factor> <factor>
## neuron_c1 neuron control
## neuron_c2 neuron control
## neuron_t1 neuron treated
## neuron_t2 neuron treated
## stemcell_c1 stemcell control
## stemcell_c2 stemcell control
## stemcell_t1 stemcell treated
## stemcell_t2 stemcell treated
Now I want to find e.g. DMRs between stem cells and neurons while adjusting for the effect of treatment. According to the help site of "dmrseq::dmrseq", this should be possible by specifying the "adjustCovariate" argument.
However, specifying "adjustCovariate" always results in an error (that occurs very quickly before any calculations are done). See below for a minimal working (crashing) example that's reproducible on multiple machines.
Of course I also tried the code below using a proper dataset, but to no avail. I also tried specifying pData as character instead of factor, but it doesn't seem to make a difference.
Thank you in advance, my experience with dmrseq has been great so far!
Detecting differentially methylated regions with dmrseq
require(bsseq)
require(BiocGenerics)
library(dmrseq)
Constructing a toy dataset from the bsseq example file.
infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz",
package = 'bsseq')
cpg <- BiocGenerics::combine(
bsseq::read.bismark(files = infile, sampleNames = "neuron_c1", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "neuron_c2", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "neuron_t1", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "neuron_t2", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "stemcell_c1", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "stemcell_c2", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "stemcell_t1", strandCollapse = F),
bsseq::read.bismark(files = infile, sampleNames = "stemcell_t2", strandCollapse = F)
)
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.8 secs
## [read.bismark] Joining samples ... done in 0.2 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.5 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
## [read.bismark] Reading file '/home/lukas/R/x86_64-pc-linux-gnu-library/3.4/bsseq/extdata/test_data.fastq_bismark.bismark.cov.gz' ... done in 0.1 secs
## [read.bismark] Joining samples ... done in 0.1 secs
cpg
## An object of type 'BSseq' with
## 2013 methylation loci
## 8 samples
## has not been smoothed
## All assays are in-memory
Labeling the toy samples according to treatment & cell type:
pData(cpg)$celltype <- as.factor(c("neuron", "neuron", "neuron", "neuron",
"stemcell", "stemcell", "stemcell", "stemcell"))
pData(cpg)$treatment <- as.factor(c("control", "control", "treated", "treated",
"control", "control", "treated", "treated"))
Double-checking that this table makes sense:
pData(cpg)
## DataFrame with 8 rows and 2 columns
## celltype treatment
## <factor> <factor>
## neuron_c1 neuron control
## neuron_c2 neuron control
## neuron_t1 neuron treated
## neuron_t2 neuron treated
## stemcell_c1 stemcell control
## stemcell_c2 stemcell control
## stemcell_t1 stemcell treated
## stemcell_t2 stemcell treated
Only keep loci that have at least 1 coverage in all samples
cpg <- filterLoci(cpg)
## Filtering out loci with coverage less than 1 read in at least one sample
## Removed 0 out of 2013 loci
Trying to detect differentially methylated regions (DMRs) with dmrseq while adjusting for "treatment":
regions <- dmrseq(bs = cpg,
testCovariate = "celltype",
adjustCovariate = "treatment")
## Error in colnames(design)[, (max(coeff) + 1):ncol(design)] <- colnames(pData(bs))[adjustCovariate]: incorrect number of subscripts on matrix
Traceback is not very helpful.
traceback()
## 1: dmrseq(bs = cpg, testCovariate = "celltype", adjustCovariate = "treatment")
That didn't work, let's try again, this time I'm specifying the covariate by column number:
regions <- dmrseq(bs = cpg,
testCovariate = "celltype",
adjustCovariate = 2)
## Error in colnames(design)[, (max(coeff) + 1):ncol(design)] <- colnames(pData(bs))[adjustCovariate]: incorrect number of subscripts on matrix
traceback()
## 1: dmrseq(bs = cpg, testCovariate = "celltype", adjustCovariate = 2)
Still didn't work. But it works if I don't specify a covariate to adjust for:
regions <- dmrseq(bs = cpg,
testCovariate = "celltype")
## Condition stemcell vs neuron
## Using a single core
## Detecting candidate regions with coefficient larger than 0.1 in magnitude.
## ... (rest of the results omitted) ...
(Okay, it doesn't actually work here because I'm using a toy dataset, but it works on my own data)
BiocInstaller::biocValid()
## [1] TRUE
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] dmrseq_0.99.1 bsseq_1.14.0
## [3] SummarizedExperiment_1.8.1 DelayedArray_0.4.1
## [5] matrixStats_0.53.0 Biobase_2.38.0
## [7] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0
## [9] IRanges_2.12.0 S4Vectors_0.16.0
## [11] BiocGenerics_0.24.0 knitr_1.19
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-131
## [2] bitops_1.0-6
## [3] TxDb.Rnorvegicus.UCSC.rn6.refGene_3.4.1
## [4] bit64_0.9-7
## [5] RColorBrewer_1.1-2
## [6] progress_1.1.2
## [7] httr_1.3.1
## [8] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
## [9] doRNG_1.6.6
## [10] tools_3.4.3
## [11] R6_2.2.2
## [12] DBI_0.7
## [13] lazyeval_0.2.1
## [14] colorspace_1.3-2
## [15] permute_0.9-4
## [16] prettyunits_1.0.2
## [17] RMySQL_0.10.13
## [18] bit_1.1-12
## [19] compiler_3.4.3
## [20] pkgmaker_0.22
## [21] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
## [22] rtracklayer_1.38.3
## [23] scales_0.5.0
## [24] readr_1.1.1
## [25] stringr_1.2.0
## [26] digest_0.6.15
## [27] Rsamtools_1.30.0
## [28] R.utils_2.6.0
## [29] XVector_0.18.0
## [30] pkgconfig_2.0.1
## [31] htmltools_0.3.6
## [32] BSgenome_1.46.0
## [33] regioneR_1.10.0
## [34] limma_3.34.6
## [35] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.4.1
## [36] rlang_0.1.6
## [37] RSQLite_2.0
## [38] BiocInstaller_1.28.0
## [39] shiny_1.0.5
## [40] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [41] bindr_0.1
## [42] BiocParallel_1.12.0
## [43] gtools_3.5.0
## [44] dplyr_0.7.4
## [45] R.oo_1.21.0
## [46] RCurl_1.95-4.10
## [47] magrittr_1.5
## [48] GenomeInfoDbData_1.0.0
## [49] Matrix_1.2-11
## [50] Rcpp_0.12.15
## [51] munsell_0.4.3
## [52] R.methodsS3_1.7.1
## [53] stringi_1.1.6
## [54] yaml_2.1.16
## [55] zlibbioc_1.24.0
## [56] bumphunter_1.20.0
## [57] org.Hs.eg.db_3.5.0
## [58] plyr_1.8.4
## [59] AnnotationHub_2.10.1
## [60] grid_3.4.3
## [61] blob_1.1.0
## [62] lattice_0.20-35
## [63] Biostrings_2.46.0
## [64] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.2
## [65] GenomicFeatures_1.30.2
## [66] hms_0.4.1
## [67] locfit_1.5-9.1
## [68] pillar_1.1.0
## [69] org.Dm.eg.db_3.5.0
## [70] rngtools_1.2.4
## [71] codetools_0.2-15
## [72] reshape2_1.4.3
## [73] biomaRt_2.34.2
## [74] XML_3.98-1.9
## [75] glue_1.2.0
## [76] evaluate_0.10.1
## [77] outliers_0.14
## [78] annotatr_1.4.1
## [79] data.table_1.10.4-3
## [80] foreach_1.4.4
## [81] httpuv_1.3.5
## [82] org.Mm.eg.db_3.5.0
## [83] gtable_0.2.0
## [84] assertthat_0.2.0
## [85] org.Rn.eg.db_3.5.0
## [86] ggplot2_2.2.1
## [87] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
## [88] mime_0.5
## [89] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
## [90] xtable_1.8-2
## [91] tibble_1.4.2
## [92] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
## [93] iterators_1.0.9
## [94] registry_0.5
## [95] GenomicAlignments_1.14.1
## [96] AnnotationDbi_1.40.0
## [97] memoise_1.1.0
## [98] bindrcpp_0.2
## [99] interactiveDisplayBase_1.16.0
## [100] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
Hi Keegan,
I updated to the latest version and now it works flawlessly. Thank you very much for the quick fix!
Lukas