Entering edit mode
Guest User
★
13k
@guest-user-4897
Last seen 10.3 years ago
Dear All,
I am trying to identify Differentially Methylated Regions (DMRs) using
a tab-delimited file of beta values obtained with Illumina's 450k
methylation arrays. I am using minfi and its implementation of
bumphunter (development version 1.9.11).
However, I run out of memory when I use a mere 100 permutations, and
have tried different amounts of memory to no avail. Of further concern
is the fact that the bumphunter reference (Jaffe, et al (2011)
International J. of Epidemiology: 41:200-209) states that each
permutation takes two hours (see p. 205). At that rate, a standard run
with 1000 permutations would take over 83 days.
My questions are as follows (code and output included below):
1. Is there a better way to run minfi/bumphunter, or specific
resources I should allocate? (I am currently using parallel processing
with four cores, and have tried various amounts of memory.)
2. Does minfi/bumphunter include a non-permutation-based method for
estimating statistical significance? Jaffe et al state that they
implemented a second, faster, p-value calculation following the method
of Effron and Tibshirani (see p. 205). Does anyone know if this method
is implemented in minfi?
3. Is there a better method for identifying DMRs? (I am also
experimenting with methyAnalysis, which has nice visualization
options, but (a) is poorly documented and (b) surprisingly found no
DMRs in my data set using default settings.)
Many thanks in advance for any insight and/or suggestions!
Sincerely,
Allegra
_________________________________________________________________
Code:
# This R script reads in a tab-delimited matrix of Beta values (with a
header line)
# and finds differentially-methylated regions using minfi
# sample command: bsub -oo beta2dmr.140224.out -e beta2dmr.140224.err
-q long -M 16000000 -R 'select[type==LINUX64 && mem>16000]
span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr
"~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r"
library(minfi); # currently requires development version 1.9.11
library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: had to
be installed manually
library(foreach); # used for parallel processing
library(doRNG); # is this actually used??
library(doParallel); # "backend" for foreach package
registerDoParallel(cores = 4);
#betas.frame <- read.table(file="testbetas.txt",sep="\t",header=TRUE,n
a.strings=c(NA),row.names=1,quote="");
betas.frame <- read.table(file="betas_140220.txt",sep="\t",header=TRUE
,na.strings=c(NA),row.names=1,quote="");
betas <- as.matrix(betas.frame); # convert data frame to matrix
data.rs <- RatioSet(Beta = betas,annotation=c(array=
"IlluminaHumanMethylation450k", annotation = "ilmn12.hg19")); #
create RatioSet
data.grs <- mapToGenomedata.rs); # create GenomicRatioSet
sampleNamesdata.rs); # display the sample names (ie column headings)
targets <- read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefra
ctory/dmr.analysis',recursive=FALSE); # read Sample Sheet
samples <- sampleNamesdata.rs); # get sample names (ie column
headers) from input file of beta values
j <- match(samples,targets$Sample); # get row indices from sample
sheet that correspond to sample names (column headers) in beta value
input file
pheno <- targets$Sample_Group[j]; # extract corresponding phenotype
(e.g. "tumor" or "normal") from Sample_Group column of Sample Sheet
designMatrix <- model.matrix(~ pheno); # specify design matrix for
regression
dmr <- bumphunter(data.grs, design = designMatrix, cutoff = 0.05,
B=100); # run bumphunter with B permutations
save(dmr, file="dmr.minfi.100_140222");
__________________________________________________________________
Standard Output:
[1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05"
[3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05"
[5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05"
[7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05"
[9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05"
[11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05"
[13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05"
[15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05"
[17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05"
[19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05"
[21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05"
[23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05"
[25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05"
[27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05"
[29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05"
[31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05"
[33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05"
[35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05"
[37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05"
[39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05"
[41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05"
[43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05"
[45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05"
[47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05"
[49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05"
[51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05"
[53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05"
[55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05"
[read.450k.sheet] Found the following CSV files:
[1] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Idat
SampleSheet.csv"
[2] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Samp
leSheet140219.csv"
------------------------------------------------------------
Sender: LSF System <lsfadmin at="" blade14-4-11.gsc.wustl.edu="">
Subject: Job 5329542: <beta2dmr> Exited
Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by
user <apetti> in cluster <lsfcluster1>.
Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu>, in queue
<long>, as user <apetti> in cluster <lsfcluster1>.
</gscuser> was used as the home directory.
__________________________________________________________________
Standard Error:
Loading required package: methods
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ???BiocGenerics???
The following objects are masked from ???package:parallel???:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from ???package:stats???:
xtabs
The following objects are masked from ???package:base???:
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, as.vector, cbind, colnames, duplicated, eval,
evalq,
get, intersect, is.unsorted, lapply, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int,
rownames, sapply, setdiff, sort, table, tapply, union, unique,
unlist
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: lattice
Loading required package: GenomicRanges
Loading required package: IRanges
Loading required package: XVector
Loading required package: Biostrings
Loading required package: bumphunter
Loading required package: foreach
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.1 2013-03-22
Attaching package: ???locfit???
The following objects are masked from ???package:GenomicRanges???:
left, right
Loading required package: rngtools
Loading required package: pkgmaker
Loading required package: registry
Attaching package: ???pkgmaker???
The following object is masked from ???package:IRanges???:
new2
Warning messages:
1: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy
sis/IdatSampleSheet.csv", :
Could not infer array name for file: /gscmnt/gc3016/info/medseq/apet
ti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv
2: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy
sis/IdatSampleSheet.csv", :
Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apet
ti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv
3: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy
sis/IdatSampleSheet.csv", :
Could not infer array name for file: /gscmnt/gc3016/info/medseq/apet
ti/AMLrefractory/dmr.analysis/SampleSheet140219.csv
4: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy
sis/IdatSampleSheet.csv", :
Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apet
ti/AMLrefractory/dmr.analysis/SampleSheet140219.csv
[bumphunterEngine] Parallelizing using 4 workers/cores (backend:
doParallel, version: 1.0.6).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Performing 100 permutations.
[bumphunterEngine] Computing marginal permutation p-values.
[bumphunterEngine] cutoff: 0.05
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 233469 bumps.
[bumphunterEngine] Computing regions for each permutation.
Warning message:
In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, :
NAs found and removed. ind changed.
Execution halted
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=C
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] doParallel_1.0.6
[2] doRNG_1.5.5
[3] rngtools_1.2.3
[4] pkgmaker_0.17.4
[5] registry_0.2
[6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1
[7] minfi_1.9.11
[8] bumphunter_1.2.0
[9] locfit_1.5-9.1
[10] iterators_1.0.6
[11] foreach_1.4.1
[12] Biostrings_2.30.1
[13] GenomicRanges_1.14.4
[14] XVector_0.2.0
[15] IRanges_1.20.6
[16] lattice_0.20-24
[17] Biobase_2.22.0
[18] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29
[4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4
[7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1
[10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0
[13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1
[16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2
[19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4
[22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0
[25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2
[28] survival_2.37-7 tools_3.0.2 xtable_1.7-1
--
Sent via the guest posting facility at bioconductor.org.