DMR identification: minfi speed, memory problems

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.7 years ago

Dear All, I am trying to identify Differentially Methylated Regions (DMRs) using a tab-delimited file of beta values obtained with Illumina's 450k methylation arrays. I am using minfi and its implementation of bumphunter (development version 1.9.11). However, I run out of memory when I use a mere 100 permutations, and have tried different amounts of memory to no avail. Of further concern is the fact that the bumphunter reference (Jaffe, et al (2011) International J. of Epidemiology: 41:200-209) states that each permutation takes two hours (see p. 205). At that rate, a standard run with 1000 permutations would take over 83 days. My questions are as follows (code and output included below): 1. Is there a better way to run minfi/bumphunter, or specific resources I should allocate? (I am currently using parallel processing with four cores, and have tried various amounts of memory.) 2. Does minfi/bumphunter include a non-permutation-based method for estimating statistical significance? Jaffe et al state that they implemented a second, faster, p-value calculation following the method of Effron and Tibshirani (see p. 205). Does anyone know if this method is implemented in minfi? 3. Is there a better method for identifying DMRs? (I am also experimenting with methyAnalysis, which has nice visualization options, but (a) is poorly documented and (b) surprisingly found no DMRs in my data set using default settings.) Many thanks in advance for any insight and/or suggestions! Sincerely, Allegra _________________________________________________________________ Code: # This R script reads in a tab-delimited matrix of Beta values (with a header line) # and finds differentially-methylated regions using minfi # sample command: bsub -oo beta2dmr.140224.out -e beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" library(minfi); # currently requires development version 1.9.11 library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: had to be installed manually library(foreach); # used for parallel processing library(doRNG); # is this actually used?? library(doParallel); # "backend" for foreach package registerDoParallel(cores = 4); #betas.frame <- read.table(file="testbetas.txt",sep="\t",header=TRUE,n a.strings=c(NA),row.names=1,quote=""); betas.frame <- read.table(file="betas_140220.txt",sep="\t",header=TRUE ,na.strings=c(NA),row.names=1,quote=""); betas <- as.matrix(betas.frame); # convert data frame to matrix data.rs <- RatioSet(Beta = betas,annotation=c(array= "IlluminaHumanMethylation450k", annotation = "ilmn12.hg19")); # create RatioSet data.grs <- mapToGenomedata.rs); # create GenomicRatioSet sampleNamesdata.rs); # display the sample names (ie column headings) targets <- read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefra ctory/dmr.analysis',recursive=FALSE); # read Sample Sheet samples <- sampleNamesdata.rs); # get sample names (ie column headers) from input file of beta values j <- match(samples,targets$Sample); # get row indices from sample sheet that correspond to sample names (column headers) in beta value input file pheno <- targets$Sample_Group[j]; # extract corresponding phenotype (e.g. "tumor" or "normal") from Sample_Group column of Sample Sheet designMatrix <- model.matrix(~ pheno); # specify design matrix for regression dmr <- bumphunter(data.grs, design = designMatrix, cutoff = 0.05, B=100); # run bumphunter with B permutations save(dmr, file="dmr.minfi.100_140222"); __________________________________________________________________ Standard Output: [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" [read.450k.sheet] Found the following CSV files: [1] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Idat SampleSheet.csv" [2] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Samp leSheet140219.csv" ------------------------------------------------------------ Sender: LSF System <lsfadmin at="" blade14-4-11.gsc.wustl.edu=""> Subject: Job 5329542: <beta2dmr> Exited Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by user <apetti> in cluster <lsfcluster1>. Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu>, in queue <long>, as user <apetti> in cluster <lsfcluster1>. </gscuser> was used as the home directory. __________________________________________________________________ Standard Error: Loading required package: methods Loading required package: BiocGenerics Loading required package: parallel Attaching package: ???BiocGenerics??? The following objects are masked from ???package:parallel???: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB The following object is masked from ???package:stats???: xtabs The following objects are masked from ???package:base???: Filter, Find, Map, Position, Reduce, anyDuplicated, append, as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq, get, intersect, is.unsorted, lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int, rownames, sapply, setdiff, sort, table, tapply, union, unique, unlist Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Loading required package: lattice Loading required package: GenomicRanges Loading required package: IRanges Loading required package: XVector Loading required package: Biostrings Loading required package: bumphunter Loading required package: foreach Loading required package: iterators Loading required package: locfit locfit 1.5-9.1 2013-03-22 Attaching package: ???locfit??? The following objects are masked from ???package:GenomicRanges???: left, right Loading required package: rngtools Loading required package: pkgmaker Loading required package: registry Attaching package: ???pkgmaker??? The following object is masked from ???package:IRanges???: new2 Warning messages: 1: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", : Could not infer array name for file: /gscmnt/gc3016/info/medseq/apet ti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv 2: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", : Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apet ti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv 3: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", : Could not infer array name for file: /gscmnt/gc3016/info/medseq/apet ti/AMLrefractory/dmr.analysis/SampleSheet140219.csv 4: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", : Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apet ti/AMLrefractory/dmr.analysis/SampleSheet140219.csv [bumphunterEngine] Parallelizing using 4 workers/cores (backend: doParallel, version: 1.0.6). [bumphunterEngine] Computing coefficients. [bumphunterEngine] Performing 100 permutations. [bumphunterEngine] Computing marginal permutation p-values. [bumphunterEngine] cutoff: 0.05 [bumphunterEngine] Finding regions. [bumphunterEngine] Found 233469 bumps. [bumphunterEngine] Computing regions for each permutation. Warning message: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. Execution halted -- output of sessionInfo(): R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C [3] LC_TIME=en_US.utf8 LC_COLLATE=C [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 [7] LC_PAPER=en_US.utf8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] doParallel_1.0.6 [2] doRNG_1.5.5 [3] rngtools_1.2.3 [4] pkgmaker_0.17.4 [5] registry_0.2 [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 [7] minfi_1.9.11 [8] bumphunter_1.2.0 [9] locfit_1.5-9.1 [10] iterators_1.0.6 [11] foreach_1.4.1 [12] Biostrings_2.30.1 [13] GenomicRanges_1.14.4 [14] XVector_0.2.0 [15] IRanges_1.20.6 [16] lattice_0.20-24 [17] Biobase_2.22.0 [18] BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 -- Sent via the guest posting facility at bioconductor.org.

Annotation convert minfi methyAnalysis bumphunter Annotation convert minfi methyAnalysis • 3.5k views

ADD COMMENT • link updated 11.2 years ago by James W. MacDonald 68k • written 11.2 years ago by Guest User ★ 13k

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 hours ago

United States

Hi Allegra, On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: > Dear All, > > I am trying to identify Differentially Methylated Regions (DMRs) using a tab-delimited file of beta values obtained with Illumina's 450k methylation arrays. I am using minfi and its implementation of bumphunter (development version 1.9.11). > > However, I run out of memory when I use a mere 100 permutations, and have tried different amounts of memory to no avail. Of further concern is the fact that the bumphunter reference (Jaffe, et al (2011) International J. of Epidemiology: 41:200-209) states that each permutation takes two hours (see p. 205). At that rate, a standard run with 1000 permutations would take over 83 days. > > My questions are as follows (code and output included below): > 1. Is there a better way to run minfi/bumphunter, or specific resources I should allocate? (I am currently using parallel processing with four cores, and have tried various amounts of memory.) > 2. Does minfi/bumphunter include a non-permutation-based method for estimating statistical significance? Jaffe et al state that they implemented a second, faster, p-value calculation following the method of Effron and Tibshirani (see p. 205). Does anyone know if this method is implemented in minfi? > 3. Is there a better method for identifying DMRs? (I am also experimenting with methyAnalysis, which has nice visualization options, but (a) is poorly documented and (b) surprisingly found no DMRs in my data set using default settings.) > > Many thanks in advance for any insight and/or suggestions! > > Sincerely, > Allegra > _________________________________________________________________ > Code: > > # This R script reads in a tab-delimited matrix of Beta values (with a header line) > # and finds differentially-methylated regions using minfi > > # sample command: bsub -oo beta2dmr.140224.out -e beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" > > library(minfi); # currently requires development version 1.9.11 > library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: had to be installed manually > library(foreach); # used for parallel processing > library(doRNG); # is this actually used?? > library(doParallel); # "backend" for foreach package > registerDoParallel(cores = 4); > > #betas.frame <- read.table(file="testbetas.txt",sep="\t",header=TRUE ,na.strings=c(NA),row.names=1,quote=""); > betas.frame <- read.table(file="betas_140220.txt",sep="\t",header=TR UE,na.strings=c(NA),row.names=1,quote=""); > betas <- as.matrix(betas.frame); # convert data frame to matrix > data.rs <- RatioSet(Beta = betas,annotation=c(array= "IlluminaHumanMethylation450k", annotation = "ilmn12.hg19")); # create RatioSet > data.grs <- mapToGenomedata.rs); # create GenomicRatioSet > sampleNamesdata.rs); # display the sample names (ie column headings) > targets <- read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLref ractory/dmr.analysis',recursive=FALSE); # read Sample Sheet > samples <- sampleNamesdata.rs); # get sample names (ie column headers) from input file of beta values > j <- match(samples,targets$Sample); # get row indices from sample sheet that correspond to sample names (column headers) in beta value input file > pheno <- targets$Sample_Group[j]; # extract corresponding phenotype (e.g. "tumor" or "normal") from Sample_Group column of Sample Sheet > designMatrix <- model.matrix(~ pheno); # specify design matrix for regression > dmr <- bumphunter(data.grs, design = designMatrix, cutoff = 0.05, B=100); # run bumphunter with B permutations That's your problem ^^^^^^^^^^^^^^^^^^^^ You can interpret the cutoff argument this way; 'Select the bumps that are larger than the Xth quantile of all bumps in my data set'. By using 0.05, you are selecting almost all of the bumps in your data, which is not what you want to do. Instead, you want to use a cutoff of 0.95 or 0.99, which is 'Select the bumps that are larger than 95% or 99% of all the bumps in my data'. Or to put it another way, you want to select the bumps that are big enough that they are likely to represent differential methylation, and ignore all the little bumps that likely arise by chance alone. Best, Jim > > save(dmr, file="dmr.minfi.100_140222"); > > __________________________________________________________________ > Standard Output: > > [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" > [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" > [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" > [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" > [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" > [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" > [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" > [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" > [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" > [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" > [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" > [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" > [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" > [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" > [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" > [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" > [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" > [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" > [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" > [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" > [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" > [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" > [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" > [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" > [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" > [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" > [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" > [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" > [read.450k.sheet] Found the following CSV files: > [1] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv" > [2] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sa mpleSheet140219.csv" > > ------------------------------------------------------------ > Sender: LSF System <lsfadmin at="" blade14-4-11.gsc.wustl.edu=""> > Subject: Job 5329542: <beta2dmr> Exited > > Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by user <apetti> in cluster <lsfcluster1>. > Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu>, in queue <long>, as user <apetti> in cluster <lsfcluster1>. > </gscuser> was used as the home directory. > > __________________________________________________________________ > Standard Error: > > Loading required package: methods > Loading required package: BiocGenerics > Loading required package: parallel > > Attaching package: ???BiocGenerics??? > > The following objects are masked from ???package:parallel???: > > clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > clusterExport, clusterMap, parApply, parCapply, parLapply, > parLapplyLB, parRapply, parSapply, parSapplyLB > > The following object is masked from ???package:stats???: > > xtabs > > The following objects are masked from ???package:base???: > > Filter, Find, Map, Position, Reduce, anyDuplicated, append, > as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq, > get, intersect, is.unsorted, lapply, mapply, match, mget, order, > paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int, > rownames, sapply, setdiff, sort, table, tapply, union, unique, > unlist > > Loading required package: Biobase > Welcome to Bioconductor > > Vignettes contain introductory material; view with > 'browseVignettes()'. To cite Bioconductor, see > 'citation("Biobase")', and for packages 'citation("pkgname")'. > > Loading required package: lattice > Loading required package: GenomicRanges > Loading required package: IRanges > Loading required package: XVector > Loading required package: Biostrings > Loading required package: bumphunter > Loading required package: foreach > Loading required package: iterators > Loading required package: locfit > locfit 1.5-9.1 2013-03-22 > > Attaching package: ???locfit??? > > The following objects are masked from ???package:GenomicRanges???: > > left, right > > Loading required package: rngtools > Loading required package: pkgmaker > Loading required package: registry > > Attaching package: ???pkgmaker??? > > The following object is masked from ???package:IRanges???: > > new2 > > Warning messages: > 1: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.ana lysis/IdatSampleSheet.csv", : > Could not infer array name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv > 2: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.ana lysis/IdatSampleSheet.csv", : > Could not infer slide name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv > 3: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.ana lysis/IdatSampleSheet.csv", : > Could not infer array name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/SampleSheet140219.csv > 4: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.ana lysis/IdatSampleSheet.csv", : > Could not infer slide name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/SampleSheet140219.csv > [bumphunterEngine] Parallelizing using 4 workers/cores (backend: doParallel, version: 1.0.6). > [bumphunterEngine] Computing coefficients. > [bumphunterEngine] Performing 100 permutations. > [bumphunterEngine] Computing marginal permutation p-values. > [bumphunterEngine] cutoff: 0.05 > [bumphunterEngine] Finding regions. > [bumphunterEngine] Found 233469 bumps. > [bumphunterEngine] Computing regions for each permutation. > > Warning message: > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > Execution halted > > > -- output of sessionInfo(): > > R version 3.0.2 (2013-09-25) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > [3] LC_TIME=en_US.utf8 LC_COLLATE=C > [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > [7] LC_PAPER=en_US.utf8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] doParallel_1.0.6 > [2] doRNG_1.5.5 > [3] rngtools_1.2.3 > [4] pkgmaker_0.17.4 > [5] registry_0.2 > [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 > [7] minfi_1.9.11 > [8] bumphunter_1.2.0 > [9] locfit_1.5-9.1 > [10] iterators_1.0.6 > [11] foreach_1.4.1 > [12] Biostrings_2.30.1 > [13] GenomicRanges_1.14.4 > [14] XVector_0.2.0 > [15] IRanges_1.20.6 > [16] lattice_0.20-24 > [17] Biobase_2.22.0 > [18] BiocGenerics_0.8.0 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 > [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 > [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 > [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 > [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 > [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 > [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 > [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 > [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 11.2 years ago James W. MacDonald 68k

0

Entering edit mode

Ah - thank you very much for the clarification!! I used 0.05 rather mindlessly because that's what the tutorial used. Best, Allegra _________________________________ Allegra A. Petti, Ph.D. The Genome Institute 4444 Forest Park Ave. St. Louis, MO 63108 apetti@genome.wustl.edu allegra.conbrio@post.harvard.edu On Feb 24, 2014, at 1:37 PM, "James W. MacDonald" <jmacdon@uw.edu> wrote: > Hi Allegra, > > > On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: >> Dear All, >> >> I am trying to identify Differentially Methylated Regions (DMRs) using a tab-delimited file of beta values obtained with Illumina's 450k methylation arrays. I am using minfi and its implementation of bumphunter (development version 1.9.11). >> >> However, I run out of memory when I use a mere 100 permutations, and have tried different amounts of memory to no avail. Of further concern is the fact that the bumphunter reference (Jaffe, et al (2011) International J. of Epidemiology: 41:200-209) states that each permutation takes two hours (see p. 205). At that rate, a standard run with 1000 permutations would take over 83 days. >> >> My questions are as follows (code and output included below): >> 1. Is there a better way to run minfi/bumphunter, or specific resources I should allocate? (I am currently using parallel processing with four cores, and have tried various amounts of memory.) >> 2. Does minfi/bumphunter include a non-permutation-based method for estimating statistical significance? Jaffe et al state that they implemented a second, faster, p-value calculation following the method of Effron and Tibshirani (see p. 205). Does anyone know if this method is implemented in minfi? >> 3. Is there a better method for identifying DMRs? (I am also experimenting with methyAnalysis, which has nice visualization options, but (a) is poorly documented and (b) surprisingly found no DMRs in my data set using default settings.) >> >> Many thanks in advance for any insight and/or suggestions! >> >> Sincerely, >> Allegra >> _________________________________________________________________ >> Code: >> >> # This R script reads in a tab-delimited matrix of Beta values (with a header line) >> # and finds differentially-methylated regions using minfi >> >> # sample command: bsub -oo beta2dmr.140224.out -e beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" >> >> library(minfi); # currently requires development version 1.9.11 >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: had to be installed manually >> library(foreach); # used for parallel processing >> library(doRNG); # is this actually used?? >> library(doParallel); # "backend" for foreach package >> registerDoParallel(cores = 4); >> >> #betas.frame <- read.table(file="testbetas.txt",sep="\t",header=TRU E,na.strings=c(NA),row.names=1,quote=""); >> betas.frame <- read.table(file="betas_140220.txt",sep="\t",header=T RUE,na.strings=c(NA),row.names=1,quote=""); >> betas <- as.matrix(betas.frame); # convert data frame to matrix >> data.rs <- RatioSet(Beta = betas,annotation=c(array= "IlluminaHumanMethylation450k", annotation = "ilmn12.hg19")); # create RatioSet >> data.grs <- mapToGenomedata.rs); # create GenomicRatioSet >> sampleNamesdata.rs); # display the sample names (ie column headings) >> targets <- read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLre fractory/dmr.analysis',recursive=FALSE); # read Sample Sheet >> samples <- sampleNamesdata.rs); # get sample names (ie column headers) from input file of beta values >> j <- match(samples,targets$Sample); # get row indices from sample sheet that correspond to sample names (column headers) in beta value input file >> pheno <- targets$Sample_Group[j]; # extract corresponding phenotype (e.g. "tumor" or "normal") from Sample_Group column of Sample Sheet >> designMatrix <- model.matrix(~ pheno); # specify design matrix for regression >> dmr <- bumphunter(data.grs, design = designMatrix, cutoff = 0.05, B=100); # run bumphunter with B permutations > > That's your problem ^^^^^^^^^^^^^^^^^^^^ > > You can interpret the cutoff argument this way; 'Select the bumps that are larger than the Xth quantile of all bumps in my data set'. By using 0.05, you are selecting almost all of the bumps in your data, which is not what you want to do. > > Instead, you want to use a cutoff of 0.95 or 0.99, which is 'Select the bumps that are larger than 95% or 99% of all the bumps in my data'. Or to put it another way, you want to select the bumps that are big enough that they are likely to represent differential methylation, and ignore all the little bumps that likely arise by chance alone. > > Best, > > Jim > > >> save(dmr, file="dmr.minfi.100_140222"); >> >> __________________________________________________________________ >> Standard Output: >> >> [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" >> [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" >> [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" >> [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" >> [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" >> [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" >> [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" >> [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" >> [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" >> [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" >> [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" >> [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" >> [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" >> [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" >> [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" >> [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" >> [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" >> [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" >> [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" >> [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" >> [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" >> [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" >> [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" >> [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" >> [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" >> [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" >> [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" >> [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" >> [read.450k.sheet] Found the following CSV files: >> [1] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/I datSampleSheet.csv" >> [2] "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/S ampleSheet140219.csv" >> >> ------------------------------------------------------------ >> Sender: LSF System <lsfadmin@blade14-4-11.gsc.wustl.edu> >> Subject: Job 5329542: <beta2dmr> Exited >> >> Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by user <apetti> in cluster <lsfcluster1>. >> Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu>, in queue <long>, as user <apetti> in cluster <lsfcluster1>. >> </gscuser> was used as the home directory. >> >> __________________________________________________________________ >> Standard Error: >> >> Loading required package: methods >> Loading required package: BiocGenerics >> Loading required package: parallel >> >> Attaching package: âBiocGenericsâ >> >> The following objects are masked from âpackage:parallelâ: >> >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, >> clusterExport, clusterMap, parApply, parCapply, parLapply, >> parLapplyLB, parRapply, parSapply, parSapplyLB >> >> The following object is masked from âpackage:statsâ: >> >> xtabs >> >> The following objects are masked from âpackage:baseâ: >> >> Filter, Find, Map, Position, Reduce, anyDuplicated, append, >> as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq, >> get, intersect, is.unsorted, lapply, mapply, match, mget, order, >> paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int, >> rownames, sapply, setdiff, sort, table, tapply, union, unique, >> unlist >> >> Loading required package: Biobase >> Welcome to Bioconductor >> >> Vignettes contain introductory material; view with >> 'browseVignettes()'. To cite Bioconductor, see >> 'citation("Biobase")', and for packages 'citation("pkgname")'. >> >> Loading required package: lattice >> Loading required package: GenomicRanges >> Loading required package: IRanges >> Loading required package: XVector >> Loading required package: Biostrings >> Loading required package: bumphunter >> Loading required package: foreach >> Loading required package: iterators >> Loading required package: locfit >> locfit 1.5-9.1 2013-03-22 >> >> Attaching package: âlocfitâ >> >> The following objects are masked from âpackage:GenomicRangesâ: >> >> left, right >> >> Loading required package: rngtools >> Loading required package: pkgmaker >> Loading required package: registry >> >> Attaching package: âpkgmakerâ >> >> The following object is masked from âpackage:IRangesâ: >> >> new2 >> >> Warning messages: >> 1: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.an alysis/IdatSampleSheet.csv", : >> Could not infer array name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv >> 2: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.an alysis/IdatSampleSheet.csv", : >> Could not infer slide name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/IdatSampleSheet.csv >> 3: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.an alysis/IdatSampleSheet.csv", : >> Could not infer array name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/SampleSheet140219.csv >> 4: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.an alysis/IdatSampleSheet.csv", : >> Could not infer slide name for file: /gscmnt/gc3016/info/medseq/a petti/AMLrefractory/dmr.analysis/SampleSheet140219.csv >> [bumphunterEngine] Parallelizing using 4 workers/cores (backend: doParallel, version: 1.0.6). >> [bumphunterEngine] Computing coefficients. >> [bumphunterEngine] Performing 100 permutations. >> [bumphunterEngine] Computing marginal permutation p-values. >> [bumphunterEngine] cutoff: 0.05 >> [bumphunterEngine] Finding regions. >> [bumphunterEngine] Found 233469 bumps. >> [bumphunterEngine] Computing regions for each permutation. >> >> Warning message: >> In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : >> NAs found and removed. ind changed. >> Execution halted >> >> >> -- output of sessionInfo(): >> >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C >> [3] LC_TIME=en_US.utf8 LC_COLLATE=C >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 >> [7] LC_PAPER=en_US.utf8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] doParallel_1.0.6 >> [2] doRNG_1.5.5 >> [3] rngtools_1.2.3 >> [4] pkgmaker_0.17.4 >> [5] registry_0.2 >> [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 >> [7] minfi_1.9.11 >> [8] bumphunter_1.2.0 >> [9] locfit_1.5-9.1 >> [10] iterators_1.0.6 >> [11] foreach_1.4.1 >> [12] Biostrings_2.30.1 >> [13] GenomicRanges_1.14.4 >> [14] XVector_0.2.0 >> [15] IRanges_1.20.6 >> [16] lattice_0.20-24 >> [17] Biobase_2.22.0 >> [18] BiocGenerics_0.8.0 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 >> [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 >> [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 >> [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 >> [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 >> [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 >> [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 >> [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 >> [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 >> [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 >> >> -- >> Sent via the guest posting facility at bioconductor.org. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Allegra Petti ▴ 120

0

Entering edit mode

One clarification: Jim's interpretation is for qCutoff. The cutoff argument is the actual cutoff. The call you are making says that any difference greater than 0.05 on the M scale is a possible bump. You can see from the output that there more than half the array inside a candidate bump. You probably want qCutoff or set type="Beta" or set cutoff to something bigger. This is really our fault for providing insanely crappy defaults. I will discuss with co-authors and we will change this. However, I am not sure I understand the memory blow up. How many samples do you have? What is the output of print(object.size(getBeta(data.grs)), units = "auto") Kasper On Mon, Feb 24, 2014 at 2:40 PM, Allegra A. Petti <apetti@genome.wustl.edu>wrote: > Ah - thank you very much for the clarification!! I used 0.05 rather > mindlessly because that's what the tutorial used. > > Best, > Allegra > _________________________________ > Allegra A. Petti, Ph.D. > > The Genome Institute > 4444 Forest Park Ave. > St. Louis, MO 63108 > apetti@genome.wustl.edu > allegra.conbrio@post.harvard.edu > > On Feb 24, 2014, at 1:37 PM, "James W. MacDonald" <jmacdon@uw.edu> wrote: > > > Hi Allegra, > > > > > > On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: > >> Dear All, > >> > >> I am trying to identify Differentially Methylated Regions (DMRs) using > a tab-delimited file of beta values obtained with Illumina's 450k > methylation arrays. I am using minfi and its implementation of bumphunter > (development version 1.9.11). > >> > >> However, I run out of memory when I use a mere 100 permutations, and > have tried different amounts of memory to no avail. Of further concern is > the fact that the bumphunter reference (Jaffe, et al (2011) International > J. of Epidemiology: 41:200-209) states that each permutation takes two > hours (see p. 205). At that rate, a standard run with 1000 permutations > would take over 83 days. > >> > >> My questions are as follows (code and output included below): > >> 1. Is there a better way to run minfi/bumphunter, or specific resources > I should allocate? (I am currently using parallel processing with four > cores, and have tried various amounts of memory.) > >> 2. Does minfi/bumphunter include a non-permutation-based method for > estimating statistical significance? Jaffe et al state that they > implemented a second, faster, p-value calculation following the method of > Effron and Tibshirani (see p. 205). Does anyone know if this method is > implemented in minfi? > >> 3. Is there a better method for identifying DMRs? (I am also > experimenting with methyAnalysis, which has nice visualization options, but > (a) is poorly documented and (b) surprisingly found no DMRs in my data set > using default settings.) > >> > >> Many thanks in advance for any insight and/or suggestions! > >> > >> Sincerely, > >> Allegra > >> _________________________________________________________________ > >> Code: > >> > >> # This R script reads in a tab-delimited matrix of Beta values (with a > header line) > >> # and finds differentially-methylated regions using minfi > >> > >> # sample command: bsub -oo beta2dmr.140224.out -e beta2dmr.140224.err > -q long -M 16000000 -R 'select[type==LINUX64 && mem>16000] span[hosts=1] > rusage[mem=16000]' -n 4 -J beta2dmr "~/gsc/R3.0.2/bin/Rscript > beta2dmr_140220.r" > >> > >> library(minfi); # currently requires development version 1.9.11 > >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: had to > be installed manually > >> library(foreach); # used for parallel processing > >> library(doRNG); # is this actually used?? > >> library(doParallel); # "backend" for foreach package > >> registerDoParallel(cores = 4); > >> > >> #betas.frame <- > read.table(file="testbetas.txt",sep="\t",header=TRUE,na.strings=c(NA ),row.names=1,quote=""); > >> betas.frame <- > read.table(file="betas_140220.txt",sep="\t",header=TRUE,na.strings=c (NA),row.names=1,quote=""); > >> betas <- as.matrix(betas.frame); # convert data frame to matrix > >> data.rs <- RatioSet(Beta = betas,annotation=c(array= > "IlluminaHumanMethylation450k", annotation = "ilmn12.hg19")); # create > RatioSet > >> data.grs <- mapToGenomedata.rs); # create GenomicRatioSet > >> sampleNamesdata.rs); # display the sample names (ie column headings) > >> targets <- > read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr .analysis',recursive=FALSE); > # read Sample Sheet > >> samples <- sampleNamesdata.rs); # get sample names (ie column > headers) from input file of beta values > >> j <- match(samples,targets$Sample); # get row indices from sample sheet > that correspond to sample names (column headers) in beta value input file > >> pheno <- targets$Sample_Group[j]; # extract corresponding phenotype > (e.g. "tumor" or "normal") from Sample_Group column of Sample Sheet > >> designMatrix <- model.matrix(~ pheno); # specify design matrix for > regression > >> dmr <- bumphunter(data.grs, design = designMatrix, cutoff = 0.05, > B=100); # run bumphunter with B permutations > > > > That's your problem ^^^^^^^^^^^^^^^^^^^^ > > > > You can interpret the cutoff argument this way; 'Select the bumps that > are larger than the Xth quantile of all bumps in my data set'. By using > 0.05, you are selecting almost all of the bumps in your data, which is not > what you want to do. > > > > Instead, you want to use a cutoff of 0.95 or 0.99, which is 'Select the > bumps that are larger than 95% or 99% of all the bumps in my data'. Or to > put it another way, you want to select the bumps that are big enough that > they are likely to represent differential methylation, and ignore all the > little bumps that likely arise by chance alone. > > > > Best, > > > > Jim > > > > > >> > save(dmr, file="dmr.minfi.100_140222"); > >> > >> __________________________________________________________________ > >> Standard Output: > >> > >> [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" > >> [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" > >> [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" > >> [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" > >> [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" > >> [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" > >> [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" > >> [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" > >> [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" > >> [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" > >> [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" > >> [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" > >> [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" > >> [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" > >> [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" > >> [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" > >> [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" > >> [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" > >> [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" > >> [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" > >> [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" > >> [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" > >> [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" > >> [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" > >> [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" > >> [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" > >> [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" > >> [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" > >> [read.450k.sheet] Found the following CSV files: > >> [1] > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSa mpleSheet.csv" > >> [2] > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sample Sheet140219.csv" > >> > >> ------------------------------------------------------------ > >> Sender: LSF System <lsfadmin@blade14-4-11.gsc.wustl.edu> > >> Subject: Job 5329542: <beta2dmr> Exited > >> > >> Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by > user <apetti> in cluster <lsfcluster1>. > >> Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu>, in queue > <long>, as user <apetti> in cluster <lsfcluster1>. > >> </gscuser> was used as the home directory. > >> > >> __________________________________________________________________ > >> Standard Error: > >> > >> Loading required package: methods > >> Loading required package: BiocGenerics > >> Loading required package: parallel > >> > >> Attaching package: 'BiocGenerics' > >> > >> The following objects are masked from 'package:parallel': > >> > >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > >> clusterExport, clusterMap, parApply, parCapply, parLapply, > >> parLapplyLB, parRapply, parSapply, parSapplyLB > >> > >> The following object is masked from 'package:stats': > >> > >> xtabs > >> > >> The following objects are masked from 'package:base': > >> > >> Filter, Find, Map, Position, Reduce, anyDuplicated, append, > >> as.data.frame, as.vector, cbind, colnames, duplicated, eval, evalq, > >> get, intersect, is.unsorted, lapply, mapply, match, mget, order, > >> paste, pmax, pmax.int, pmin, pmin.int, rank, rbind, rep.int, > >> rownames, sapply, setdiff, sort, table, tapply, union, unique, > >> unlist > >> > >> Loading required package: Biobase > >> Welcome to Bioconductor > >> > >> Vignettes contain introductory material; view with > >> 'browseVignettes()'. To cite Bioconductor, see > >> 'citation("Biobase")', and for packages 'citation("pkgname")'. > >> > >> Loading required package: lattice > >> Loading required package: GenomicRanges > >> Loading required package: IRanges > >> Loading required package: XVector > >> Loading required package: Biostrings > >> Loading required package: bumphunter > >> Loading required package: foreach > >> Loading required package: iterators > >> Loading required package: locfit > >> locfit 1.5-9.1 2013-03-22 > >> > >> Attaching package: 'locfit' > >> > >> The following objects are masked from 'package:GenomicRanges': > >> > >> left, right > >> > >> Loading required package: rngtools > >> Loading required package: pkgmaker > >> Loading required package: registry > >> > >> Attaching package: 'pkgmaker' > >> > >> The following object is masked from 'package:IRanges': > >> > >> new2 > >> > >> Warning messages: > >> 1: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > >> Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > >> 2: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > >> Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > >> 3: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > >> Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > >> 4: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > >> Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > >> [bumphunterEngine] Parallelizing using 4 workers/cores (backend: > doParallel, version: 1.0.6). > >> [bumphunterEngine] Computing coefficients. > >> [bumphunterEngine] Performing 100 permutations. > >> [bumphunterEngine] Computing marginal permutation p-values. > >> [bumphunterEngine] cutoff: 0.05 > >> [bumphunterEngine] Finding regions. > >> [bumphunterEngine] Found 233469 bumps. > >> [bumphunterEngine] Computing regions for each permutation. > >> > >> Warning message: > >> In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > >> NAs found and removed. ind changed. > >> Execution halted > >> > >> > >> -- output of sessionInfo(): > >> > >> R version 3.0.2 (2013-09-25) > >> Platform: x86_64-unknown-linux-gnu (64-bit) > >> > >> locale: > >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > >> [3] LC_TIME=en_US.utf8 LC_COLLATE=C > >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > >> [7] LC_PAPER=en_US.utf8 LC_NAME=C > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > >> > >> attached base packages: > >> [1] parallel stats graphics grDevices utils datasets methods > >> [8] base > >> > >> other attached packages: > >> [1] doParallel_1.0.6 > >> [2] doRNG_1.5.5 > >> [3] rngtools_1.2.3 > >> [4] pkgmaker_0.17.4 > >> [5] registry_0.2 > >> [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 > >> [7] minfi_1.9.11 > >> [8] bumphunter_1.2.0 > >> [9] locfit_1.5-9.1 > >> [10] iterators_1.0.6 > >> [11] foreach_1.4.1 > >> [12] Biostrings_2.30.1 > >> [13] GenomicRanges_1.14.4 > >> [14] XVector_0.2.0 > >> [15] IRanges_1.20.6 > >> [16] lattice_0.20-24 > >> [17] Biobase_2.22.0 > >> [18] BiocGenerics_0.8.0 > >> > >> loaded via a namespace (and not attached): > >> [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 > >> [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 > >> [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 > >> [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 > >> [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 > >> [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 > >> [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 > >> [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 > >> [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > >> [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 > >> > >> -- > >> Sent via the guest posting facility at bioconductor.org. > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > > James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > ____ > This email message is a private communication. The information > transmitted, including attachments, is intended only for the person or > entity to which it is addressed and may contain confidential, privileged, > and/or proprietary material. Any review, duplication, retransmission, > distribution, or other use of, or taking of any action in reliance upon, > this information by persons or entities other than the intended recipient > is unauthorized by the sender and is prohibited. If you have received this > message in error, please contact the sender immediately by return email and > delete the original message from all computer systems. Thank you. > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Kasper, Thanks for your message and the further clarification. I will try using the additional parameters you mentioned. To answer your question, I only have 56 samples, and the size of data.grs is 237.1 Mb. I get some additional warning messages that I'm including again (below) in case they are informative. Best, Allegra ______________________________________________________________________ _________ From standard err: Warning messages: 1: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv", : Could not infer array name for file: /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSampl eSheet.csv 2: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv", : Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSampl eSheet.csv 3: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv", : Could not infer array name for file: /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleShe et140219.csv 4: In FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv", : Could not infer slide name for file: /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleShe et140219.csv [bumphunterEngine] Parallelizing using 12 workers/cores (backend: doParallel, version: 1.0.6). [bumphunterEngine] Computing coefficients. [bumphunterEngine] Performing 100 permutations. [bumphunterEngine] Computing marginal permutation p-values. [bumphunterEngine] cutoff: 0.95 [bumphunterEngine] Finding regions. [bumphunterEngine] Found 6278 bumps. [bumphunterEngine] Computing regions for each permutation. Warning messages: 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. 3: In FUN(newX[, i], ...) : NAs found and removed. ind changed. 4: In FUN(newX[, i], ...) : NAs found and removed. ind changed. Warning message: Warning messages: Warning messages: Warning messages: Warning messages: Warning messages: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. Warning messages: beta2dmr.140224.err lines 84-106/115 92% Warning message: Warning messages: Warning messages: Warning messages: Warning messages: Warning messages: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. Warning messages: 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. Warning messages: 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. Execution halted ______________________________________________________________________ _______________ From standard output: ------------------------------------------------------------ Sender: LSF System <lsfadmin@blade14-2-15.gsc.wustl.edu> Subject: Job 5344099: <beta2dmr> Exited Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by user <apetti> in cluster <lsfcluster1>. Job was executed on host(s) <12*blade14-2-15.gsc.wustl.edu>, in queue <long>, as user <apetti> in cluster <lsfcluster1>. </gscuser> was used as the home directory. </gscuser> was used as the working directory. Started at Mon Feb 24 16:28:45 2014 Results reported at Mon Feb 24 16:36:30 2014 Your job looked like: ------------------------------------------------------------ # LSBATCH: User input ~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r ------------------------------------------------------------ TERM_MEMLIMIT: job killed after reaching LSF memory usage limit. Exited with exit code 1. Resource usage summary: CPU time : 762.28 sec. Max Memory : 69236 MB Max Swap : 71263 MB Max Processes : 15 Max Threads : 15 On 02/24/2014 07:00 PM, Kasper Daniel Hansen wrote: > One clarification: Jim's interpretation is for qCutoff. The cutoff > argument is the actual cutoff. The call you are making says that any > difference greater than 0.05 on the M scale is a possible bump. You > can see from the output that there more than half the array inside a > candidate bump. You probably want qCutoff or set type="Beta" or set > cutoff to something bigger. > > This is really our fault for providing insanely crappy defaults. I > will discuss with co-authors and we will change this. > > However, I am not sure I understand the memory blow up. How many > samples do you have? What is the output of > print(object.size(getBeta(data.grs)), units = "auto") > > Kasper > > > On Mon, Feb 24, 2014 at 2:40 PM, Allegra A. Petti > <apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu="">> wrote: > > Ah - thank you very much for the clarification!! I used 0.05 > rather mindlessly because that's what the tutorial used. > > Best, > Allegra > _________________________________ > Allegra A. Petti, Ph.D. > > The Genome Institute > 4444 Forest Park Ave. > St. Louis, MO 63108 > apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu> > allegra.conbrio@post.harvard.edu > <mailto:allegra.conbrio@post.harvard.edu> > > On Feb 24, 2014, at 1:37 PM, "James W. MacDonald" <jmacdon@uw.edu> <mailto:jmacdon@uw.edu>> wrote: > > > Hi Allegra, > > > > > > On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: > >> Dear All, > >> > >> I am trying to identify Differentially Methylated Regions > (DMRs) using a tab-delimited file of beta values obtained with > Illumina's 450k methylation arrays. I am using minfi and its > implementation of bumphunter (development version 1.9.11). > >> > >> However, I run out of memory when I use a mere 100 > permutations, and have tried different amounts of memory to no > avail. Of further concern is the fact that the bumphunter > reference (Jaffe, et al (2011) International J. of Epidemiology: > 41:200-209) states that each permutation takes two hours (see p. > 205). At that rate, a standard run with 1000 permutations would > take over 83 days. > >> > >> My questions are as follows (code and output included below): > >> 1. Is there a better way to run minfi/bumphunter, or specific > resources I should allocate? (I am currently using parallel > processing with four cores, and have tried various amounts of memory.) > >> 2. Does minfi/bumphunter include a non-permutation-based method > for estimating statistical significance? Jaffe et al state that > they implemented a second, faster, p-value calculation following > the method of Effron and Tibshirani (see p. 205). Does anyone know > if this method is implemented in minfi? > >> 3. Is there a better method for identifying DMRs? (I am also > experimenting with methyAnalysis, which has nice visualization > options, but (a) is poorly documented and (b) surprisingly found > no DMRs in my data set using default settings.) > >> > >> Many thanks in advance for any insight and/or suggestions! > >> > >> Sincerely, > >> Allegra > >> _________________________________________________________________ > >> Code: > >> > >> # This R script reads in a tab-delimited matrix of Beta values > (with a header line) > >> # and finds differentially-methylated regions using minfi > >> > >> # sample command: bsub -oo beta2dmr.140224.out -e > beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 > && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr > "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" > >> > >> library(minfi); # currently requires development version 1.9.11 > >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: > had to be installed manually > >> library(foreach); # used for parallel processing > >> library(doRNG); # is this actually used?? > >> library(doParallel); # "backend" for foreach package > >> registerDoParallel(cores = 4); > >> > >> #betas.frame <- > read.table(file="testbetas.txt",sep="\t",header=TRUE,na.strings= c(NA),row.names=1,quote=""); > >> betas.frame <- > read.table(file="betas_140220.txt",sep="\t",header=TRUE,na.strin gs=c(NA),row.names=1,quote=""); > >> betas <- as.matrix(betas.frame); # convert data frame to matrix > >> data.rs <http: data.rs=""> <- RatioSet(Beta = > betas,annotation=c(array= "IlluminaHumanMethylation450k", > annotation = "ilmn12.hg19")); # create RatioSet > >> data.grs <- mapToGenomedata.rs <http: data.rs="">); # create > GenomicRatioSet > >> sampleNamesdata.rs <http: data.rs="">); # display the sample > names (ie column headings) > >> targets <- > read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefractory /dmr.analysis',recursive=FALSE); > # read Sample Sheet > >> samples <- sampleNamesdata.rs <http: data.rs="">); # get sample > names (ie column headers) from input file of beta values > >> j <- match(samples,targets$Sample); # get row indices from > sample sheet that correspond to sample names (column headers) in > beta value input file > >> pheno <- targets$Sample_Group[j]; # extract corresponding > phenotype (e.g. "tumor" or "normal") from Sample_Group column of > Sample Sheet > >> designMatrix <- model.matrix(~ pheno); # specify design matrix > for regression > >> dmr <- bumphunter(data.grs, design = designMatrix, cutoff = > 0.05, B=100); # run bumphunter with B permutations > > > > That's your problem ^^^^^^^^^^^^^^^^^^^^ > > > > You can interpret the cutoff argument this way; 'Select the > bumps that are larger than the Xth quantile of all bumps in my > data set'. By using 0.05, you are selecting almost all of the > bumps in your data, which is not what you want to do. > > > > Instead, you want to use a cutoff of 0.95 or 0.99, which is > 'Select the bumps that are larger than 95% or 99% of all the bumps > in my data'. Or to put it another way, you want to select the > bumps that are big enough that they are likely to represent > differential methylation, and ignore all the little bumps that > likely arise by chance alone. > > > > Best, > > > > Jim > > > > > >> save(dmr, file="dmr.minfi.100_140222"); > >> > >> __________________________________________________________________ > >> Standard Output: > >> > >> [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" > >> [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" > >> [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" > >> [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" > >> [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" > >> [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" > >> [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" > >> [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" > >> [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" > >> [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" > >> [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" > >> [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" > >> [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" > >> [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" > >> [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" > >> [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" > >> [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" > >> [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" > >> [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" > >> [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" > >> [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" > >> [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" > >> [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" > >> [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" > >> [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" > >> [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" > >> [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" > >> [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" > >> [read.450k.sheet] Found the following CSV files: > >> [1] > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv" > >> [2] > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sa mpleSheet140219.csv" > >> > >> ------------------------------------------------------------ > >> Sender: LSF System <lsfadmin@blade14-4-11.gsc.wustl.edu> <mailto:lsfadmin@blade14-4-11.gsc.wustl.edu>> > >> Subject: Job 5329542: <beta2dmr> Exited > >> > >> Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> <http: linus2091.gsc.wustl.edu="">> by user <apetti> in cluster > <lsfcluster1>. > >> Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu > <http: blade14-4-11.gsc.wustl.edu="">>, in queue <long>, as user > <apetti> in cluster <lsfcluster1>. > >> </gscuser> was used as the home directory. > >> > >> __________________________________________________________________ > >> Standard Error: > >> > >> Loading required package: methods > >> Loading required package: BiocGenerics > >> Loading required package: parallel > >> > >> Attaching package: 'BiocGenerics' > >> > >> The following objects are masked from 'package:parallel': > >> > >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > >> clusterExport, clusterMap, parApply, parCapply, parLapply, > >> parLapplyLB, parRapply, parSapply, parSapplyLB > >> > >> The following object is masked from 'package:stats': > >> > >> xtabs > >> > >> The following objects are masked from 'package:base': > >> > >> Filter, Find, Map, Position, Reduce, anyDuplicated, append, > >> as.data.frame, as.vector, cbind, colnames, duplicated, > eval, evalq, > >> get, intersect, is.unsorted, lapply, mapply, match, mget, > order, > >> paste, pmax, pmax.int <http: pmax.int="">, pmin, pmin.int > <http: pmin.int="">, rank, rbind, rep.int <http: rep.int="">, > >> rownames, sapply, setdiff, sort, table, tapply, union, unique, > >> unlist > >> > >> Loading required package: Biobase > >> Welcome to Bioconductor > >> > >> Vignettes contain introductory material; view with > >> 'browseVignettes()'. To cite Bioconductor, see > >> 'citation("Biobase")', and for packages 'citation("pkgname")'. > >> > >> Loading required package: lattice > >> Loading required package: GenomicRanges > >> Loading required package: IRanges > >> Loading required package: XVector > >> Loading required package: Biostrings > >> Loading required package: bumphunter > >> Loading required package: foreach > >> Loading required package: iterators > >> Loading required package: locfit > >> locfit 1.5-9.1 2013-03-22 > >> > >> Attaching package: 'locfit' > >> > >> The following objects are masked from 'package:GenomicRanges': > >> > >> left, right > >> > >> Loading required package: rngtools > >> Loading required package: pkgmaker > >> Loading required package: registry > >> > >> Attaching package: 'pkgmaker' > >> > >> The following object is masked from 'package:IRanges': > >> > >> new2 > >> > >> Warning messages: > >> 1: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > >> Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > >> 2: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > >> Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > >> 3: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > >> Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > >> 4: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > >> Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > >> [bumphunterEngine] Parallelizing using 4 workers/cores > (backend: doParallel, version: 1.0.6). > >> [bumphunterEngine] Computing coefficients. > >> [bumphunterEngine] Performing 100 permutations. > >> [bumphunterEngine] Computing marginal permutation p-values. > >> [bumphunterEngine] cutoff: 0.05 > >> [bumphunterEngine] Finding regions. > >> [bumphunterEngine] Found 233469 bumps. > >> [bumphunterEngine] Computing regions for each permutation. > >> > >> Warning message: > >> In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > >> NAs found and removed. ind changed. > >> Execution halted > >> > >> > >> -- output of sessionInfo(): > >> > >> R version 3.0.2 (2013-09-25) > >> Platform: x86_64-unknown-linux-gnu (64-bit) > >> > >> locale: > >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > >> [3] LC_TIME=en_US.utf8 LC_COLLATE=C > >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > >> [7] LC_PAPER=en_US.utf8 LC_NAME=C > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > >> > >> attached base packages: > >> [1] parallel stats graphics grDevices utils datasets > methods > >> [8] base > >> > >> other attached packages: > >> [1] doParallel_1.0.6 > >> [2] doRNG_1.5.5 > >> [3] rngtools_1.2.3 > >> [4] pkgmaker_0.17.4 > >> [5] registry_0.2 > >> [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 > >> [7] minfi_1.9.11 > >> [8] bumphunter_1.2.0 > >> [9] locfit_1.5-9.1 > >> [10] iterators_1.0.6 > >> [11] foreach_1.4.1 > >> [12] Biostrings_2.30.1 > >> [13] GenomicRanges_1.14.4 > >> [14] XVector_0.2.0 > >> [15] IRanges_1.20.6 > >> [16] lattice_0.20-24 > >> [17] Biobase_2.22.0 > >> [18] BiocGenerics_0.8.0 > >> > >> loaded via a namespace (and not attached): > >> [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 > >> [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 > >> [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 > >> [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 > >> [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 > >> [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 > >> [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 > >> [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 > >> [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > >> [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 > >> > >> -- > >> Sent via the guest posting facility at bioconductor.org > <http: bioconductor.org="">. > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > > James W. MacDonald, M.S. > > Biostatistician > > University of Washington > > Environmental and Occupational Health Sciences > > 4225 Roosevelt Way NE, # 100 > > Seattle WA 98105-6099 > > > > ____ > This email message is a private communication. The information > transmitted, including attachments, is intended only for the > person or entity to which it is addressed and may contain > confidential, privileged, and/or proprietary material. Any review, > duplication, retransmission, distribution, or other use of, or > taking of any action in reliance upon, this information by persons > or entities other than the intended recipient is unauthorized by > the sender and is prohibited. If you have received this message in > error, please contact the sender immediately by return email and > delete the original message from all computer systems. Thank you. > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Allegra Petti ▴ 120

0

Entering edit mode

This does not look nice: Warning message: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : NAs found and removed. ind changed. I don't think the warnings Could not infer array name for file: are troublesome, but it is hard to read your output. Do you have the right number of samples in data.grs object? Kasper On Tue, Feb 25, 2014 at 11:35 AM, Allegra Petti <apetti@genome.wustl.edu>wrote: > Kasper, > > Thanks for your message and the further clarification. I will try using > the additional parameters you mentioned. To answer your question, I only > have 56 samples, and the size of data.grs is 237.1 Mb. I get some > additional warning messages that I'm including again (below) in case > they are informative. > > Best, > Allegra > > ____________________________________________________________________ ___________ > From standard err: > > Warning messages: > 1: In > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > Could not infer array name for file: > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > 2: In > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > Could not infer slide name for file: > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > 3: In > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > Could not infer array name for file: > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > 4: In > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > : > Could not infer slide name for file: > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > [bumphunterEngine] Parallelizing using 12 workers/cores (backend: > doParallel, version: 1.0.6). > [bumphunterEngine] Computing coefficients. > [bumphunterEngine] Performing 100 permutations. > [bumphunterEngine] Computing marginal permutation p-values. > [bumphunterEngine] cutoff: 0.95 > [bumphunterEngine] Finding regions. > [bumphunterEngine] Found 6278 bumps. > [bumphunterEngine] Computing regions for each permutation. > > Warning messages: > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > 3: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > 4: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > > > > > > > Warning message: > Warning messages: > Warning messages: > Warning messages: > > Warning messages: > Warning messages: > > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > Warning messages: > beta2dmr.140224.err lines 84-106/115 92% > > > > > > > Warning message: > Warning messages: > Warning messages: > Warning messages: > > Warning messages: > Warning messages: > > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > Warning messages: > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > > > Warning messages: > 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > Execution halted > > ____________________________________________________________________ _________________ > From standard output: > > ------------------------------------------------------------ > Sender: LSF System <lsfadmin@blade14-2-15.gsc.wustl.edu> > Subject: Job 5344099: <beta2dmr> Exited > > Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> by user > <apetti> in cluster <lsfcluster1>. > Job was executed on host(s) <12*blade14-2-15.gsc.wustl.edu>, in queue > <long>, as user <apetti> in cluster <lsfcluster1>. > </gscuser> was used as the home directory. > </gscuser> was used as the working directory. > Started at Mon Feb 24 16:28:45 2014 > Results reported at Mon Feb 24 16:36:30 2014 > > Your job looked like: > > ------------------------------------------------------------ > # LSBATCH: User input > ~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r > ------------------------------------------------------------ > > TERM_MEMLIMIT: job killed after reaching LSF memory usage limit. > Exited with exit code 1. > > Resource usage summary: > > CPU time : 762.28 sec. > Max Memory : 69236 MB > Max Swap : 71263 MB > > Max Processes : 15 > Max Threads : 15 > > > On 02/24/2014 07:00 PM, Kasper Daniel Hansen wrote: > > One clarification: Jim's interpretation is for qCutoff. The cutoff > > argument is the actual cutoff. The call you are making says that any > > difference greater than 0.05 on the M scale is a possible bump. You > > can see from the output that there more than half the array inside a > > candidate bump. You probably want qCutoff or set type="Beta" or set > > cutoff to something bigger. > > > > This is really our fault for providing insanely crappy defaults. I > > will discuss with co-authors and we will change this. > > > > However, I am not sure I understand the memory blow up. How many > > samples do you have? What is the output of > > print(object.size(getBeta(data.grs)), units = "auto") > > > > Kasper > > > > > > On Mon, Feb 24, 2014 at 2:40 PM, Allegra A. Petti > > <apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu="">> wrote: > > > > Ah - thank you very much for the clarification!! I used 0.05 > > rather mindlessly because that's what the tutorial used. > > > > Best, > > Allegra > > _________________________________ > > Allegra A. Petti, Ph.D. > > > > The Genome Institute > > 4444 Forest Park Ave. > > St. Louis, MO 63108 > > apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu> > > allegra.conbrio@post.harvard.edu > > <mailto:allegra.conbrio@post.harvard.edu> > > > > On Feb 24, 2014, at 1:37 PM, "James W. MacDonald" <jmacdon@uw.edu> > <mailto:jmacdon@uw.edu>> wrote: > > > > > Hi Allegra, > > > > > > > > > On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: > > >> Dear All, > > >> > > >> I am trying to identify Differentially Methylated Regions > > (DMRs) using a tab-delimited file of beta values obtained with > > Illumina's 450k methylation arrays. I am using minfi and its > > implementation of bumphunter (development version 1.9.11). > > >> > > >> However, I run out of memory when I use a mere 100 > > permutations, and have tried different amounts of memory to no > > avail. Of further concern is the fact that the bumphunter > > reference (Jaffe, et al (2011) International J. of Epidemiology: > > 41:200-209) states that each permutation takes two hours (see p. > > 205). At that rate, a standard run with 1000 permutations would > > take over 83 days. > > >> > > >> My questions are as follows (code and output included below): > > >> 1. Is there a better way to run minfi/bumphunter, or specific > > resources I should allocate? (I am currently using parallel > > processing with four cores, and have tried various amounts of > memory.) > > >> 2. Does minfi/bumphunter include a non-permutation-based method > > for estimating statistical significance? Jaffe et al state that > > they implemented a second, faster, p-value calculation following > > the method of Effron and Tibshirani (see p. 205). Does anyone know > > if this method is implemented in minfi? > > >> 3. Is there a better method for identifying DMRs? (I am also > > experimenting with methyAnalysis, which has nice visualization > > options, but (a) is poorly documented and (b) surprisingly found > > no DMRs in my data set using default settings.) > > >> > > >> Many thanks in advance for any insight and/or suggestions! > > >> > > >> Sincerely, > > >> Allegra > > >> _________________________________________________________________ > > >> Code: > > >> > > >> # This R script reads in a tab-delimited matrix of Beta values > > (with a header line) > > >> # and finds differentially-methylated regions using minfi > > >> > > >> # sample command: bsub -oo beta2dmr.140224.out -e > > beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 > > && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr > > "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" > > >> > > >> library(minfi); # currently requires development version 1.9.11 > > >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # Note: > > had to be installed manually > > >> library(foreach); # used for parallel processing > > >> library(doRNG); # is this actually used?? > > >> library(doParallel); # "backend" for foreach package > > >> registerDoParallel(cores = 4); > > >> > > >> #betas.frame <- > > > read.table(file="testbetas.txt",sep="\t",header=TRUE,na.strings=c(NA ),row.names=1,quote=""); > > >> betas.frame <- > > > read.table(file="betas_140220.txt",sep="\t",header=TRUE,na.strings=c (NA),row.names=1,quote=""); > > >> betas <- as.matrix(betas.frame); # convert data frame to matrix > > >> data.rs <http: data.rs=""> <- RatioSet(Beta = > > betas,annotation=c(array= "IlluminaHumanMethylation450k", > > annotation = "ilmn12.hg19")); # create RatioSet > > >> data.grs <- mapToGenomedata.rs <http: data.rs="">); # create > > GenomicRatioSet > > >> sampleNamesdata.rs <http: data.rs="">); # display the sample > > names (ie column headings) > > >> targets <- > > > read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr .analysis',recursive=FALSE); > > # read Sample Sheet > > >> samples <- sampleNamesdata.rs <http: data.rs="">); # get sample > > names (ie column headers) from input file of beta values > > >> j <- match(samples,targets$Sample); # get row indices from > > sample sheet that correspond to sample names (column headers) in > > beta value input file > > >> pheno <- targets$Sample_Group[j]; # extract corresponding > > phenotype (e.g. "tumor" or "normal") from Sample_Group column of > > Sample Sheet > > >> designMatrix <- model.matrix(~ pheno); # specify design matrix > > for regression > > >> dmr <- bumphunter(data.grs, design = designMatrix, cutoff = > > 0.05, B=100); # run bumphunter with B permutations > > > > > > That's your problem ^^^^^^^^^^^^^^^^^^^^ > > > > > > You can interpret the cutoff argument this way; 'Select the > > bumps that are larger than the Xth quantile of all bumps in my > > data set'. By using 0.05, you are selecting almost all of the > > bumps in your data, which is not what you want to do. > > > > > > Instead, you want to use a cutoff of 0.95 or 0.99, which is > > 'Select the bumps that are larger than 95% or 99% of all the bumps > > in my data'. Or to put it another way, you want to select the > > bumps that are big enough that they are likely to represent > > differential methylation, and ignore all the little bumps that > > likely arise by chance alone. > > > > > > Best, > > > > > > Jim > > > > > > > > >> save(dmr, file="dmr.minfi.100_140222"); > > >> > > >> __________________________________________________________________ > > >> Standard Output: > > >> > > >> [1] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" > > >> [3] "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" > > >> [5] "TCGA.AB.2934.03A.01D.0743.05" "TCGA.AB.2807.03A.01D.0741.05" > > >> [7] "TCGA.AB.2809.03A.01D.0741.05" "TCGA.AB.2984.03A.01D.0742.05" > > >> [9] "TCGA.AB.2921.03A.01D.0743.05" "TCGA.AB.2952.03A.01D.0742.05" > > >> [11] "TCGA.AB.2829.03A.01D.0741.05" "TCGA.AB.2901.03A.01D.0742.05" > > >> [13] "TCGA.AB.2827.03A.01D.0741.05" "TCGA.AB.2884.03A.01D.0742.05" > > >> [15] "TCGA.AB.3011.03A.01D.0742.05" "TCGA.AB.2878.03A.01D.0742.05" > > >> [17] "TCGA.AB.2933.03A.01D.0742.05" "TCGA.AB.2843.03A.01D.0741.05" > > >> [19] "TCGA.AB.2814.03A.01D.0741.05" "TCGA.AB.2856.03A.01D.0741.05" > > >> [21] "TCGA.AB.2855.03A.01D.0741.05" "TCGA.AB.2919.03A.01D.0743.05" > > >> [23] "TCGA.AB.2967.03A.01D.0742.05" "TCGA.AB.2817.03A.01D.0742.05" > > >> [25] "TCGA.AB.2887.03A.01D.0742.05" "TCGA.AB.2894.03A.01D.0742.05" > > >> [27] "TCGA.AB.2895.03A.01D.0742.05" "TCGA.AB.2947.03A.01D.0743.05" > > >> [29] "TCGA.AB.2930.03A.01D.0743.05" "TCGA.AB.2896.03A.01D.0742.05" > > >> [31] "TCGA.AB.2983.03A.01D.0742.05" "TCGA.AB.2945.03A.01D.0741.05" > > >> [33] "TCGA.AB.2918.03A.01D.0743.05" "TCGA.AB.2928.03A.01D.0743.05" > > >> [35] "TCGA.AB.2979.03A.01D.0742.05" "TCGA.AB.2969.03A.01D.0741.05" > > >> [37] "TCGA.AB.2891.03A.01D.0742.05" "TCGA.AB.2971.03A.01D.0741.05" > > >> [39] "TCGA.AB.2964.03A.01D.0741.05" "TCGA.AB.2835.03A.01D.0741.05" > > >> [41] "TCGA.AB.3002.03A.01D.0742.05" "TCGA.AB.2853.03A.01D.0741.05" > > >> [43] "TCGA.AB.2836.03A.01D.0741.05" "TCGA.AB.2909.03A.01D.0743.05" > > >> [45] "TCGA.AB.2932.03A.01D.0743.05" "TCGA.AB.2926.03A.01D.0742.05" > > >> [47] "TCGA.AB.2826.03A.01D.0741.05" "TCGA.AB.2911.03A.01D.0741.05" > > >> [49] "TCGA.AB.2920.03A.01D.0742.05" "TCGA.AB.2917.03A.01D.0741.05" > > >> [51] "TCGA.AB.2869.03A.01D.0742.05" "TCGA.AB.2873.03A.01D.0742.05" > > >> [53] "TCGA.AB.2879.03A.01D.0742.05" "TCGA.AB.2831.03A.01D.0741.05" > > >> [55] "TCGA.AB.2816.03A.01D.0742.05" "TCGA.AB.2990.03A.01D.0742.05" > > >> [read.450k.sheet] Found the following CSV files: > > >> [1] > > > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSa mpleSheet.csv" > > >> [2] > > > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sample Sheet140219.csv" > > >> > > >> ------------------------------------------------------------ > > >> Sender: LSF System <lsfadmin@blade14-4-11.gsc.wustl.edu> > <mailto:lsfadmin@blade14-4-11.gsc.wustl.edu>> > > >> Subject: Job 5329542: <beta2dmr> Exited > > >> > > >> Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> > <http: linus2091.gsc.wustl.edu="">> by user <apetti> in cluster > > <lsfcluster1>. > > >> Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu > > <http: blade14-4-11.gsc.wustl.edu="">>, in queue <long>, as user > > <apetti> in cluster <lsfcluster1>. > > >> </gscuser> was used as the home directory. > > >> > > >> __________________________________________________________________ > > >> Standard Error: > > >> > > >> Loading required package: methods > > >> Loading required package: BiocGenerics > > >> Loading required package: parallel > > >> > > >> Attaching package: 'BiocGenerics' > > >> > > >> The following objects are masked from 'package:parallel': > > >> > > >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > > >> clusterExport, clusterMap, parApply, parCapply, parLapply, > > >> parLapplyLB, parRapply, parSapply, parSapplyLB > > >> > > >> The following object is masked from 'package:stats': > > >> > > >> xtabs > > >> > > >> The following objects are masked from 'package:base': > > >> > > >> Filter, Find, Map, Position, Reduce, anyDuplicated, append, > > >> as.data.frame, as.vector, cbind, colnames, duplicated, > > eval, evalq, > > >> get, intersect, is.unsorted, lapply, mapply, match, mget, > > order, > > >> paste, pmax, pmax.int <http: pmax.int="">, pmin, pmin.int > > <http: pmin.int="">, rank, rbind, rep.int <http: rep.int="">, > > >> rownames, sapply, setdiff, sort, table, tapply, union, unique, > > >> unlist > > >> > > >> Loading required package: Biobase > > >> Welcome to Bioconductor > > >> > > >> Vignettes contain introductory material; view with > > >> 'browseVignettes()'. To cite Bioconductor, see > > >> 'citation("Biobase")', and for packages 'citation("pkgname")'. > > >> > > >> Loading required package: lattice > > >> Loading required package: GenomicRanges > > >> Loading required package: IRanges > > >> Loading required package: XVector > > >> Loading required package: Biostrings > > >> Loading required package: bumphunter > > >> Loading required package: foreach > > >> Loading required package: iterators > > >> Loading required package: locfit > > >> locfit 1.5-9.1 2013-03-22 > > >> > > >> Attaching package: 'locfit' > > >> > > >> The following objects are masked from 'package:GenomicRanges': > > >> > > >> left, right > > >> > > >> Loading required package: rngtools > > >> Loading required package: pkgmaker > > >> Loading required package: registry > > >> > > >> Attaching package: 'pkgmaker' > > >> > > >> The following object is masked from 'package:IRanges': > > >> > > >> new2 > > >> > > >> Warning messages: > > >> 1: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > > : > > >> Could not infer array name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > > >> 2: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > > : > > >> Could not infer slide name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/IdatSam pleSheet.csv > > >> 3: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > > : > > >> Could not infer array name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > > >> 4: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/ IdatSampleSheet.csv", > > : > > >> Could not infer slide name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/SampleS heet140219.csv > > >> [bumphunterEngine] Parallelizing using 4 workers/cores > > (backend: doParallel, version: 1.0.6). > > >> [bumphunterEngine] Computing coefficients. > > >> [bumphunterEngine] Performing 100 permutations. > > >> [bumphunterEngine] Computing marginal permutation p-values. > > >> [bumphunterEngine] cutoff: 0.05 > > >> [bumphunterEngine] Finding regions. > > >> [bumphunterEngine] Found 233469 bumps. > > >> [bumphunterEngine] Computing regions for each permutation. > > >> > > >> Warning message: > > >> In regionFinder(x = beta, chr = chr, pos = pos, cluster = > > cluster, : > > >> NAs found and removed. ind changed. > > >> Execution halted > > >> > > >> > > >> -- output of sessionInfo(): > > >> > > >> R version 3.0.2 (2013-09-25) > > >> Platform: x86_64-unknown-linux-gnu (64-bit) > > >> > > >> locale: > > >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > > >> [3] LC_TIME=en_US.utf8 LC_COLLATE=C > > >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > > >> [7] LC_PAPER=en_US.utf8 LC_NAME=C > > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > > >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > >> > > >> attached base packages: > > >> [1] parallel stats graphics grDevices utils datasets > > methods > > >> [8] base > > >> > > >> other attached packages: > > >> [1] doParallel_1.0.6 > > >> [2] doRNG_1.5.5 > > >> [3] rngtools_1.2.3 > > >> [4] pkgmaker_0.17.4 > > >> [5] registry_0.2 > > >> [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 > > >> [7] minfi_1.9.11 > > >> [8] bumphunter_1.2.0 > > >> [9] locfit_1.5-9.1 > > >> [10] iterators_1.0.6 > > >> [11] foreach_1.4.1 > > >> [12] Biostrings_2.30.1 > > >> [13] GenomicRanges_1.14.4 > > >> [14] XVector_0.2.0 > > >> [15] IRanges_1.20.6 > > >> [16] lattice_0.20-24 > > >> [17] Biobase_2.22.0 > > >> [18] BiocGenerics_0.8.0 > > >> > > >> loaded via a namespace (and not attached): > > >> [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 > > >> [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 > > >> [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 > > >> [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 > > >> [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 > > >> [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 > > >> [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 > > >> [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 > > >> [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > > >> [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 > > >> > > >> -- > > >> Sent via the guest posting facility at bioconductor.org > > <http: bioconductor.org="">. > > >> > > >> _______________________________________________ > > >> Bioconductor mailing list > > >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > >> Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- > > > James W. MacDonald, M.S. > > > Biostatistician > > > University of Washington > > > Environmental and Occupational Health Sciences > > > 4225 Roosevelt Way NE, # 100 > > > Seattle WA 98105-6099 > > > > > > > > ____ > > This email message is a private communication. The information > > transmitted, including attachments, is intended only for the > > person or entity to which it is addressed and may contain > > confidential, privileged, and/or proprietary material. Any review, > > duplication, retransmission, distribution, or other use of, or > > taking of any action in reliance upon, this information by persons > > or entities other than the intended recipient is unauthorized by > > the sender and is prohibited. If you have received this message in > > error, please contact the sender immediately by return email and > > delete the original message from all computer systems. Thank you. > > > > > > > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > ____ > This email message is a private communication. The information > transmitted, including attachments, is intended only for the person or > entity to which it is addressed and may contain confidential, privileged, > and/or proprietary material. Any review, duplication, retransmission, > distribution, or other use of, or taking of any action in reliance upon, > this information by persons or entities other than the intended recipient > is unauthorized by the sender and is prohibited. If you have received this > message in error, please contact the sender immediately by return email and > delete the original message from all computer systems. Thank you. > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

data.grs seems to have the correct number of samples (56). I'm including the output of str(data.grs), but please let me know if there's anything else I should be looking at. (As I may have mentioned, I am new to R.) > str(data.grs) Formal class 'GenomicRatioSet' [package "minfi"] with 6 slots ..@ preprocessMethod: chr(0) ..@ annotation : Named chr [1:2] "IlluminaHumanMethylation450k" "ilmn12.hg19" .. ..- attr(*, "names")= chr [1:2] "array" "annotation" ..@ exptData :Formal class 'SimpleList' [package "IRanges"] with 4 slots .. .. ..@ listData : list() .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ rowData :Formal class 'GRanges' [package "GenomicRanges"] with 6 slots .. .. ..@ seqnames :Formal class 'Rle' [package "IRanges"] with 4 slots .. .. .. .. ..@ values : Factor w/ 24 levels "chr1","chr2",..: 1 2 3 4 5 6 7 8 9 10 ... .. .. .. .. ..@ lengths : int [1:24] 46857 34810 25159 20464 24327 36611 30017 20950 9861 24388 ... .. .. .. .. ..@ elementMetadata: NULL .. .. .. .. ..@ metadata : list() .. .. ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots .. .. .. .. ..@ start : int [1:485512] 15865 18827 29407 29425 29435 68849 69591 91550 135252 449076 ... .. .. .. .. ..@ width : int [1:485512] 1 1 1 1 1 1 1 1 1 1 ... .. .. .. .. ..@ NAMES : chr [1:485512] "cg13869341" "cg14008030" "cg12045430" "cg20826792" ... .. .. .. .. ..@ elementType : chr "integer" .. .. .. .. ..@ elementMetadata: NULL .. .. .. .. ..@ metadata : list() .. .. ..@ strand :Formal class 'Rle' [package "IRanges"] with 4 slots .. .. .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3 .. .. .. .. ..@ lengths : int 485512 .. .. .. .. ..@ elementMetadata: NULL .. .. .. .. ..@ metadata : list() .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "IRanges"] with 6 slots .. .. .. .. ..@ rownames : NULL .. .. .. .. ..@ nrows : int 485512 .. .. .. .. ..@ listData : Named list() .. .. .. .. ..@ elementType : chr "ANY" .. .. .. .. ..@ elementMetadata: NULL .. .. .. .. ..@ metadata : list() .. .. ..@ seqinfo :Formal class 'Seqinfo' [package "GenomicRanges"] with 4 slots .. .. .. .. ..@ seqnames : chr [1:24] "chr1" "chr2" "chr3" "chr4" ... .. .. .. .. ..@ seqlengths : int [1:24] NA NA NA NA NA NA NA NA NA NA ... .. .. .. .. ..@ is_circular: logi [1:24] NA NA NA NA NA NA ... .. .. .. .. ..@ genome : chr [1:24] "hg19" "hg19" "hg19" "hg19" ... .. .. ..@ metadata : list() ..@ colData :Formal class 'DataFrame' [package "IRanges"] with 6 slots .. .. ..@ rownames : chr [1:56] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" ... .. .. ..@ nrows : int 56 .. .. ..@ listData : Named list() .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ assays :Reference class 'ShallowSimpleListAssays' [package "GenomicRanges"] with 1 fields .. ..$ data:Formal class 'SimpleList' [package "IRanges"] with 4 slots .. .. .. ..@ listData :List of 2 .. .. .. .. ..$ Beta: num [1:485512, 1:56] 0.915 0.745 0.041 0.132 NA ... .. .. .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. .. .. ..$ : chr [1:485512] "cg13869341" "cg14008030" "cg12045430" "cg20826792" ... .. .. .. .. .. .. ..$ : chr [1:56] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" ... .. .. .. .. ..$ M : num [1:485512, 1:56] 3.42 1.55 -4.55 -2.71 NA ... .. .. .. .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. .. .. .. ..$ : chr [1:485512] "cg13869341" "cg14008030" "cg12045430" "cg20826792" ... .. .. .. .. .. .. ..$ : chr [1:56] "TCGA.AB.2940.03A.01D.0742.05" "TCGA.AB.2944.03A.01D.0743.05" "TCGA.AB.2866.03A.01D.0741.05" "TCGA.AB.2871.03A.01D.0742.05" ... .. .. .. ..@ elementType : chr "ANY" .. .. .. ..@ elementMetadata: NULL .. .. .. ..@ metadata : list() .. ..and 12 methods,> On 02/25/2014 10:43 AM, Kasper Daniel Hansen wrote: > This does not look nice: > > Warning message: > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > > I don't think the warnings > > Could not infer array name for file: > > are troublesome, but it is hard to read your output. > > Do you have the right number of samples in data.grs object? > > Kasper > > > On Tue, Feb 25, 2014 at 11:35 AM, Allegra Petti > <apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu="">> wrote: > > Kasper, > > Thanks for your message and the further clarification. I will try > using > the additional parameters you mentioned. To answer your question, > I only > have 56 samples, and the size of data.grs is 237.1 Mb. I get some > additional warning messages that I'm including again (below) in case > they are informative. > > Best, > Allegra > ________________________________________________________________ _______________ > From standard err: > > Warning messages: > 1: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > 2: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > 3: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > Could not infer array name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > 4: In > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > : > Could not infer slide name for file: > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > [bumphunterEngine] Parallelizing using 12 workers/cores (backend: > doParallel, version: 1.0.6). > [bumphunterEngine] Computing coefficients. > [bumphunterEngine] Performing 100 permutations. > [bumphunterEngine] Computing marginal permutation p-values. > [bumphunterEngine] cutoff: 0.95 > [bumphunterEngine] Finding regions. > [bumphunterEngine] Found 6278 bumps. > [bumphunterEngine] Computing regions for each permutation. > > Warning messages: > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > 3: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > 4: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > > > > > > > Warning message: > Warning messages: > Warning messages: > Warning messages: > > Warning messages: > Warning messages: > > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > Warning messages: > beta2dmr.140224.err lines 84-106/115 92% > > > > > > > Warning message: > Warning messages: > Warning messages: > Warning messages: > > Warning messages: > Warning messages: > > In regionFinder(x = beta, chr = chr, pos = pos, cluster = cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > Warning messages: > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > 1: In regionFinder(x = beta, chr = chr, pos = pos, cluster = > cluster, : > NAs found and removed. ind changed. > > > Warning messages: > 2: In FUN(newX[, i], ...) : NAs found and removed. ind changed. > Execution halted > ________________________________________________________________ _____________________ > From standard output: > > ------------------------------------------------------------ > Sender: LSF System <lsfadmin@blade14-2-15.gsc.wustl.edu> <mailto:lsfadmin@blade14-2-15.gsc.wustl.edu>> > Subject: Job 5344099: <beta2dmr> Exited > > Job <beta2dmr> was submitted from host <linus2091.gsc.wustl.edu> <http: linus2091.gsc.wustl.edu="">> by user > <apetti> in cluster <lsfcluster1>. > Job was executed on host(s) <12*blade14-2-15.gsc.wustl.edu > <http: blade14-2-15.gsc.wustl.edu="">>, in queue > <long>, as user <apetti> in cluster <lsfcluster1>. > </gscuser> was used as the home directory. > </gscuser> was used as the working > directory. > Started at Mon Feb 24 16:28:45 2014 > Results reported at Mon Feb 24 16:36:30 2014 > > Your job looked like: > > ------------------------------------------------------------ > # LSBATCH: User input > ~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r > ------------------------------------------------------------ > > TERM_MEMLIMIT: job killed after reaching LSF memory usage limit. > Exited with exit code 1. > > Resource usage summary: > > CPU time : 762.28 sec. > Max Memory : 69236 MB > Max Swap : 71263 MB > > Max Processes : 15 > Max Threads : 15 > > > On 02/24/2014 07:00 PM, Kasper Daniel Hansen wrote: > > One clarification: Jim's interpretation is for qCutoff. The cutoff > > argument is the actual cutoff. The call you are making says that any > > difference greater than 0.05 on the M scale is a possible bump. You > > can see from the output that there more than half the array inside a > > candidate bump. You probably want qCutoff or set type="Beta" or set > > cutoff to something bigger. > > > > This is really our fault for providing insanely crappy defaults. I > > will discuss with co-authors and we will change this. > > > > However, I am not sure I understand the memory blow up. How many > > samples do you have? What is the output of > > print(object.size(getBeta(data.grs)), units = "auto") > > > > Kasper > > > > > > On Mon, Feb 24, 2014 at 2:40 PM, Allegra A. Petti > > <apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu=""> > <mailto:apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu="">>> > wrote: > > > > Ah - thank you very much for the clarification!! I used 0.05 > > rather mindlessly because that's what the tutorial used. > > > > Best, > > Allegra > > _________________________________ > > Allegra A. Petti, Ph.D. > > > > The Genome Institute > > 4444 Forest Park Ave. > > St. Louis, MO 63108 > > apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu> > <mailto:apetti@genome.wustl.edu <mailto:apetti@genome.wustl.edu="">> > > allegra.conbrio@post.harvard.edu > <mailto:allegra.conbrio@post.harvard.edu> > > <mailto:allegra.conbrio@post.harvard.edu> <mailto:allegra.conbrio@post.harvard.edu>> > > > > On Feb 24, 2014, at 1:37 PM, "James W. MacDonald" > <jmacdon@uw.edu <mailto:jmacdon@uw.edu=""> > > <mailto:jmacdon@uw.edu <mailto:jmacdon@uw.edu="">>> wrote: > > > > > Hi Allegra, > > > > > > > > > On 2/24/2014 1:41 PM, Allegra Petti [guest] wrote: > > >> Dear All, > > >> > > >> I am trying to identify Differentially Methylated Regions > > (DMRs) using a tab-delimited file of beta values obtained with > > Illumina's 450k methylation arrays. I am using minfi and its > > implementation of bumphunter (development version 1.9.11). > > >> > > >> However, I run out of memory when I use a mere 100 > > permutations, and have tried different amounts of memory to no > > avail. Of further concern is the fact that the bumphunter > > reference (Jaffe, et al (2011) International J. of Epidemiology: > > 41:200-209) states that each permutation takes two hours (see p. > > 205). At that rate, a standard run with 1000 permutations would > > take over 83 days. > > >> > > >> My questions are as follows (code and output included below): > > >> 1. Is there a better way to run minfi/bumphunter, or specific > > resources I should allocate? (I am currently using parallel > > processing with four cores, and have tried various amounts > of memory.) > > >> 2. Does minfi/bumphunter include a non-permutation- based > method > > for estimating statistical significance? Jaffe et al state that > > they implemented a second, faster, p-value calculation following > > the method of Effron and Tibshirani (see p. 205). Does > anyone know > > if this method is implemented in minfi? > > >> 3. Is there a better method for identifying DMRs? (I am also > > experimenting with methyAnalysis, which has nice visualization > > options, but (a) is poorly documented and (b) surprisingly found > > no DMRs in my data set using default settings.) > > >> > > >> Many thanks in advance for any insight and/or suggestions! > > >> > > >> Sincerely, > > >> Allegra > > >> > _________________________________________________________________ > > >> Code: > > >> > > >> # This R script reads in a tab-delimited matrix of Beta > values > > (with a header line) > > >> # and finds differentially-methylated regions using minfi > > >> > > >> # sample command: bsub -oo beta2dmr.140224.out -e > > beta2dmr.140224.err -q long -M 16000000 -R 'select[type==LINUX64 > > && mem>16000] span[hosts=1] rusage[mem=16000]' -n 4 -J beta2dmr > > "~/gsc/R3.0.2/bin/Rscript beta2dmr_140220.r" > > >> > > >> library(minfi); # currently requires development version > 1.9.11 > > >> library(IlluminaHumanMethylation450kanno.ilmn12.hg19); # > Note: > > had to be installed manually > > >> library(foreach); # used for parallel processing > > >> library(doRNG); # is this actually used?? > > >> library(doParallel); # "backend" for foreach package > > >> registerDoParallel(cores = 4); > > >> > > >> #betas.frame <- > > > read.table(file="testbetas.txt",sep="\t",header=TRUE,na.strings= c(NA),row.names=1,quote=""); > > >> betas.frame <- > > > read.table(file="betas_140220.txt",sep="\t",header=TRUE,na.strin gs=c(NA),row.names=1,quote=""); > > >> betas <- as.matrix(betas.frame); # convert data frame to > matrix > > >> data.rs <http: data.rs=""> <http: data.rs=""> <- RatioSet(Beta = > > betas,annotation=c(array= "IlluminaHumanMethylation450k", > > annotation = "ilmn12.hg19")); # create RatioSet > > >> data.grs <- mapToGenomedata.rs <http: data.rs=""> > <http: data.rs="">); # create > > GenomicRatioSet > > >> sampleNamesdata.rs <http: data.rs=""> <http: data.rs="">); # > display the sample > > names (ie column headings) > > >> targets <- > > > read.450k.sheet('/gscmnt/gc3016/info/medseq/apetti/AMLrefractory /dmr.analysis',recursive=FALSE); > > # read Sample Sheet > > >> samples <- sampleNamesdata.rs <http: data.rs=""> > <http: data.rs="">); # get sample > > names (ie column headers) from input file of beta values > > >> j <- match(samples,targets$Sample); # get row indices from > > sample sheet that correspond to sample names (column headers) in > > beta value input file > > >> pheno <- targets$Sample_Group[j]; # extract corresponding > > phenotype (e.g. "tumor" or "normal") from Sample_Group column of > > Sample Sheet > > >> designMatrix <- model.matrix(~ pheno); # specify design > matrix > > for regression > > >> dmr <- bumphunter(data.grs, design = designMatrix, cutoff = > > 0.05, B=100); # run bumphunter with B permutations > > > > > > That's your problem ^^^^^^^^^^^^^^^^^^^^ > > > > > > You can interpret the cutoff argument this way; 'Select the > > bumps that are larger than the Xth quantile of all bumps in my > > data set'. By using 0.05, you are selecting almost all of the > > bumps in your data, which is not what you want to do. > > > > > > Instead, you want to use a cutoff of 0.95 or 0.99, which is > > 'Select the bumps that are larger than 95% or 99% of all the > bumps > > in my data'. Or to put it another way, you want to select the > > bumps that are big enough that they are likely to represent > > differential methylation, and ignore all the little bumps that > > likely arise by chance alone. > > > > > > Best, > > > > > > Jim > > > > > > > > >> save(dmr, file="dmr.minfi.100_140222"); > > >> > > >> > __________________________________________________________________ > > >> Standard Output: > > >> > > >> [1] "TCGA.AB.2940.03A.01D.0742.05" > "TCGA.AB.2944.03A.01D.0743.05" > > >> [3] "TCGA.AB.2866.03A.01D.0741.05" > "TCGA.AB.2871.03A.01D.0742.05" > > >> [5] "TCGA.AB.2934.03A.01D.0743.05" > "TCGA.AB.2807.03A.01D.0741.05" > > >> [7] "TCGA.AB.2809.03A.01D.0741.05" > "TCGA.AB.2984.03A.01D.0742.05" > > >> [9] "TCGA.AB.2921.03A.01D.0743.05" > "TCGA.AB.2952.03A.01D.0742.05" > > >> [11] "TCGA.AB.2829.03A.01D.0741.05" > "TCGA.AB.2901.03A.01D.0742.05" > > >> [13] "TCGA.AB.2827.03A.01D.0741.05" > "TCGA.AB.2884.03A.01D.0742.05" > > >> [15] "TCGA.AB.3011.03A.01D.0742.05" > "TCGA.AB.2878.03A.01D.0742.05" > > >> [17] "TCGA.AB.2933.03A.01D.0742.05" > "TCGA.AB.2843.03A.01D.0741.05" > > >> [19] "TCGA.AB.2814.03A.01D.0741.05" > "TCGA.AB.2856.03A.01D.0741.05" > > >> [21] "TCGA.AB.2855.03A.01D.0741.05" > "TCGA.AB.2919.03A.01D.0743.05" > > >> [23] "TCGA.AB.2967.03A.01D.0742.05" > "TCGA.AB.2817.03A.01D.0742.05" > > >> [25] "TCGA.AB.2887.03A.01D.0742.05" > "TCGA.AB.2894.03A.01D.0742.05" > > >> [27] "TCGA.AB.2895.03A.01D.0742.05" > "TCGA.AB.2947.03A.01D.0743.05" > > >> [29] "TCGA.AB.2930.03A.01D.0743.05" > "TCGA.AB.2896.03A.01D.0742.05" > > >> [31] "TCGA.AB.2983.03A.01D.0742.05" > "TCGA.AB.2945.03A.01D.0741.05" > > >> [33] "TCGA.AB.2918.03A.01D.0743.05" > "TCGA.AB.2928.03A.01D.0743.05" > > >> [35] "TCGA.AB.2979.03A.01D.0742.05" > "TCGA.AB.2969.03A.01D.0741.05" > > >> [37] "TCGA.AB.2891.03A.01D.0742.05" > "TCGA.AB.2971.03A.01D.0741.05" > > >> [39] "TCGA.AB.2964.03A.01D.0741.05" > "TCGA.AB.2835.03A.01D.0741.05" > > >> [41] "TCGA.AB.3002.03A.01D.0742.05" > "TCGA.AB.2853.03A.01D.0741.05" > > >> [43] "TCGA.AB.2836.03A.01D.0741.05" > "TCGA.AB.2909.03A.01D.0743.05" > > >> [45] "TCGA.AB.2932.03A.01D.0743.05" > "TCGA.AB.2926.03A.01D.0742.05" > > >> [47] "TCGA.AB.2826.03A.01D.0741.05" > "TCGA.AB.2911.03A.01D.0741.05" > > >> [49] "TCGA.AB.2920.03A.01D.0742.05" > "TCGA.AB.2917.03A.01D.0741.05" > > >> [51] "TCGA.AB.2869.03A.01D.0742.05" > "TCGA.AB.2873.03A.01D.0742.05" > > >> [53] "TCGA.AB.2879.03A.01D.0742.05" > "TCGA.AB.2831.03A.01D.0741.05" > > >> [55] "TCGA.AB.2816.03A.01D.0742.05" > "TCGA.AB.2990.03A.01D.0742.05" > > >> [read.450k.sheet] Found the following CSV files: > > >> [1] > > > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Id atSampleSheet.csv" > > >> [2] > > > "/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sa mpleSheet140219.csv" > > >> > > >> ------------------------------------------------------------ > > >> Sender: LSF System <lsfadmin@blade14-4-11.gsc.wustl.edu> <mailto:lsfadmin@blade14-4-11.gsc.wustl.edu> > > <mailto:lsfadmin@blade14-4-11.gsc.wustl.edu> <mailto:lsfadmin@blade14-4-11.gsc.wustl.edu>>> > > >> Subject: Job 5329542: <beta2dmr> Exited > > >> > > >> Job <beta2dmr> was submitted from host > <linus2091.gsc.wustl.edu <http:="" linus2091.gsc.wustl.edu=""> > > <http: linus2091.gsc.wustl.edu="">> by user <apetti> in cluster > > <lsfcluster1>. > > >> Job was executed on host(s) <4*blade14-4-11.gsc.wustl.edu > <http: blade14-4-11.gsc.wustl.edu=""> > > <http: blade14-4-11.gsc.wustl.edu="">>, in queue <long>, as user > > <apetti> in cluster <lsfcluster1>. > > >> </gscuser> was used as the home directory. > > >> > > >> > __________________________________________________________________ > > >> Standard Error: > > >> > > >> Loading required package: methods > > >> Loading required package: BiocGenerics > > >> Loading required package: parallel > > >> > > >> Attaching package: 'BiocGenerics' > > >> > > >> The following objects are masked from 'package:parallel': > > >> > > >> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, > > >> clusterExport, clusterMap, parApply, parCapply, > parLapply, > > >> parLapplyLB, parRapply, parSapply, parSapplyLB > > >> > > >> The following object is masked from 'package:stats': > > >> > > >> xtabs > > >> > > >> The following objects are masked from 'package:base': > > >> > > >> Filter, Find, Map, Position, Reduce, anyDuplicated, > append, > > >> as.data.frame, as.vector, cbind, colnames, duplicated, > > eval, evalq, > > >> get, intersect, is.unsorted, lapply, mapply, match, mget, > > order, > > >> paste, pmax, pmax.int <http: pmax.int=""> > <http: pmax.int="">, pmin, pmin.int <http: pmin.int=""> > > <http: pmin.int="">, rank, rbind, rep.int <http: rep.int=""> > <http: rep.int="">, > > >> rownames, sapply, setdiff, sort, table, tapply, > union, unique, > > >> unlist > > >> > > >> Loading required package: Biobase > > >> Welcome to Bioconductor > > >> > > >> Vignettes contain introductory material; view with > > >> 'browseVignettes()'. To cite Bioconductor, see > > >> 'citation("Biobase")', and for packages > 'citation("pkgname")'. > > >> > > >> Loading required package: lattice > > >> Loading required package: GenomicRanges > > >> Loading required package: IRanges > > >> Loading required package: XVector > > >> Loading required package: Biostrings > > >> Loading required package: bumphunter > > >> Loading required package: foreach > > >> Loading required package: iterators > > >> Loading required package: locfit > > >> locfit 1.5-9.1 2013-03-22 > > >> > > >> Attaching package: 'locfit' > > >> > > >> The following objects are masked from > 'package:GenomicRanges': > > >> > > >> left, right > > >> > > >> Loading required package: rngtools > > >> Loading required package: pkgmaker > > >> Loading required package: registry > > >> > > >> Attaching package: 'pkgmaker' > > >> > > >> The following object is masked from 'package:IRanges': > > >> > > >> new2 > > >> > > >> Warning messages: > > >> 1: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > > : > > >> Could not infer array name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > > >> 2: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > > : > > >> Could not infer slide name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Ida tSampleSheet.csv > > >> 3: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > > : > > >> Could not infer array name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > > >> 4: In > > > FUN(c("/gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analy sis/IdatSampleSheet.csv", > > : > > >> Could not infer slide name for file: > > > /gscmnt/gc3016/info/medseq/apetti/AMLrefractory/dmr.analysis/Sam pleSheet140219.csv > > >> [bumphunterEngine] Parallelizing using 4 workers/cores > > (backend: doParallel, version: 1.0.6). > > >> [bumphunterEngine] Computing coefficients. > > >> [bumphunterEngine] Performing 100 permutations. > > >> [bumphunterEngine] Computing marginal permutation p-values. > > >> [bumphunterEngine] cutoff: 0.05 > > >> [bumphunterEngine] Finding regions. > > >> [bumphunterEngine] Found 233469 bumps. > > >> [bumphunterEngine] Computing regions for each permutation. > > >> > > >> Warning message: > > >> In regionFinder(x = beta, chr = chr, pos = pos, cluster = > > cluster, : > > >> NAs found and removed. ind changed. > > >> Execution halted > > >> > > >> > > >> -- output of sessionInfo(): > > >> > > >> R version 3.0.2 (2013-09-25) > > >> Platform: x86_64-unknown-linux-gnu (64-bit) > > >> > > >> locale: > > >> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C > > >> [3] LC_TIME=en_US.utf8 LC_COLLATE=C > > >> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 > > >> [7] LC_PAPER=en_US.utf8 LC_NAME=C > > >> [9] LC_ADDRESS=C LC_TELEPHONE=C > > >> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C > > >> > > >> attached base packages: > > >> [1] parallel stats graphics grDevices utils > datasets > > methods > > >> [8] base > > >> > > >> other attached packages: > > >> [1] doParallel_1.0.6 > > >> [2] doRNG_1.5.5 > > >> [3] rngtools_1.2.3 > > >> [4] pkgmaker_0.17.4 > > >> [5] registry_0.2 > > >> [6] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1 > > >> [7] minfi_1.9.11 > > >> [8] bumphunter_1.2.0 > > >> [9] locfit_1.5-9.1 > > >> [10] iterators_1.0.6 > > >> [11] foreach_1.4.1 > > >> [12] Biostrings_2.30.1 > > >> [13] GenomicRanges_1.14.4 > > >> [14] XVector_0.2.0 > > >> [15] IRanges_1.20.6 > > >> [16] lattice_0.20-24 > > >> [17] Biobase_2.22.0 > > >> [18] BiocGenerics_0.8.0 > > >> > > >> loaded via a namespace (and not attached): > > >> [1] AnnotationDbi_1.24.0 DBI_0.2-7 MASS_7.3-29 > > >> [4] R.methodsS3_1.6.1 RColorBrewer_1.0-5 RSQLite_0.11.4 > > >> [7] XML_3.98-1.1 annotate_1.40.0 beanplot_1.1 > > >> [10] codetools_0.2-8 digest_0.6.4 genefilter_1.44.0 > > >> [13] grid_3.0.2 illuminaio_0.2.0 itertools_0.1-1 > > >> [16] limma_3.18.4 matrixStats_0.8.14 mclust_4.2 > > >> [19] multtest_2.18.0 nlme_3.1-113 nor1mix_1.1-4 > > >> [22] preprocessCore_1.24.0 reshape_0.8.4 siggenes_1.36.0 > > >> [25] splines_3.0.2 stats4_3.0.2 stringr_0.6.2 > > >> [28] survival_2.37-7 tools_3.0.2 xtable_1.7-1 > > >> > > >> -- > > >> Sent via the guest posting facility at bioconductor.org > <http: bioconductor.org=""> > > <http: bioconductor.org="">. > > >> > > >> _______________________________________________ > > >> Bioconductor mailing list > > >> Bioconductor@r-project.org > <mailto:bioconductor@r-project.org> > <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org>> > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > >> Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > -- > > > James W. MacDonald, M.S. > > > Biostatistician > > > University of Washington > > > Environmental and Occupational Health Sciences > > > 4225 Roosevelt Way NE, # 100 > > > Seattle WA 98105-6099 > > > > > > > > ____ > > This email message is a private communication. The information > > transmitted, including attachments, is intended only for the > > person or entity to which it is addressed and may contain > > confidential, privileged, and/or proprietary material. Any > review, > > duplication, retransmission, distribution, or other use of, or > > taking of any action in reliance upon, this information by > persons > > or entities other than the intended recipient is unauthorized by > > the sender and is prohibited. If you have received this > message in > > error, please contact the sender immediately by return email and > > delete the original message from all computer systems. Thank > you. > > > > > > > > [[alternative HTML version deleted]] > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > <mailto:bioconductor@r-project.org> <mailto:bioconductor@r-project.org>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > ____ > This email message is a private communication. The information > transmitted, including attachments, is intended only for the > person or entity to which it is addressed and may contain > confidential, privileged, and/or proprietary material. Any review, > duplication, retransmission, distribution, or other use of, or > taking of any action in reliance upon, this information by persons > or entities other than the intended recipient is unauthorized by > the sender and is prohibited. If you have received this message in > error, please contact the sender immediately by return email and > delete the original message from all computer systems. Thank you. > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. [[alternative HTML version deleted]]

ADD REPLY • link 11.2 years ago Allegra Petti ▴ 120

Login before adding your answer.