Memory problem with rma()
Guest User ★ 13k
@guest-user-4897
Last seen 10.4 years ago
Hi,

I am running rma() to background-correct, normalize and summarize a batch of ca. 5500 arrays. I currently have a memory limit of 8 GB and the procedure exceeds it. I am guessing that it breaks at the background correction step: when I inspected the temporary directory, the only file that had been modified was tmp_310151_rbg.root (16 GB in size). The code is below.

I tried both the latest ROOT version and the one recommended at Bioconductor (root_v5.34.14, root_v5.34.05).

Any idea why there is a memory issue?

    library(xps)

    # Import the exon array scheme (annotation) into a ROOT scheme file
    scheme.HuEx <- import.exon.scheme(
        filename   = "Scheme_HuEx-1_0v2r2_hg19",
        layoutfile = "affyHuExome_design/HuEx-1_0-st-v2.r2.clf",
        schemefile = "affyHuExome_design/HuEx-1_0-st-v2.r2.pgf",
        probeset   = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.probeset.csv",
        transcript = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.transcript.csv")

    # Re-open the scheme from the ROOT file
    scheme.HuEx <- root.scheme("Scheme_HuEx-1_0v2r2_hg19.root")

    # Import the raw CEL files into a ROOT data file
    data.HuEx <- import.data(
        scheme.HuEx,
        filename = "fhsCEL",
        filedir  = "normalizationXPS/",
        celdir   = "expression_CEL_raw/"
    )

    # Re-open the data from the ROOT file
    data.HuEx <- root.data(scheme.HuEx, rootfile="fhsCEL_cel.root")

    # Background correction, quantile normalization and summarization
    rma.HuEx.transcript <- rma(data.HuEx, filename="HuEx_RMAquantile",
        filedir="normalizationXPS",
        tmpdir = "normalizationXPS/tmpDir",
        add.data=FALSE, background="antigenomic", normalize=TRUE,
        option="transcript", exonlevel="core")

-- output of sessionInfo():

    R version 3.0.2 (2013-09-25)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=C                 LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] xps_1.22.2

    loaded via a namespace (and not attached):
    [1] tools_3.0.2

--
Sent via the guest posting facility at bioconductor.org.
@sean-davis-490
Last seen 5 months ago
United States
And what was the actual error that you got?

Sean

On Sun, Feb 16, 2014 at 2:07 PM, Damian Plichta [guest] <guest@bioconductor.org> wrote:
> I am running rma() to correct, normalize and summarize a batch of ca. 5500
> arrays. I have currently a memory limit of 8gb and the procedures exceeds
> that. [...]
I don't get a proper error message because I'm running the R session in an interactive shell on a cluster (queuing system). When the memory limit of 8gb is reached, my interactive shell is terminated by the queuing system.

> And what was the actual error that you got?
>
> Sean
Hi Damian,

Soon, Christian should reply to you.

In the meantime, for my personal interest and to define plans for the oligo package, would you be willing to try processing your set with oligo?

    library(ff)
    library(oligo)
    cels = list.celfiles()
    raw = read.celfiles(cels)
    res = rma(raw)

If you have multiple cores available, before loading oligo, load a parallel front-end:

    library(doMC)
    registerDoMC(4)

Let me know how it goes, if you have some time to spare...

Thanks a million,
benilton

On Feb 16, 2014 7:15 PM, <plichta@cbs.dtu.dk> wrote:
> I don't get a proper error message because I'm running the R session in an
> interactive shell on a cluster (queuing system). [...]
Hi Benilton,

I tried oligo and it choked:

    > ...
    > raw <- read.celfiles(cels)
    Loading required package: pd.huex.1.0.st.v2
    Loading required package: RSQLite
    Loading required package: DBI
    Platform design info loaded.
    Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    In ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),  :
      NAs introduced by coercion

Do you know what this error indicates?

Thanks,

Damian
Thanks, Damian,

that's the indication that 'ff' hit the maximum limit in object dimensions... :-(

Thanks for letting me know,

b

2014-02-17 0:22 GMT-03:00 <plichta@cbs.dtu.dk>:
> Hi Benilton,
>
> I tried oligo and it choked: [...]
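A rough back-of-the-envelope check in base R suggests why a data set of this size can run into such a limit. This is only a sketch under two assumptions that are not stated in the thread: that a HuEx-1_0-st-v2 CEL file holds 2560 x 2560 cells, and that all intensities are placed in one object of cells x arrays elements whose length must fit into an R integer.

```r
# Assumption: a HuEx-1_0-st-v2 CEL file holds 2560 x 2560 cells.
cells_per_array <- 2560 * 2560
n_arrays        <- 5500            # "ca. 5500" from the original post

# Double arithmetic, so the product itself does not overflow.
total_cells   <- cells_per_array * n_arrays
exceeds_limit <- total_cells > .Machine$integer.max

# Coercing such a value to integer yields NA (normally with a warning),
# and an NA inside if() produces "missing value where TRUE/FALSE needed".
len <- suppressWarnings(as.integer(total_cells))

# Largest number of arrays whose cells would still fit one integer length.
max_arrays <- floor(.Machine$integer.max / cells_per_array)

cat("total cells:          ", format(total_cells, big.mark = ","), "\n")
cat("exceeds integer.max:  ", exceeds_limit, "\n")
cat("as.integer() result:  ", len, "\n")
cat("max arrays per object:", max_arrays, "\n")
```

Under these assumptions the limit would already be reached with a few hundred exon arrays, so splitting the CEL files into smaller batches may be one way around it, at the cost of normalizing each batch separately.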
cstrato ★ 3.9k
@cstrato-908
Last seen 6.3 years ago
Austria
Dear Damian,

In principle you should not have a memory problem; however, 5500 exon arrays is quite a lot, so let me propose the following:

1. Do not run function rma() directly, but do it stepwise, i.e.:

       data.bg.rma <- bgcorrect.rma(data.exon, ...)
       data.qu.rma <- normalize.quantiles(data.bg.rma, ...)
       data.mp.rma <- summarize.rma(data.qu.rma, ...)

   You can find an example in script examples/script4exon.R (at line 750). In this way you will not lose all your computation if anything goes wrong at one step. You may also need to set 'add.data=FALSE' in summarize.rma(); otherwise all expression data will be imported, which causes a memory problem, too.

   Another way to run rma() stepwise is to use function express(), see the example in script examples/script4exon.R (at line 785). When using function express() you could set parameter 'bufsize=4000', which will reduce the basket size for each tree, thus consuming less RAM.

2. I would suggest first using only 6 exon arrays to see if everything works fine. Then I would try to run 50 exon arrays
   - to see if there is an initial memory problem
   - to estimate how long each step needs if you run all 5500 arrays (approximately time x 110)

3. Please run everything with 'verbose=TRUE' so that you can see the output interactively. Maybe you could pipe the output to a text file.

4. Since you assume that there may be a memory problem: maybe you can run top (or something else) and check RSIZE/VSIZE from time to time. Maybe you can create a script that exports the memory consumption, e.g. every 10 min.

5. I am not sure if running the code on a cluster is a good idea. Do you run your code on a node which is used exclusively for this purpose? My suggestion would be to run your code on a machine where nothing else is running, since I assume that for 5500 exon arrays you will need at least one week (but see point 2).
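The "time x 110" estimate from point 2 can be sketched in a few lines of base R. The helper name extrapolate_runtime and the pilot timings are made up for illustration; real numbers would come from wrapping each step of a 50-array pilot run in system.time(). Linear scaling with the number of arrays is itself an assumption and may well be optimistic.

```r
# Hypothetical helper: scale timings from a pilot run linearly with the
# number of arrays (assumes run time grows proportionally).
extrapolate_runtime <- function(pilot_seconds, n_pilot, n_full) {
  stopifnot(n_pilot > 0, n_full >= n_pilot)
  pilot_seconds * (n_full / n_pilot)
}

# Made-up pilot timings in seconds for 50 arrays, one value per step.
pilot <- c(bgcorrect = 600, normalize = 900, summarize = 1200)

# 5500 / 50 = 110, the factor mentioned in point 2.
full <- extrapolate_runtime(pilot, n_pilot = 50, n_full = 5500)

cat("estimated hours per step:\n")
print(round(full / 3600, 1))
cat("estimated days in total:", round(sum(full) / 86400, 1), "\n")
```

If the total comes out at many days, that is a hint to check whether the cluster job would hit a wall-time limit before it hits the memory limit.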
(Note: In 2009 a customer was running 23000 HGU-133_Plus2 arrays on one machine, and with his help I could eliminate (hopefully) all memory problems, some of which appeared only after 2000 arrays. In his case memory consumption initially increased to 7.8 GB, but after solving the memory problems it remained at 3.0 GB.)

Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
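One possible way to record memory consumption, as suggested above, is a small base R logger that can be called between the stepwise calls. This is only a sketch for Linux: it reads VmRSS and VmSize (roughly the resident and virtual sizes that top reports) from /proc/self/status. The function names and the log file name are hypothetical, and because a long-running step blocks the R session, watching top from a second shell is still needed to see memory growth inside a single step.

```r
# Extract resident (VmRSS) and virtual (VmSize) memory, in MB, from the
# lines of a Linux /proc/<pid>/status file.
parse_proc_status <- function(lines) {
  get_kb <- function(key) {
    hit <- grep(paste0("^", key, ":"), lines, value = TRUE)
    if (length(hit) == 0) return(NA_real_)
    as.numeric(gsub("[^0-9]", "", hit[1]))   # values are reported in kB
  }
  c(rss_mb = get_kb("VmRSS") / 1024, vsize_mb = get_kb("VmSize") / 1024)
}

# Append one timestamped line to a log file (the file name is a placeholder).
log_memory <- function(logfile = "memory_log.txt",
                       status_file = "/proc/self/status") {
  mem <- if (file.exists(status_file)) {
    parse_proc_status(readLines(status_file))
  } else {
    c(rss_mb = NA_real_, vsize_mb = NA_real_)  # not on Linux
  }
  cat(format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
      sprintf("RSS=%.0f MB VSIZE=%.0f MB", mem[["rss_mb"]], mem[["vsize_mb"]]),
      "\n", file = logfile, append = TRUE)
  invisible(mem)
}
```

Calling log_memory() once before and once after each of the three stepwise calls gives a coarse picture of which step drives the memory use.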
@stephen-piccolo-6761
Last seen 4.3 years ago
United States
Hi Damian,

I receive the digest version of the BioC mailing list, so I apologize if someone already gave this reply, but various Bioconductor packages are designed for processing very large Affy data sets. Our own SCAN.UPC package as well as the fRMA package normalize one sample at a time and thus can be applied to data sets of any size. Another option would be the aroma.affymetrix package, which is designed for doing memory-efficient RMA normalization.

Hope that helps! If you end up trying SCAN.UPC, you might also try the option for processing multiple samples in parallel, which you should be able to do on a computer cluster.

Regards,
-Steve

On 2/17/14, 4:00 AM, "bioconductor-request at r-project.org" wrote:
> Date: Mon, 17 Feb 2014 00:39:48 -0300
> From: Benilton Carvalho <beniltoncarvalho at gmail.com>
> Subject: Re: [BioC] Memory problem with rma()
> [...]
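To illustrate why normalizing one sample at a time keeps memory use independent of the number of arrays, here is a minimal base R sketch. It maps each sample onto a fixed reference distribution, loosely in the spirit of a frozen reference. It is not the algorithm used by SCAN.UPC, fRMA or aroma.affymetrix, and the function name, sample names and simulated intensities are all made up for the example.

```r
# Map one sample onto a fixed reference distribution: the k-th smallest
# intensity of the sample is replaced by the k-th smallest reference value.
normalize_to_reference <- function(x, reference) {
  stopifnot(length(x) == length(reference))
  sort(reference)[rank(x, ties.method = "first")]
}

set.seed(1)
reference    <- rexp(1000)                 # stand-in for a precomputed reference
sample_names <- sprintf("sample_%d", 1:3)  # hypothetical sample names

for (s in sample_names) {
  raw  <- rexp(1000, rate = 0.5)           # stand-in for reading one CEL file
  norm <- normalize_to_reference(raw, reference)
  # In practice `norm` would be written to disk here, so that only one
  # sample is held in memory at any time.
  cat(s, "median after normalization:", round(median(norm), 3), "\n")
}
```

Because every sample is processed independently of the others, the loop can also be split across cluster nodes without any shared state.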