Hi Damian,
I receive the digest version of the BioC mailing list, so I apologize if
someone has already given this reply, but various Bioconductor packages are
designed for processing very large Affy data sets. Our own SCAN.UPC
package, as well as the fRMA package, normalizes one sample at a time and
thus can be applied to data sets of any size. Another option would be the
aroma.affymetrix package, which is designed for memory-efficient RMA
normalization.
Hope that helps! If you end up trying SCAN.UPC, you might also try the
option for processing multiple samples in parallel, which you should be
able to do on a computer cluster.
Regards,
-Steve
On 2/17/14, 4:00 AM, "bioconductor-request at r-project.org"
<bioconductor-request at r-project.org> wrote:
>Date: Mon, 17 Feb 2014 00:39:48 -0300
>From: Benilton Carvalho <beniltoncarvalho at gmail.com>
>To: plichta at cbs.dtu.dk
>Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>,
> Sean Davis <sdavis2 at mail.nih.gov>
>Subject: Re: [BioC] Memory problem with rma()
>Message-ID:
> <cao-arwmyx1ynxv8osnqa96=2pehxvvfdmojam56brj-wez-c_a at mail.gmail.com>
>Content-Type: text/plain
>
>Thanks, Damian,
>
>that's the indication that 'ff' hit the maximum limit in object
>dimensions... :-(
>
>Thanks for letting me know,
>
>b
>
>
>2014-02-17 0:22 GMT-03:00 <plichta at cbs.dtu.dk>:
>
>>Hi Benilton,
>>
>>I tried oligo and it choked:
>>
>>>...
>>>raw <- read.celfiles(cels)
>>
>>Loading required package: pd.huex.1.0.st.v2
>>Loading required package: RSQLite
>>Loading required package: DBI
>>Platform design info loaded.
>>Error in if (length < 0 || length > .Machine$integer.max) stop("length
>>must be between 1 and .Machine$integer.max") :
>>  missing value where TRUE/FALSE needed
>>In addition: Warning message:
>>In ff(initdata = initdata, vmode = vmode, dim = dim, pattern =
>>file.path(ldPath(), :
>> NAs introduced by coercion
>>
>>Do you know what this error indicates?
>>
>>Thanks,
>>
>>Damian
>>
>>> Hi Damian,
>>>
>>> Christian should reply to you soon.
>>>
>>> In the meantime, for my personal interest and to define plans for the
>>> oligo package, would you be willing to try processing your set with
>>> oligo?
>>>
>>> library(ff)
>>> library(oligo)
>>> cels = list.celfiles()
>>> raw = read.celfiles(cels)
>>> res = rma(raw)
>>>
>>> If you have multiple cores available, load a parallel front-end before
>>> loading oligo:
>>>
>>> library(doMC)
>>> registerDoMC(4)
>>>
>>> Let me know how it goes, if you have some time to spare...
>>>
>>> Thanks a million, benilton
>>> On Feb 16, 2014 7:15 PM, <plichta at cbs.dtu.dk> wrote:
>>>
>>>> I don't get a proper error message because I'm running the R session
>>>> in an interactive shell on a cluster (queuing system). When the memory
>>>> limit of 8 GB is reached, my interactive shell is terminated by the
>>>> queuing system.
>>>>
>>>> > And what was the actual error that you got?
>>>> >
>>>> > Sean
>>>> >
>>>> >
>>>> >
>>>> > On Sun, Feb 16, 2014 at 2:07 PM, Damian Plichta [guest] <
>>>> > guest at bioconductor.org> wrote:
>>>> >
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I am running rma() to correct, normalize and summarize a batch of
>>>> >> ca. 5500 arrays. I currently have a memory limit of 8 GB, and the
>>>> >> procedure exceeds it. I am guessing that it breaks at the background
>>>> >> correction step. I investigated the temporary directory, and the only
>>>> >> file that was modified is tmp_310151_rbg.root (its size is 16 GB). I
>>>> >> attached the code below.
>>>> >>
>>>> >> I tried the latest ROOT version and the one recommended at
>>>> >> Bioconductor (root_v5.34.14, root_v5.34.05).
>>>> >>
>>>> >> Any idea why there is a memory issue?
>>>> >>
>>>> >> library(xps)
>>>> >>
>>>> >> scheme.HuEx <- import.exon.scheme(
>>>> >>     filename   = "Scheme_HuEx-1_0v2r2_hg19",
>>>> >>     layoutfile = "affyHuExome_design/HuEx-1_0-st-v2.r2.clf",
>>>> >>     schemefile = "affyHuExome_design/HuEx-1_0-st-v2.r2.pgf",
>>>> >>     probeset   = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.probeset.csv",
>>>> >>     transcript = "affyHuExome_design/HuEx-1_0-st-v2.na33.1.hg19.transcript.csv")
>>>> >>
>>>> >> scheme.HuEx <- root.scheme("Scheme_HuEx-1_0v2r2_hg19.root")
>>>> >>
>>>> >> data.HuEx <- import.data(
>>>> >>     scheme.HuEx,
>>>> >>     filename = "fhsCEL",
>>>> >>     filedir  = "normalizationXPS/",
>>>> >>     celdir   = "expression_CEL_raw/"
>>>> >> )
>>>> >>
>>>> >> data.HuEx <- root.data(scheme.HuEx, rootfile = "fhsCEL_cel.root")
>>>> >>
>>>> >> rma.HuEx.transcript <- rma(data.HuEx, filename = "HuEx_RMAquantile",
>>>> >>     filedir    = "normalizationXPS",
>>>> >>     tmpdir     = "normalizationXPS/tmpDir",
>>>> >>     add.data   = FALSE, background = "antigenomic",
>>>> >>     normalize  = TRUE,
>>>> >>     option     = "transcript", exonlevel = "core")
>>>> >>
>>>> >>
>>>> >> -- output of sessionInfo():
>>>> >>
>>>> >> R version 3.0.2 (2013-09-25)
>>>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> >>
>>>> >> locale:
>>>> >> [1] LC_CTYPE=C LC_NUMERIC=C
>>>> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> >>
>>>> >> attached base packages:
>>>> >> [1] stats graphics grDevices utils datasets methods base
>>>> >>
>>>> >> other attached packages:
>>>> >> [1] xps_1.22.2
>>>> >>
>>>> >> loaded via a namespace (and not attached):
>>>> >> [1] tools_3.0.2
>>>> >>
>>>> >> --
>>>> >> Sent via the guest posting facility at bioconductor.org.
>>>> >>
>>>> >> _______________________________________________
>>>> >> Bioconductor mailing list
>>>> >> Bioconductor at r-project.org
>>>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> >> Search the archives:
>>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>>
>
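A rough back-of-the-envelope check may help explain Benilton's diagnosis that ff hit its maximum object size. It rests on two assumptions that are not stated in the thread: each HuEx-1_0-st-v2 CEL file holds 2560 × 2560 = 6,553,600 probe cells, and read.celfiles() asks ff for a single matrix with one column per array. Under those assumptions, the total number of cells for ca. 5500 arrays exceeds R's largest integer:

```latex
\begin{aligned}
2560 \times 2560 &= 6\,553\,600 \text{ cells per array (assumed)} \\
6\,553\,600 \times 5500 &= 36\,044\,800\,000 \text{ cells in total} \\
\texttt{.Machine\$integer.max} = 2^{31} - 1 &= 2\,147\,483\,647 \\
\left\lfloor \frac{2\,147\,483\,647}{6\,553\,600} \right\rfloor &= 327 \text{ arrays at most}
\end{aligned}
```

If the assumptions hold, a length of about 3.6 × 10^10 cannot be stored as an R integer, so coercing it gives NA. That would be consistent with the "NAs introduced by coercion" warning and the "missing value where TRUE/FALSE needed" error Damian saw. It would also mean that only about 327 arrays of this type fit in one such ff matrix, far fewer than 5500.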