About big data set for Affy R package
刘伟 ▴ 30
Dear Buddy,

I am a user of the affy R package. When I try to process a large number of microarrays (approx. 300), I always get a memory allocation error from R. I searched the web but did not find any solution for ReadAffy() with large datasets. I do not know whether the problem can be fixed in some way. Any suggestion is appreciated. Thanks.

Sincerely,
Wei Liu
@james-w-macdonald-5106
United States
Hi Wei Liu,

You can try justRMA(). If that doesn't work, you can try the aroma.affymetrix package. Note that aroma.affymetrix is not part of BioC and has its own user group and repository, so you will need to do a Google search for that one.

Best,
Jim

On 12/19/2012 9:21 AM, 刘伟 wrote:
> [...]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
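A minimal sketch of the justRMA() route, assuming all CEL files sit in one directory. The path `cel_dir` and the wrapper `run_just_rma` are placeholders for illustration, not something from this thread. justRMA() reads the files and computes RMA values in one step instead of first building the AffyBatch that ReadAffy() creates, which is where the memory saving comes from. Whether that is enough for 300 arrays depends on the chip type and the machine.

```r
# Sketch only: `cel_dir` is a hypothetical path; point it at your own CEL files.
cel_dir <- "path/to/celfiles"

run_just_rma <- function(cel_dir) {
  # affy is a Bioconductor package; stop early if it is not installed.
  if (!requireNamespace("affy", quietly = TRUE)) {
    message("affy is not installed; nothing to do")
    return(NULL)
  }
  # list.celfiles() is affy's helper that keeps only *.CEL / *.CEL.gz files.
  cel_files <- affy::list.celfiles(cel_dir, full.names = TRUE)
  if (length(cel_files) == 0) {
    message("no CEL files found in ", cel_dir)
    return(NULL)
  }
  # Returns an ExpressionSet of log2 RMA values without keeping an AffyBatch.
  affy::justRMA(filenames = cel_files)
}

eset <- run_just_rma(cel_dir)
```

If this still runs out of memory, the other suggestions in this thread are the next things to try.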
cstrato ★ 3.9k
@cstrato-908
Austria
Dear Wei Liu,

You could use the Bioconductor package xps, which can handle a couple of thousand microarrays on computers with only 1-2 GB of RAM.

See also http://www.bioconductor.org/help/workflows/oligo-arrays/#pre-processing-resources for other packages that might be relevant.

Regards
Christian
Rob Dunne ▴ 230
@rob-dunne-292
Hi Wei Liu,

If they are Affymetrix 1.0 ST exon arrays, I can send you a modified version of read.celfiles from the oligo package that should read a 300-microarray data set. I don't know if it will work for other array types, possibly not without some work. It is a modified version of read.celfiles that uses the big.matrix class from the bigmemory package.

    # here ff holds the CEL file names (not the ff package)
    my.data <- read.celfiles(filenames = ff, useAffyio = FALSE)
    my.data
    #assayData: 6553600 features, 335 samples
    #Annotation: pd.huex.1.0.st.v2

Bye
Rob

--
Rob Dunne                    Fax: +61 2 9325 3200  Tel: +61 2 9325 3263
CSIRO Mathematics, Informatics and Statistics      +61 2 9325 3100
Locked Bag 17, North Ryde, New South Wales, Australia, 1670
http://www.bioinformatics.csiro.au    Email: Rob.Dunne at csiro.au

Java has certainly revolutionized marketing and litigation.
Hi Rob,

looks like you're running an old version of oligo.

Today, our approach is:

    library(ff)
    library(oligo)
    my.data <- read.celfiles(<cel file names>)

HTH,
b

On 21 December 2012 01:02, Rob Dunne <rob.dunne at csiro.au> wrote:
> [...]
Hi Benilton,

Unless I am missing something, ff won't help in this case. From the ff help page:

"Currently ff objects cannot have length zero and are limited to '.Machine$integer.max' elements"

and .Machine$integer.max is 2^31 - 1. This is exceeded when you try to load 328 Affy exon arrays, hence:

    library(ff)
    library(oligo)
    data <- read.celfiles(filenames = files)
    #Loading required package: pd.huex.1.0.st.v2
    #Loading required package: RSQLite
    #Loading required package: DBI
    #Platform design info loaded.
    #Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
    #  missing value where TRUE/FALSE needed
    #In addition: Warning message:
    #In ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),  :
    #  NAs introduced by coercion

    traceback()
    #4: ff(initdata = initdata, vmode = vmode, dim = dim, pattern = file.path(ldPath(),
    #       basename(name)))
    #3: createFF("intensities-", dim = c(nr, length(filenames)))
    #2: smartReadCEL(filenames, sampleNames, headdetails = headdetails)
    #1: read.celfiles(filenames = ff)

This is why I went down the path of modifying read.celfiles to use big.matrix, which does not have the 2^31 - 1 limit.

Bye
Rob

    sessionInfo()
    #R version 2.15.0 (2012-03-30)
    #Platform: x86_64-unknown-linux-gnu (64-bit)
    #
    #locale:
    # [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
    # [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
    # [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
    # [7] LC_PAPER=C                 LC_NAME=C
    # [9] LC_ADDRESS=C               LC_TELEPHONE=C
    #[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
    #
    #attached base packages:
    #[1] tools     stats     graphics  grDevices utils     datasets  methods
    #[8] base
    #
    #other attached packages:
    #[1] pd.huex.1.0.st.v2_3.6.0 RSQLite_0.11.2          DBI_0.2-5
    #[4] oligo_1.20.4            oligoClasses_1.18.0     ff_2.2-10
    #[7] bit_1.1-9
    #
    #loaded via a namespace (and not attached):
    # [1] affxparser_1.28.1     affyio_1.24.0         Biobase_2.16.0
    # [4] BiocGenerics_0.2.0    BiocInstaller_1.4.9   Biostrings_2.24.1
    # [7] codetools_0.2-8       compiler_2.15.0       foreach_1.4.0
    #[10] IRanges_1.14.4        iterators_1.0.6       preprocessCore_1.18.0
    #[13] splines_2.15.0        stats4_2.15.0         zlibbioc_1.2.0

On 12/21/2012 10:45 PM, Benilton Carvalho wrote:
> [...]
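The arithmetic behind that limit can be checked in a few lines of base R. This is a sketch that takes the 6553600 features per array reported earlier in the thread for pd.huex.1.0.st.v2 and assumes the intensities are stored as one features-by-arrays matrix. The variable and function names are illustrative only.

```r
# Features per HuEx 1.0 ST array, as reported for pd.huex.1.0.st.v2 in this thread.
features_per_array <- 6553600

# Work in doubles: integer arithmetic would itself overflow to NA at this size.
max_elements <- as.numeric(.Machine$integer.max)  # 2^31 - 1

# Number of matrix cells needed to hold n_arrays arrays.
elements_needed <- function(n_arrays) features_per_array * n_arrays

# TRUE if the intensity matrix stays within the element limit.
fits <- function(n_arrays) elements_needed(n_arrays) <= max_elements

# Largest number of arrays whose intensity matrix still fits under the limit.
max_arrays <- floor(max_elements / features_per_array)

print(max_arrays)            # 327 arrays is the most that fits
print(elements_needed(328))  # 2149580800, more than 2147483647
print(fits(327))             # TRUE
print(fits(328))             # FALSE
```

So under these assumptions, 327 exon arrays is the ceiling for a single object with that limit, and the 328th array is the one that pushes the element count past it.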
@stephen-piccolo-6761
United States
Wei,

I'm assuming your end goal is to normalize the files? If so, there are a few other options you could try for a large number of CEL files. You could process the CEL files in smaller groups. Alternatively (and, in my opinion, a better approach), you could use our SCAN.UPC package (or the frma package), both of which are designed to normalize one file at a time. That way, you only need enough memory to process a single file.

Regards,
-Steve
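The "smaller groups" idea can be sketched in base R as follows. The file names are made up, and `make_batches` and `process_batch` are hypothetical helpers: `process_batch` stands in for whatever reading and normalization step is actually used, and here it only reports what it was given, so the sketch runs without any CEL files.

```r
# Hypothetical file names standing in for about 300 CEL files.
cel_files <- sprintf("sample_%03d.CEL", 1:300)

# Split a character vector into consecutive groups of at most `batch_size` files.
make_batches <- function(files, batch_size) {
  group_id <- ceiling(seq_along(files) / batch_size)
  unname(split(files, group_id))
}

# Hypothetical stand-in for the real work on one group of files.
process_batch <- function(files) {
  data.frame(n_files = length(files),
             first = files[1],
             last = files[length(files)])
}

batches <- make_batches(cel_files, batch_size = 50)

# Process one group at a time so only that group has to be in memory.
results <- do.call(rbind, lapply(batches, process_batch))
print(results)
```

Keep in mind that multi-array methods such as RMA normalize each array relative to the others in the same run, so values from separately processed groups are not automatically comparable. Single-array methods avoid that issue, which fits the preference stated above.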