Reading 5000 celfiles with ReadAffy
Guest User ★ 13k
@guest-user-4897
Hi,

A student in my institute is trying to normalise >5000 CEL files generated on the U133A platform using the affy Bioconductor package.

Attempting to read in this many files fails while allocating the matrix, with the following error:

allocMatrix: too many elements specified.

As there is plenty of memory allocated to R, this was surprising.

Some Googling showed that there is a hard limit of 2,147,483,647 on the number of elements in a matrix, imposed by the underlying C code, and this is what leads to the error.

Has anyone had experience normalising a large number of CEL files and encountered this problem? If so, what solution, if any, did you find?

Thank you in advance.

Sincerely,
Saif Ur-Rehman
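A quick back-of-the-envelope check makes the error less surprising. ReadAffy stores one row per probe cell and one column per array, and the limit applies to the total number of elements in that matrix rather than to its column count alone. The sketch below assumes the HG-U133A layout of 712 x 712 cells per array (an assumption not stated in the thread; check your CDF) and that the per-matrix limit equals .Machine$integer.max, as quoted in the question.

```r
# Assumption: an HG-U133A array has 712 x 712 probe cells (check your CDF).
cells_per_array <- 712 * 712
n_arrays <- 5000

# Largest number of elements a single matrix could hold, per the question.
max_elements <- .Machine$integer.max

# These are doubles, so the product itself does not overflow.
total_elements <- cells_per_array * n_arrays

exceeds_limit <- total_elements > max_elements
max_arrays <- floor(max_elements / cells_per_array)

cat("Elements needed:", format(total_elements, big.mark = ","), "\n")
cat("Exceeds limit:", exceeds_limit, "\n")
cat("Arrays that fit in one matrix:", max_arrays, "\n")
```

Under these assumptions the limit is reached well before 5000 arrays, regardless of how much RAM is available.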
Paul Geeleher ★ 1.3k
@paul-geeleher-2679
Hi Saif,

The R packages "xps" (http://www.bioconductor.org/packages/release/bioc/html/xps.html) or "aroma.affymetrix" (http://www.aroma-project.org/) should be able to normalize such a massive dataset without loading it all into memory.

Paul.

--
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland, Galway, Ireland
www.bioinformaticstutorials.com
Saif,

You might want to consider using fRMA:

McCall MN, Bolstad BM, and Irizarry RA (2010). Frozen Robust Multi-Array Analysis (fRMA). Biostatistics, 11(2):242-253.

http://bioconductor.org/packages/release/bioc/html/frma.html

Best,
Matt

--
Matthew N McCall, PhD
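Because fRMA relies on frozen reference parameters rather than estimating them across the whole dataset, arrays can in principle be preprocessed in small batches and the results combined afterwards, which keeps each AffyBatch far below the matrix size limit. A minimal sketch, assuming the affy, frma and Biobase packages are installed along with the frozen-parameter package for the platform, and assuming frma()'s default single-array summarization is used; chunk_files() and frma_in_chunks() are hypothetical helper names, and the batch size of 100 is arbitrary:

```r
# Hypothetical helper: split a vector of file names into batches of `size`.
chunk_files <- function(files, size = 100) {
  unname(split(files, ceiling(seq_along(files) / size)))
}

# Hypothetical helper: run fRMA batch by batch and bind the resulting
# expression matrices column-wise (probesets in rows, arrays in columns).
# Needs the affy, frma and Biobase packages at call time.
frma_in_chunks <- function(files, size = 100) {
  per_batch <- lapply(chunk_files(files, size), function(batch_files) {
    batch <- affy::ReadAffy(filenames = batch_files)  # small AffyBatch
    Biobase::exprs(frma::frma(batch))                 # expression matrix
  })
  do.call(cbind, per_batch)
}

# Usage (not run):
# cel_files <- list.files("cel", pattern = "\\.CEL$", full.names = TRUE)
# expr <- frma_in_chunks(cel_files, size = 100)
```

The combined matrix has one row per probeset rather than one per probe cell, so it stays small even for thousands of arrays.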
Ying Chen ▴ 110
@ying-chen-4756
Hi,

You can try aroma.affymetrix, which is not a Bioconductor package yet. Or you can try the standalone application RMAExpress; someone reported running RMA on more than 10,000 CEL files with it.

Ying
I bet a dinner w/ drinks that aroma.affymetrix can process 20,000 such CEL files on a regular machine (say >2GB of RAM). The bet is open to the first 3 persons who challenge me (email me). I would be happy to raise the number to 100,000 CEL files, but that'll be hard to find ;)

/Henrik
(author of aroma.affymetrix)
great!!!! 100,000 not on a regular machine...

--
Dr. Alberto Goldoni
Parma, Italy
Hi Henrik,

The regular machine does not mean a Windows machine, right? When I run aroma on Windows 7 64-bit machines, the problem is that the R GUI window always freezes (Not Responding) when I try to process more than 2000 CEL files. I tried once with 1400 CEL files with aroma and it finished successfully :)

Thanks,
Ying
Hi Ying.

On Wed, Feb 22, 2012 at 7:16 AM, ying chen wrote:
> The regular machine does not mean windows machine, right?

No, it indeed means any machine on which you can install R, e.g. OSX, Linux and Windows. A good test is to see if you can install the affxparser package (on BioC). If so, you can also use aroma.affymetrix.

> When I run aroma on windows 7 64-bit machines, the problem is that the R GUI window always freezes (Not Responding) when I tried to process more than 2000 cels. I tried once with 1400 cels with aroma and it finished successfully :)

I don't see why it should work with Rgui. Did you make sure to disable 'Misc -> Buffered output (Ctrl+W)' in Rgui? That way you will see all messages as they appear, and not only when the run has completed, which may look like a "freeze". If you use a high verbosity level in aroma, it will print lots of messages. Also, with the buffer enabled, it may be that it overflows (which then would be a bug in Rgui). If this is not the cause, I'd be happy to learn more about your issues (please send a message to the aroma.affymetrix mailing list).

FYI, I first started to develop aroma on Windows XP 32-bit w/ 1.5GB RAM. Now I'm on Windows 7 64-bit w/ 8GB RAM, but the design strategy is still to support machines with very little RAM (~500MB) as well as those with lots of RAM (e.g. 128GB). There are settings for specifying how much memory to occupy.

>> I bet a dinner w/ drinks that aroma.affymetrix can process 20,000 such CEL files on regular machine (say >2GB of RAM). The bet is open to the first 3 persons who challenge me (email me). I would be happy to raise the number to 100,000 CEL files, but that'll be hard to find ;)

Finally, since a few people emailed me offline commenting on disk space available on "regular" machines, the quick answer is that you'll need ~220GB of free disk space to process 20,000 HT_HG-U133A CEL files. Here are the details:

Each HT_HG-U133A CEL file is ~5.5MB, so 20,000 such CEL files occupy ~105GB of disk space. When running RMA, the aroma pipeline keeps intermediate and final results on file, i.e. quantile-normalized data (as ~5.5MB CEL files) and chip-effect estimates (as ~0.3MB files). Thus, for each HT_HG-U133A array processed, one needs ~11.5MB of disk space. (If one is willing to delete the raw data, one can actually get by with ~5.8MB per array.) Thus, to do RMA on 20,000 HT_HG-U133A CEL files, you'll need ~220GB of disk space. I consider that fairly "regular" by today's standards.

About RAM: you'll most likely be able to get by with as little as 500MB of RAM.

Here is what it looks like to estimate RMA chip effects (given that you've set up the correct aroma directory structure):

    # Run RMA
    ces <- doRMA("GSE24026", chipType="HT_HG-U133A");

For further questions about aroma.affymetrix, please head over to http://aroma-project.org/forum/.

/Henrik
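The disk-space estimate can be reproduced with a few lines of arithmetic. This sketch simply plugs in the per-file sizes quoted in the post and assumes 1 GB = 1024 MB; the variable names are illustrative only.

```r
n_arrays <- 20000

raw_cel_mb          <- 5.5   # one raw HT_HG-U133A CEL file, as quoted
per_array_total_mb  <- 11.5  # raw + normalized + chip-effect files, as quoted
per_array_no_raw_mb <- 5.8   # if the raw data are deleted, as quoted

# Assumption: binary gigabytes (1 GB = 1024 MB).
to_gb <- function(mb) mb / 1024

raw_gb    <- to_gb(n_arrays * raw_cel_mb)
total_gb  <- to_gb(n_arrays * per_array_total_mb)
no_raw_gb <- to_gb(n_arrays * per_array_no_raw_mb)

cat(sprintf("Raw CEL files only:      %.1f GB\n", raw_gb))
cat(sprintf("Full RMA run:            %.1f GB\n", total_gb))
cat(sprintf("Full run, raw deleted:   %.1f GB\n", no_raw_gb))
```

The results land close to the rounded ~105GB and ~220GB figures given in the post; the small differences come from rounding of the per-file sizes.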
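The doRMA() call above only works once the aroma directory layout is in place. As a rough sketch of that layout, assuming the conventions documented on aroma-project.org (worth verifying there), the CDF file goes under annotationData/chipTypes/<chipType>/ and the CEL files under rawData/<dataSet>/<chipType>/. The helper name setup_aroma_dirs() is made up for illustration, and the example writes to a temporary directory.

```r
# Hypothetical helper: create the directory skeleton that aroma.affymetrix
# is assumed to expect below a project root.
setup_aroma_dirs <- function(root, data_set, chip_type) {
  dirs <- c(
    annotation = file.path(root, "annotationData", "chipTypes", chip_type),
    raw        = file.path(root, "rawData", data_set, chip_type)
  )
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  dirs
}

# Example in a temporary directory; use your project directory in practice.
root <- file.path(tempdir(), "aroma-demo")
dirs <- setup_aroma_dirs(root, "GSE24026", "HT_HG-U133A")
print(dirs)

# Next steps (not run here):
#   1. copy HT_HG-U133A.cdf into dirs[["annotation"]]
#   2. copy the *.CEL files into dirs[["raw"]]
#   3. setwd(root); library(aroma.affymetrix)
#      ces <- doRMA("GSE24026", chipType = "HT_HG-U133A")
```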
On Wednesday, February 22, 2012, Henrik Bengtsson wrote:
> I don't see why it should work with Rgui.

I don't see why it should *not* work with Rgui.

/H
ADD REPLY
0
Entering edit mode
Hi Ying. On Wed, Feb 22, 2012 at 7:16 AM, ying chen <ying_chen at="" live.com=""> wrote: > Hi Henrik, > > The regular machine does?not mean windows machine, right? No, it indeed means any machine on which you can install R, e.g. OSX, Linux and Windows. A good test is to see if you can install the affxparser package (on BioC). If so, you can also use aroma.affymetrix. > When I run aroma > on windows 7 64-bit machines, the problem is that the R GUI window always > freezes (Not Responding) when I tried to process more than 2000 cels. I > tried once with 1400 cels with aroma and it finished successfully :) I don't see why it should work with Rgui. Did you make sure to disable 'Misc -> Buffered output (Ctrl+W)' in Rgui? That way you will see all messages as they appear, and not when it completed, which may appear as a "freeze". If you use a high verbosity level in aroma, it will print lots of messages. Also, with the buffer enabled, it may be that it is overflows (which then would be a bug in Rgui). If this is not the cause, I'd be happy to learn more about your issues (please send a message to the aroma.affymetrix mailing list). FYI, I first started to develop aroma on Windows XP 32-bit w/ 1.5GB RAM. Now I'm on Windows 7 64-bit w/ 8GB RAM, but the design strategy is still to support machines with very little RAM (~500MB) as well as those with lots of RAM (e.g. 128GB). There are settings for specifying how much memory to occupy. [more below] > > Thanks, > > Ying > >> From: hb at biostat.ucsf.edu >> Date: Tue, 21 Feb 2012 15:48:04 -0800 >> To: Ying.Chen at imclone.com >> CC: saif.urrehman at icr.ac.uk; bioconductor at r-project.org >> Subject: Re: [BioC] Reading 5000 celfiles with ReadAffy > >> >> I bet a dinner w/ drinks that aroma.affymetrix can process 20,000 such >> CEL files on regular machine (say >2GB of RAM). The bet is open to >> the first 3 persons who challenge me (email me). 
I would be happy to >> raise the number to 100,000 CEL files, but that'll be hard to find ;) Finally, since a few people emailed me offline commenting on disk space available on "regular" machines, the quick answer is that you'll need ~220GB free disk space to process 20,000 HT_HG-U133A CEL files. Here are the details: Each HT_HG-U133A CEL file is ~5.5Mb. 20,000 such CEL files occupies ~105 GB of disk space. When running RMA, the aroma pipeline holds intermediate and final results on file, i.e. quantile-normalized data (as ~5.5Mb CEL files) and chip-effect estimates (as ~0.3Mb files). Thus, for each HT_HG-U133A array processed, one needs ~11.5Mb of disk space. (If one is willing to delete the raw data one can actually get by with ~5.8Mb per array). Thus, to do RMA on 20,000 HT_HG-U133A CEL files, you'll need ~220GB of disk space. I consider that fairly "regular" in today's standards. About RAM: you'll most likely will be able to get by with as little as 500MB of RAM. Here is what it looks like to estimate RMA chip effects (given that you've setup the correct aroma directory structure): # Run RMA ces <- doRMA("GSE24026", chipType="HT_HG-U133A"); For further question about aroma.affymetrix, please head over to http://aroma-project.org/forum/. /Henrik >> >> /Henrik >> (author of aroma.affymetrix) >> >> On Tue, Feb 21, 2012 at 5:49 AM, Ying Chen <ying.chen at="" imclone.com=""> wrote: >> > Hi, >> > >> > You can try aroma.affymetrix, which is not a Bioconductor package yet. >> > Or you can try the stand alone application RMAexpress as someone said he did >> > a RMA on more than 10,000 cels with it. 
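The disk-space arithmetic in Henrik's post can be checked with a few lines of R. This is only a rough sketch: the per-file sizes are the approximate figures quoted in the post, the variable and function names are made up for illustration, and 1 GB is assumed to be 1024 MB.

```r
# Approximate per-array file sizes in MB, as quoted in the post.
raw_mb         <- 5.5   # raw HT_HG-U133A CEL file
normalized_mb  <- 5.5   # quantile-normalized copy (also stored as a CEL file)
chip_effect_mb <- 0.3   # chip-effect estimates

n_arrays <- 20000

# Convert a per-array size in MB to a total in GB (assumes 1 GB = 1024 MB).
total_gb <- function(per_array_mb, n = n_arrays) per_array_mb * n / 1024

disk_gb <- c(
  raw_only      = total_gb(raw_mb),
  full_pipeline = total_gb(raw_mb + normalized_mb + chip_effect_mb),
  raw_deleted   = total_gb(normalized_mb + chip_effect_mb)
)

# Totals rounded to whole GB
print(round(disk_gb))
```

With these inputs the full pipeline comes to roughly 220 GB, in line with the figure in the post. The raw-only total comes out a little above the quoted ~105 GB, which is expected given that the per-file sizes are rounded.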
In reply to Henrik's challenge: as Paul mentioned, xps is able to process this number of CEL files. In fact, quite some time ago an xps user processed 23,000 CEL files from HG-U133_Plus_2 arrays. As far as I remember, memory usage was about 2-2.5 GB of RAM.

Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._

On 2/22/12 12:48 AM, Henrik Bengtsson wrote:
> I bet a dinner w/ drinks that aroma.affymetrix can process 20,000 such
> CEL files on regular machine (say >2GB of RAM). The bet is open to
> the first 3 persons who challenge me (email me). I would be happy to
> raise the number to 100,000 CEL files, but that'll be hard to find ;)
>
> /Henrik
> (author of aroma.affymetrix)
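To put these reports in context, a rough calculation shows why holding every probe intensity in a single R matrix does not scale, which is what the allocMatrix error in the original question points to. The sketch below rests on assumptions not stated in the thread: array layouts of 712 x 712 cells for HG-U133A and 1164 x 1164 cells for HG-U133_Plus_2, 8 bytes per stored intensity, and the 2,147,483,647 limit applying to the total number of matrix elements (as it did in R versions of that time) rather than to the number of columns. All variable names are illustrative.

```r
# Cells per array (assumed layouts; not given in the thread).
cells   <- c("HG-U133A" = 712^2, "HG-U133_Plus_2" = 1164^2)
# Number of CEL files mentioned in the thread for each platform.
n_files <- c("HG-U133A" = 5000,  "HG-U133_Plus_2" = 23000)

# Elements in a cells-by-files intensity matrix (stored as doubles,
# so the product itself cannot overflow).
elements <- cells * n_files

# Largest element count assumed to be allowed in one matrix.
max_elements  <- .Machine$integer.max
exceeds_limit <- elements > max_elements

# Size of a dense matrix of doubles, in GB (1 GB = 1024^3 bytes).
dense_gb <- elements * 8 / 1024^3

print(data.frame(cells, n_files, elements, exceeds_limit,
                 dense_gb = round(dense_gb, 1)))
```

Under these assumptions both data sets exceed the element limit, and the 23,000-array case would need a few hundred GB as a dense in-memory matrix. That is consistent with the advice in this thread to use tools such as xps or aroma.affymetrix, which keep the data on disk instead.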