large amount of slides

Vada Wilcox ▴ 10

Dear all,

I have been using RMA successfully for a while now, but in the past I have only used it on a small number of slides. I would like to do my study on a larger scale now, with data (series of experiments) from other researchers as well. My question is the following: if I want to study, let's say, 200 slides, do I have to read them all into R at once (that is, together, with read.affy() in package affy), or is it OK to read them series by series (so all wild types and controls of one researcher at a time)?

If it is really necessary to read all of them in at one time, how much RAM would I need (for, let's say, 200 CEL files) and how can I raise the RAM? I know it's possible to raise it by using 'max vsize = ...', but I haven't been able to do it successfully for 200 experiments. Can somebody help me with this?

Many thanks in advance,
Vada

updated 20.9 years ago by Marcus Davy ▴ 680 • written 20.9 years ago by Vada Wilcox ▴ 10

Adaikalavan Ramasamy ★ 1.8k

This is what I do:

1. Randomly split the arrays into manageable chunks, say 4 batches of 50 (depends on the computer).
2. Do RMA on these batches separately.
3. Combine these 4 batches (e.g. cbind/merge) into one finalised dataset.
4. Repeat B times and take the average of the B datasets.

From past experience, the coefficient of variation is less than 0.03 for 99% of probesets if you use B = 20 - 30.

If you like, I can send my Perl wrapper script that does this. It is based on the assumption that you can submit multiple jobs (e.g. on a cluster or big server), but you can easily modify it.

I don't know much about increasing RAM. You can try just.rma( ..., destructive=TRUE), but I am not sure whether this uses significantly less RAM.

Regards, Adai.

On Fri, 2004-06-04 at 16:06, Vada Wilcox wrote:
> Dear all,
>
> I have been using RMA successfully for a while now, but in the past I have
> only used it on a small number of slides. I would like to do my study on a
> larger scale now, with data (series of experiments) from other researchers
> as well. My question is the following: if I want to study, let's say, 200
> slides, do I have to read them all into R at once (that is, together, with
> read.affy() in package affy), or is it OK to read them series by series (so
> all wild types and controls of one researcher at a time)?
>
> If it is really necessary to read all of them in at one time, how much RAM
> would I need (for, let's say, 200 CEL files) and how can I raise the RAM? I
> know it's possible to raise it by using 'max vsize = ...', but I haven't been
> able to do it successfully for 200 experiments. Can somebody help me with
> this?
>
> Many thanks in advance,
>
> Vada
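The split / RMA / combine / average procedure described in this reply could be sketched in R roughly as follows. This is only an illustration and not Adai's Perl wrapper: the function name `batch_average` and its arguments are invented here, and `summarise` is a placeholder for whatever turns a set of CEL files into a probesets-by-arrays matrix with the file names as column names. With the affy package, something like `function(f) exprs(justRMA(filenames = f))` might serve, but that is an assumption about your setup, and its column names would need to match the file names you pass in.

```r
# Sketch of the split / summarise / combine / average procedure.
# `summarise` is a placeholder: any function that maps a character vector of
# CEL file names to a probesets x arrays matrix whose column names are those
# file names.
batch_average <- function(files, summarise, n_batches = 4, B = 20) {
  total <- NULL
  for (b in seq_len(B)) {
    # Step 1: randomly assign the files to batches of roughly equal size
    batch_id <- sample(rep(seq_len(n_batches), length.out = length(files)))
    batches <- split(files, batch_id)
    # Steps 2 and 3: summarise each batch separately, then bind the columns
    combined <- do.call(cbind, lapply(unname(batches), summarise))
    # Put the columns back in the original file order so repeats line up
    combined <- combined[, files, drop = FALSE]
    total <- if (is.null(total)) combined else total + combined
  }
  # Step 4: average over the B random splits
  total / B
}
```

Because each repeat re-runs the summary on every file, the total computing time grows with B; the memory needed at any one moment is set by the batch size, not by the full number of arrays.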

written 20.9 years ago by Adaikalavan Ramasamy ★ 1.8k

Park, Richard ▴ 220

Hi Vada,

I would caution you against running RMA on that many datasets. I have noticed a trend in RMA that things get even more underestimated as the number and variance of the data increase. I have been doing an analysis of immune cell types with about 100 CEL files. My computer (Windows 2000, 2 GB of RAM, 2.6 GHz Pentium 4) gives out at around 70 datasets; I am pretty sure that my problem is that Windows 2000 has a maximum allocation of 1 GB. But if most of your data is closely related (i.e. same tissues, just a KO vs WT), you should be fine with RMA. I would caution against using RMA on data that is very different.

hth,
richard

written 20.9 years ago by Park, Richard ▴ 220

Roel Verhaak ▴ 70

I have successfully run GCRMA on a dataset of 285 HGU133a chips on a machine with 8 GB of RAM installed; I noticed a peak memory use of 5.5 GB (although I have not been monitoring it continuously). I would say 200 chips use proportionally less memory, so around 4 GB.

Roel Verhaak
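The figure of around 4 GB follows from scaling the single observation above in proportion to the number of chips. A minimal sketch of that arithmetic, assuming memory use really does grow linearly with chip count for the same chip type (the variable and function names are made up for illustration):

```r
# Observed by Roel: 285 HGU133a chips peaked at about 5.5 GB under GCRMA.
observed_chips <- 285
observed_peak_gb <- 5.5

# Assumption: peak memory grows in proportion to the number of chips.
estimate_peak_gb <- function(n_chips) {
  observed_peak_gb * n_chips / observed_chips
}

estimate_peak_gb(200)  # a little under 4 GB
```

Since the peak was not monitored continuously, the true requirement may be somewhat higher than this estimate.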

written 20.9 years ago by Roel Verhaak ▴ 70

Marcus Davy ▴ 680

Hi,

you can use the function object.size to estimate the storage of any expression set object, e.g.

    > object.size(affybatch.example)
    [1] 243384
    > dim(exprs(affybatch.example))
    [1] 10000     3
    > object.size(exprs(affybatch.example))
    [1] 240280
    > object.size(exprs(affybatch.example)) / (nrow(exprs(affybatch.example))*ncol(exprs(affybatch.example)))
    [1] 8.009333

Each double-precision matrix value should take 8 bytes of storage, so you can estimate the amount of memory required for n genes by 200 arrays, plus annotation information etc.

On a *standard* Windows XP (or 2000) machine running R 1.9.0 you can increase the addressable memory space with the --max-mem-size=2G argument when you run the executable; details are in the Windows FAQ. Check that it has increased with:

    > memory.limit()
    [1] 2147483648

Memory-intensive algorithms could start running out of addressable memory on some 32-bit architectures for large datasets, e.g. Bioconductor's siggenes SAM permutation testing function with B=1000 on 27000 genes is likely to have problems on some 32-bit platforms, depending on the physical memory and the virtual page size available to the operating system.

marcus
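The 8-bytes-per-value rule can be turned into a quick estimate before any data are loaded. The sketch below is an illustration only: `matrix_bytes` is a made-up helper, and the 27000 x 200 dimensions are borrowed from the numbers mentioned in this thread, not from a particular chip type. It covers a single numeric matrix; intermediate copies made during normalisation and the raw probe-level intensities will need considerably more.

```r
# Approximate storage of a numeric matrix: rows x columns x 8 bytes per double.
matrix_bytes <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value
}

# Expression matrix for 27000 genes on 200 arrays, in MiB (about 41)
matrix_bytes(27000, 200) / 2^20

# Sanity check against a real object: object.size adds a small fixed overhead
m <- matrix(0, nrow = 1000, ncol = 3)
as.numeric(object.size(m)) - matrix_bytes(1000, 3)
```

The same function can be applied to the number of probe cells per array (rather than genes) to get a feel for the size of the raw intensity matrix, which is usually the larger of the two.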

written 20.9 years ago by Marcus Davy ▴ 680
