Question

Question on importing large dataset (1.4GB) into R-Bioconductor


Anqi ▴ 40

@anqi-3575

Last seen 10.5 years ago

To whom it may concern,

I am a student at Peking University, China. I am currently doing microarray data analysis with the Bioconductor packages for R.

A problem arises when I try to import my dataset, which contains 109 samples (total size more than 1.4 GB), into R. The memory limit of R makes importing all the samples into one AffyBatch object a "mission impossible" for me.

It would be possible to import the data into several AffyBatch objects and preprocess each one separately. In that case, however, the results of background correction and normalization are not desirable, because not all of the available information (namely, all 109 samples) is used to obtain a baseline or similar reference.

An alternative approach would be to preprocess the data in dChip and then export it into R, but I would prefer an approach that relies solely on R.

Could you please give some suggestions on this issue, though it might be more a technical problem than a scientific (statistical) one? Many thanks for your help! I look forward to your reply. All the best with your work!

Best regards,
Anqi

Microarray Normalization Preprocessing • 1.2k views

updated 15.6 years ago by Sean Davis 21k • written 15.6 years ago by Anqi ▴ 40

score 0 · Answer 1 · 2009-07-15

You could try using the xps or aroma.affymetrix packages. I think both are designed to deal with large datasets.

Sean

score 0 · Answer 2 · 2009-07-16

2009/7/16 Anqi <dotzaq@126.com>

> Hi Sean,
> I have tried my dataset out in the aroma.affymetrix package and it DOES work. Thanks so much for your help!
>
> Best,
> Anqi

Glad to hear that did the trick.

Sean