Question on importing large dataset (1.4GB) into R-Bioconductor
Anqi ▴ 40
@anqi-3575
Last seen 10.1 years ago
To whom it may concern,

I am a student at Peking University, China. I am currently doing microarray data analysis with the Bioconductor packages for R.

A problem arises when I try to import my dataset, which contains 109 samples (total size more than 1.4 GB), into R. R's memory limit makes importing all the samples into one AffyBatch object a "mission impossible" for me.

It would be possible to import the data into several AffyBatch objects and preprocess each one separately. In that case, however, the results of background correction and normalization are not desirable, because not all of the available information (namely all 109 samples) is used to obtain the baseline.

An alternative would be to preprocess the data in dChip and then export the results to R, but I would prefer an approach that relies solely on R.

Could you please give some suggestions on this issue, even though it may be more of a technical problem than a scientific (statistical) one? Many thanks for your help! I look forward to your reply, and all the best with your work!

Best regards,
Anqi
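The concern about preprocessing the samples in separate groups can be illustrated with a small simulation in base R. This is only a sketch on made-up data: the matrix `x`, the one-unit shift between the two groups, and the helper `quantile_normalize` are assumptions introduced for illustration, not part of the original dataset or of any Bioconductor package.

```r
set.seed(1)

# Simulated log2 intensities: 1000 probes x 8 arrays (hypothetical data).
x <- matrix(rnorm(8000, mean = 8, sd = 2), nrow = 1000, ncol = 8)
x[, 5:8] <- x[, 5:8] + 1  # arrays 5-8 are systematically brighter

# Plain quantile normalization: each column is mapped onto the mean of
# the sorted columns, so the target depends on which arrays are included.
quantile_normalize <- function(m) {
  target <- rowMeans(apply(m, 2, sort))
  apply(m, 2, function(col) target[rank(col, ties.method = "first")])
}

# All arrays normalized against one common target distribution.
all_together <- quantile_normalize(x)

# Two groups normalized separately, each against its own target.
in_batches <- cbind(quantile_normalize(x[, 1:4]),
                    quantile_normalize(x[, 5:8]))

# Column means are identical when normalized together ...
print(colMeans(all_together))
# ... but the shift between the groups survives per-group normalization.
print(colMeans(in_batches))
```

Because each group gets its own target distribution, arrays from different groups are not placed on a common scale, which is why splitting the 109 samples into several AffyBatch objects is unsatisfactory.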
Microarray Normalization Preprocessing • 1.1k views
@sean-davis-490
Last seen 7 weeks ago
United States
You could try using the xps or aroma.affymetrix packages. I think both are designed to deal with large datasets.

Sean
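For readers who want to try the aroma.affymetrix route, a minimal sketch of an RMA-style run might look like the following. It assumes the package is installed and that the files follow aroma's directory layout (`annotationData/chipTypes/<chipType>/<chipType>.cdf` and `rawData/<dataSet>/<chipType>/*.CEL`). The wrapper name `rma_on_disk` and the data set and chip type names in the usage line are hypothetical placeholders; the thread does not say which chip type was used.

```r
# Sketch only: RMA-style preprocessing with aroma.affymetrix, which keeps
# intermediate results on disk instead of holding all arrays in memory.
# 'rma_on_disk' is a hypothetical wrapper, not a function from the package.
rma_on_disk <- function(dataSet, chipType) {
  library(aroma.affymetrix)

  # Locate the CDF annotation and the CEL files for this data set.
  cdf <- AffymetrixCdfFile$byChipType(chipType)
  cs  <- AffymetrixCelSet$byName(dataSet, cdf = cdf)

  # Background correction across the whole data set.
  bc   <- RmaBackgroundCorrection(cs)
  csBC <- process(bc)

  # Quantile normalization of the PM probes across all arrays.
  qn  <- QuantileNormalization(csBC, typesToUpdate = "pm")
  csN <- process(qn)

  # Probe-level model fit (summarization).
  plm <- RmaPlm(csN)
  fit(plm)

  # Chip effects are stored on the intensity scale, so take log2.
  ces <- getChipEffectSet(plm)
  log2(extractMatrix(ces))
}
```

A call would then be something like `exprs <- rma_on_disk("MyDataSet", "HG-U133_Plus_2")`, with both arguments replaced by the actual directory and chip type names. The point of the design is that all arrays contribute to background correction and normalization without a single in-memory AffyBatch.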
@sean-davis-490
Last seen 7 weeks ago
United States
2009/7/16 Anqi <dotzaq@126.com>

> Hi Sean,
> I have tried my dataset out in the aroma.affymetrix package and it DOES work. Thanks so much for your help!
>
> Best,
> Anqi

Glad to hear that did the trick.

Sean