segfault ReadAffy cause 'memory not mapped'
Loraine, Ann ▴ 50
@loraine-ann-5437
Last seen 10.2 years ago
Hello,

I am trying to process several thousand CEL files using the ReadAffy command.

The machine has 96 Gb RAM.

However I get this error:

> expr=ReadAffy(filenames=d.uniq$cel,celfile.path='CEL',sampleNames=d.uniq$gsm,compress=T)

 *** caught segfault ***
address 0x7fc79b4b1048, cause 'memory not mapped'

Traceback:
 1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra, ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
 2: read.affybatch(filenames = l$filenames, phenoData = l$phenoData, description = l$description, notes = notes, compress = compress, rm.mask = rm.mask, rm.outliers = rm.outliers, rm.extra = rm.extra, verbose = verbose, sd = sd, cdfname = cdfname)
 3: ReadAffy(filenames = d.uniq$cel, celfile.path = "CEL", sampleNames = d.uniq$gsm, compress = T)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

R and session info:

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] affy_1.38.1        Biobase_2.20.1     BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affyio_1.28.0         BiocInstaller_1.10.3  preprocessCore_1.22.0
[4] zlibbioc_1.6.0

Can you help?

Best,

Ann
cstrato ★ 3.9k
@cstrato-908
Last seen 6.2 years ago
Austria
Dear Ann,

Several thousand CEL files is quite a lot. Furthermore, you do not mention which array type you are using.

In any case, you could try the package 'xps', which should be able to handle it. However, you should do your processing stepwise.

Best regards,
Christian

Christian Stratowa
Vienna, Austria
e-mail: cstrato at aon.at

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
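One way to read "stepwise" is to split the file list into batches before any reading happens. The following is a minimal base R sketch of that idea only; the helper name `split_into_batches`, the made-up file names, and the batch size of 500 are illustrative assumptions and are not taken from the thread or from xps.

```r
# Hypothetical helper: split a vector of file names into consecutive batches.
split_into_batches <- function(files, batch_size) {
  stopifnot(batch_size >= 1)
  if (length(files) == 0) return(list())
  # Batch index for every file: 1,1,...,2,2,...
  batch_id <- ceiling(seq_along(files) / batch_size)
  unname(split(files, batch_id))
}

# Made-up file names standing in for d.uniq$cel from the question
cel_files <- sprintf("sample_%04d.CEL.gz", 1:1200)
batches <- split_into_batches(cel_files, batch_size = 500)

length(batches)           # 3 batches
sapply(batches, length)   # 500 500 200
```

Each element of `batches` could then be handed to whichever reader is used, one batch at a time, so that a failure can at least be narrowed down to a subset of files.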
Also, perhaps justRMA is a more memory-efficient way to do standard normalization. There is probably a bug in ReadAffy, but it would be difficult to track down without a more reproducible example.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
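For reference, a justRMA call that mirrors the arguments of the ReadAffy call in the question might look like the sketch below. The wrapper name `run_just_rma` is an assumption made for illustration. Whether this avoids the crash is not established in the thread; justRMA still has to read every CEL file, it just returns summarized expression values rather than a full probe-level AffyBatch.

```r
# Sketch only: same inputs as the ReadAffy call in the question.
# justRMA() goes from CEL files to an ExpressionSet of RMA values.
run_just_rma <- function(cel_files, sample_names, cel_dir = "CEL") {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("the Bioconductor package 'affy' is required")
  }
  affy::justRMA(
    filenames    = cel_files,
    celfile.path = cel_dir,
    sampleNames  = sample_names,
    compress     = TRUE   # the question reads gzipped CEL files
  )
}

# Usage with the objects from the question (not run here):
# eset <- run_just_rma(d.uniq$cel, d.uniq$gsm)
# dim(Biobase::exprs(eset))   # probesets x samples
```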
ReadAffy constructs a probes x samples matrix. If the array is large enough, several thousand samples may be enough to push that matrix over R's internal limits.

Which array is this?

Kasper
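To make the size argument concrete, here is a back-of-the-envelope calculation. The array type is an assumption, since the thread never states it: the numbers below use the 1164 x 1164 feature grid of the HG-U133 Plus 2.0 array and 3000 samples as a stand-in for "several thousand". R 3.0.0 and later can hold vectors longer than 2^31 - 1 elements, but compiled code that indexes with 32-bit integers may not cope with them; whether that is what happens inside affyio is not established here.

```r
# Assumed for illustration only: HG-U133 Plus 2.0, a 1164 x 1164 feature grid.
cells_per_array <- 1164 * 1164          # 1,354,896 intensities per CEL file
n_samples       <- 3000                 # stand-in for "several thousand"

# Doubles are used throughout so the product itself cannot overflow in R.
n_elements <- cells_per_array * n_samples
max_int    <- .Machine$integer.max      # 2^31 - 1 = 2147483647

n_elements > max_int                    # TRUE: more cells than a 32-bit index can address
floor(max_int / cells_per_array)        # 1584 arrays stay under the 32-bit limit
n_elements * 8 / 2^30                   # about 30.3 GiB if stored as doubles
```

Under these assumptions the matrix fits in 96 GB of RAM but has more elements than a 32-bit index can address, which would be consistent with a crash in compiled code rather than an ordinary out-of-memory error.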
It is also worth trying the free, standalone Affymetrix Power Tools software if these are Affymetrix arrays. It has an option for sketch normalization.
Another option: if your data come from one of the more common Affymetrix platforms (HGU133a, HGU133plus2, Mouse4302, HuGene, HuEx), you can read in and preprocess your data in subsets using frma and then combine the results for further analysis.

Best,
Matt

--
Matthew N McCall, PhD
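The subset-then-combine approach can be sketched as a small driver that processes one batch of files at a time and column-binds the per-batch results. The function names `process_in_batches`, `frma_batch`, and `fake_worker` are hypothetical, and the frma worker is shown but not run, since it needs the affy and frma packages plus the frozen-parameter data package for the array in question. As I understand it, combining batches is reasonable with frma because it preprocesses each array against precomputed reference parameters rather than against the other arrays in the batch.

```r
# Generic driver: apply `process_fn` to each batch of files and column-bind
# the per-batch matrices (rows = features, columns = samples).
process_in_batches <- function(files, batch_size, process_fn) {
  batch_id  <- ceiling(seq_along(files) / batch_size)
  per_batch <- lapply(split(files, batch_id), process_fn)
  do.call(cbind, unname(per_batch))
}

# Hypothetical frma-based worker (not run here; requires Bioconductor packages).
frma_batch <- function(batch_files) {
  abatch <- affy::ReadAffy(filenames = batch_files, celfile.path = "CEL",
                           compress = TRUE)
  Biobase::exprs(frma::frma(abatch))
}
# expr_matrix <- process_in_batches(d.uniq$cel, 200, frma_batch)

# Self-contained demonstration with a stand-in worker that fabricates
# a 3-feature matrix with one column per file:
fake_worker <- function(batch_files) {
  matrix(seq_len(3 * length(batch_files)), nrow = 3,
         dimnames = list(paste0("feature", 1:3), batch_files))
}
demo <- process_in_batches(paste0("s", 1:5), batch_size = 2, fake_worker)
dim(demo)        # 3 5
colnames(demo)   # "s1" "s2" "s3" "s4" "s5"
```

Keeping the worker as an argument means the same driver can be reused with a different per-batch method, as long as that method does not depend on which other samples share the batch.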
Alright, my turn...

http://aroma-project.org/

/Henrik
I would start by checking the integrity of the files. affyio (the engine behind ReadAffy) throws errors like this when files are damaged.

2013/8/2 Henrik Bengtsson <hb at biostat.ucsf.edu>:
> Alright, my turn...
>
> http://aroma-project.org/
>
> /Henrik
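A per-file integrity check along those lines could look like the sketch below. It assumes affyio::read.celfile() as the reader (any function that raises an error on a damaged file would do); check_cel_files and its arguments are hypothetical names, not part of affy or affyio.

```r
# Try to read each CEL file on its own and record which ones fail.
# `reader` defaults to affyio::read.celfile; the default is only evaluated
# when no reader is supplied, so a stub can be passed in for testing.
check_cel_files <- function(files, reader = affyio::read.celfile) {
  status <- vapply(files, function(f) {
    tryCatch({
      reader(f)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
  data.frame(file = files, status = unname(status), stringsAsFactors = FALSE)
}

# Usage (assumes the CEL files live in a directory called "CEL"):
# res <- check_cel_files(list.files("CEL", pattern = "\\.CEL$",
#                                   ignore.case = TRUE, full.names = TRUE))
# res[res$status != "ok", ]
```

Note that tryCatch() only traps R-level errors. A file that makes the C code segfault would still take the session down, so for a suspect batch it may help to print each file name before reading it.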
ADD REPLY
0
Entering edit mode
@brian-d-peyser-phd-6106
Last seen 8.9 years ago
On 8/1/13 5:33 PM, Loraine, Ann wrote:
> Hello,
>
> I am trying to process several thousand CEL files using the ReadAffy command.
>
> The machine has 96 Gb RAM.
>
> However I get this error:
>
> > expr=ReadAffy(filenames=d.uniq$cel,celfile.path='CEL',sampleNames=d.uniq$gsm,compress=T)
>
> *** caught segfault ***
> address 0x7fc79b4b1048, cause 'memory not mapped'

I also have a problem loading many (3750) Affy hgu133plus2 arrays into an AffyBatch. I was able to run this with ~2900 arrays, but not since adding ~800 more. At right around 16 GiB allocated, I get a segfault like:

*** caught segfault ***
address 0x2aa6b6067048, cause 'memory not mapped'

Traceback:
1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra, ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
2: read.affybatch(filenames = as.character(pdata$Filename))

I noticed this when trying to run justGCRMA() or justRMA(), which both threw the same error. The traceback pointed to read.affybatch(), so I tried just doing that directly.

I first checked to make sure each file could be read in a loop, and they all come in OK individually. However, if I try to read them all at once I keep getting errors right around 16 GiB allocated (to R).

My laptop is Ubuntu Linux 12.04 with 32 GiB RAM, and I also tried this on a 256 GiB RAM machine with RHEL5. Both were running R version 3.0.1. On the Ubuntu machine, I was using affy v1.39.2, and on the RHEL5 machine it was affy v1.38.1.

In both cases the segfault came at about 16 GiB allocated (PBS epilogue shows 15.41 GiB memory used when running on the 256 GiB machine via batch submission). I also ran via an interactive PBS session on the 256 GiB server and the same error happened.

I had considered it could be a limit of the signed int indices for R vectors/arrays, but I thought that had changed as of R v3.0. Also, I thought that would give the error 'too many elements specified' rather than a 'memory not mapped' segfault. I've certainly allocated close to 64 GiB to R doing other things with these data, I'm just not sure if any individual vectors were that large.

I know there are ways to get around this. For example, I ran fRMA on subsets (split it into 8 subsets) and then combined the expression sets. Of course trying to run fRMA on the whole set at once failed as well. The fRMA-summarized data just 'feel' a bit different though, and I've been working with many of these arrays for a while now. (I know 'feelings' aren't statistics, so please don't scorch me on that!) Also, I've seen the suggestions like aroma.* for large datasets.

However, this seems like something that should be possible using the affy package given how cheap large memory systems are these days. I'm expecting a 0.5 TiB RAM workstation this fall! Also, if there is some kind of limitation in the implementation I think it's worth finding and helping get fixed. Any thoughts on whether there is a limitation in the affy package, in my gcc compiler, or something else? Would love for this to be able to use all my RAM.

Below I included R output from one of my attempts.

Thanks!

Brian Peyser

$ R --vanilla

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(affy)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:stats':

    xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, Filter, Find, get, intersect, lapply, Map,
    mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff,
    sort, table, tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> data <- read.affybatch(filenames=list.files(pattern=".CEL$", ignore.case=TRUE))

*** caught segfault ***
address 0x7f60734e7048, cause 'memory not mapped'

Traceback:
1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra, ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
2: read.affybatch(filenames = as.character(pdata$Filename))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

--
Brian D. Peyser PhD
Special Assistant to the Associate Director
Office of the Associate Director
Developmental Therapeutics Program
Division of Cancer Treatment and Diagnosis
National Cancer Institute
National Institutes of Health
301-524-5587 (mobile)
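The subset workaround described in this post (summarize in pieces, then combine) can be sketched as follows. This is only an illustration under stated assumptions: process_in_chunks is a hypothetical helper, and the per-chunk function is left to the caller, since only per-array methods such as fRMA give results that do not depend on how the files are split.

```r
# Split `files` into `n_chunks` roughly equal consecutive groups, run
# `process` on each group, and column-bind the resulting matrices
# (rows = features, columns = samples).
process_in_chunks <- function(files, n_chunks, process) {
  stopifnot(n_chunks >= 1, length(files) >= n_chunks)
  group <- ceiling(seq_along(files) * n_chunks / length(files))
  pieces <- lapply(split(files, group), process)
  do.call(cbind, unname(pieces))
}

# Hypothetical use with frma (requires affy, frma and the matching
# frozen-vector package to be installed; not run here):
# summarize_chunk <- function(f) {
#   Biobase::exprs(frma::frma(affy::ReadAffy(filenames = f)))
# }
# expr <- process_in_chunks(cel_files, n_chunks = 8, process = summarize_chunk)
```

Each chunk is read and released before the next one, so peak memory is driven by the chunk size rather than by the full set of arrays.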
ADD COMMENT
0
Entering edit mode
Well, internally read_abatch is using allocMatrix() to actually allocate the main block of memory that will be used to store the probe intensities. However, there are a lot of places where "int" is used as the indexing variable. Probably if I had had better foresight when I wrote this code 10 years ago, I'd have used something a bit more specific (e.g. int64_t). I'm guessing it is one of these sorts of things that is causing the crash. I'll try to get around to refactoring the code at some point.

If you'd like, you could send me the gdb backtrace at the point of the segfault and I could investigate further.

Best,

Ben
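To make the "int as indexing variable" concern concrete, here is a small illustration in R of what happens to a linear offset once it no longer fits in a signed 32-bit integer. It is a sketch of the general failure mode, not a confirmed diagnosis of read_abatch: the helper names are hypothetical, and the cells-per-chip value is the hgu133plus2 probe count quoted elsewhere in this thread, which may be smaller than the number of rows read_abatch actually fills.

```r
# Emulate how a signed 32-bit integer typically wraps on two's-complement
# hardware (in C this overflow is formally undefined behaviour).
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

# Linear offset of the first cell of chip `chip` (0-based) in a
# cells-by-chips matrix stored column by column.
first_offset <- function(chip, cells_per_chip) chip * cells_per_chip

# Largest number of chips whose cells can all be addressed with a signed
# 32-bit index.
max_chips_int32 <- function(cells_per_chip) {
  floor(.Machine$integer.max / cells_per_chip)
}

cells <- 604258                      # assumption: probe count quoted in this thread
n_ok  <- max_chips_int32(cells)

# A chip that starts beyond the 32-bit limit gets a negative wrapped offset,
# i.e. an address far outside the allocated matrix.
true_offset    <- first_offset(n_ok + 1, cells)
wrapped_offset <- wrap_int32(true_offset)

cat("chips addressable with int:", n_ok, "\n")
cat("true offset:", format(true_offset, big.mark = ","),
    " wrapped:", format(wrapped_offset, big.mark = ","), "\n")
```

A write through such a wrapped offset lands in memory that was never allocated, which is one way to end up with a 'memory not mapped' segfault; using a 64-bit index type avoids the wrap.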
ADD REPLY
0
Entering edit mode
On Wed, 2013-08-21 at 16:21 -0700, Ben Bolstad wrote:
> If you'd like you could send me the gdb backtrace at the point of the
> segfault and I could investigate further.

Thanks for the info, Ben! With a little help from Google I ran R under gdb:

$ R -d gdb
(gdb) run --vanilla
> library(affy)
> data <- read.affybatch(filenames=list.files(pattern="\\.CEL$", ignore.case=TRUE))

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2849c46 in read_binarycel_file_intensities (filename=<optimized out>, intensity=0x7ff67b3fc048, chip_num=1848, rows=<optimized out>, cols=<optimized out>, chip_dim_rows=<optimized out>) at read_abatch.c:2862
2862 read_abatch.c: No such file or directory.
(gdb) set logging on
Copying output to gdb.txt.
(gdb) bt
#0 0x00007ffff2849c46 in read_binarycel_file_intensities (filename=<optimized out>, intensity=0x7ff67b3fc048, chip_num=1848, rows=<optimized out>, cols=<optimized out>, chip_dim_rows=<optimized out>) at read_abatch.c:2862
#1 0x00007ffff285062c in read_abatch (filenames=0x298eb40, rm_mask=0x1ffcb68, rm_outliers=0x1ffcb98, rm_extra=0x1ffcbc8, ref_cdfName=<optimized out>, ref_dim=<optimized out>, verbose=0x1ffcbf8) at read_abatch.c:3959
#2 0x00007ffff79310c2 in do_dotcall (call=0x1d88d00, op=<optimized out>, args=<optimized out>, env=0x1d793b0) at dotcode.c:600
#3 0x00007ffff797146e in Rf_eval (e=0x1d88d00, rho=0x1d793b0) at eval.c:635
#4 0x00007ffff7972da0 in do_set (call=0x1d88be8, op=0x610458, args=0x1d88c20, rho=0x1d793b0) at eval.c:1871
#5 0x00007ffff7971277 in Rf_eval (e=0x1d88be8, rho=0x1d793b0) at eval.c:607
#6 0x00007ffff7972f90 in do_begin (call=0x1dc8c40, op=0x610260, args=0x1d88bb0, rho=0x1d793b0) at eval.c:1557
#7 0x00007ffff7971277 in Rf_eval (e=0x1dc8c40, rho=0x1d793b0) at eval.c:607
#8 0x00007ffff797448d in Rf_applyClosure (call=0x1dcc0a8, op=0x1dcb648, arglist=<optimized out>, rho=0x6348b8, suppliedenv=<optimized out>) at eval.c:1003
#9 0x00007ffff7970fcf in Rf_eval (e=0x1dcc0a8, rho=0x6348b8) at eval.c:654
#10 0x00007ffff7972da0 in do_set (call=0x1dcb370, op=0x610458, args=0x1dcc118, rho=0x6348b8) at eval.c:1871
#11 0x00007ffff7971277 in Rf_eval (e=0x1dcb370, rho=0x6348b8) at eval.c:607
#12 0x00007ffff799911d in Rf_ReplIteration (rho=0x6348b8, savestack=<optimized out>, browselevel=<optimized out>, state=0x7fffffffd340) at main.c:258
#13 0x00007ffff79993c0 in R_ReplConsole (rho=0x6348b8, savestack=0, browselevel=0) at main.c:307
#14 0x00007ffff7999450 in run_Rmainloop () at main.c:986
#15 0x000000000040078b in main (ac=<optimized out>, av=<optimized out>) at Rmain.c:32
#16 0x00007ffff72dc76d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x00000000004007bd in _start ()
(gdb)

(The same backtrace was attached as gdb.txt.)

Hope that helps!

-Brian
ADD REPLY
0
Entering edit mode
On Thu Aug 22 2013 6:18 PM, I wrote:
> I had considered it could be a limit of the signed int indices for R
> vectors/arrays, but I thought that had changed as of R v3.0. Also, I
> thought that would give the error 'too many elements specified' rather
> than a 'memory not mapped' segfault. I've certainly allocated close to
> 64 GiB to R doing other things with these data, I'm just not sure if any
> individual vectors were that large.

I just ran (R 3.0.1, x86_64-pc-linux-gnu, 64-bit):

$ R
> temp = array(rnorm(3750*604258), c(3750, 604258))
>

and R was allocated 30.0 GiB, and did not crash.

From the hgu133plus2probe info, there are 604258 probes (_not_ probesets) on each hgu133plus2 GeneChip, and I have 3750 chips. Therefore, I can generate a 3750-by-604258 array of random data without a segfault, and R shoots right past 16 GiB allocated with no hiccups.

-Brian
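For reference, the size of that test array can be worked out directly. The sketch below uses only the two numbers given in the post and does the arithmetic in doubles, so nothing overflows; the variable names are illustrative.

```r
n_chips  <- 3750
n_probes <- 604258

n_elements  <- n_chips * n_probes        # double arithmetic, no integer overflow
exceeds_int <- n_elements > .Machine$integer.max
gib_double  <- n_elements * 8 / 2^30     # 8 bytes per double

cat("elements:", format(n_elements, big.mark = ","), "\n")
cat("exceeds 2^31 - 1:", exceeds_int, "\n")
cat("GiB as doubles:", round(gib_double, 1), "\n")
```

The element count is above the 32-bit signed limit, so the test array is a long vector. That R 3.0.1 builds it without trouble suggests the limit is not in R's own allocation, although it does not by itself show where in read_abatch the failure occurs.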
ADD REPLY
0
Entering edit mode
Dear Brian,

As I mentioned in the earlier case, package xps is able to handle this number of arrays. (Quite some time ago a user used xps to process about 23,000 hgu133plus2 arrays on his Mac, and memory consumption was only 4 GB RAM.)

Best regards,
Christian
ADD REPLY
0
Entering edit mode
On Thu, 2013-08-22 at 14:53 +0200, cstrato wrote:
> As I have already mentioned in the former case, package xps is able to
> handle this amount of arrays.

Thanks Christian,

I did take a look at xps; it seems to be something that will certainly improve large scale analyses in R. In fact I may recommend investigating the ROOT interface to one of my personal friends who works in health informatics with large-scale (terabyte-sized, not petabyte) data. I think that was the one place SAS could still beat R, but that may be ending!

In my case, I see a number of ways to accomplish the analysis, and did so with fRMA, but read.affybatch() seems to be failing when it could succeed.

Thanks again,
Brian
Brian,

I'm curious what you ended up doing with fRMA. Did you just use the default implementation?

If you want something closer to RMA, you can obtain RMA-like expression estimates by creating your own custom frma vectors using a subset of the data (in your case, potentially a fairly large one). You can even do this several times to get a sense of how much the estimates depend on the subset used to create the frma vectors. This is implemented in the frmaTools package. There is also a paper that describes this:

McCall MN and Irizarry RA (2011). Thawing Frozen Robust Multi-array Analysis (fRMA), BMC Bioinformatics, 12:369.

Best,
Matt

--
Matthew N McCall, PhD
112 Arvine Heights
Rochester, NY 14611
Cell: 202-222-5880
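As a rough sketch of the "several subsets" idea described above, the base-R helper below draws a random subset of CEL files with the same number of files from every batch, and repeats the draw a few times. The function name `draw_balanced_subset`, the file names, and the batch labels are all hypothetical and chosen only for illustration. The step that would actually build the custom vectors with frmaTools is deliberately left out; its arguments should be checked against the package documentation.

```r
# Hypothetical helper (not part of any package): pick the same number of
# files from every batch so each batch contributes equally to the subset.
draw_balanced_subset <- function(files, batch, n_per_batch, seed = NULL) {
  stopifnot(length(files) == length(batch))
  if (!is.null(seed)) set.seed(seed)

  by_batch <- split(files, batch)
  too_small <- vapply(by_batch, length, integer(1)) < n_per_batch
  if (any(too_small)) {
    stop("batches with fewer than ", n_per_batch, " files: ",
         paste(names(by_batch)[too_small], collapse = ", "))
  }

  # Sample indices rather than values so one-element batches behave correctly.
  picked <- lapply(by_batch, function(f) f[sample.int(length(f), n_per_batch)])

  data.frame(file = unlist(picked, use.names = FALSE),
             batch.id = rep(names(picked), each = n_per_batch),
             stringsAsFactors = FALSE)
}

# Made-up inputs: 40 files spread evenly over 4 batches.
files <- sprintf("GSM%04d.CEL", 1:40)
batch <- rep(c("A", "B", "C", "D"), each = 10)

# Three independent draws, each of which could seed one set of custom vectors.
subsets <- lapply(1:3, function(i) {
  draw_balanced_subset(files, batch, n_per_batch = 5, seed = i)
})
```

Comparing the expression estimates obtained from each draw would then give a sense of how sensitive the results are to the choice of training subset.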
On Thu, 2013-08-22 at 22:15 -0400, Matthew McCall wrote:
> Brian,
>
> I'm curious what you ended up doing with fRMA. Did you just use the
> default implementation?
>
> If you want something closer to RMA, you can obtain RMA-like
> expression estimates by creating your own custom frma vectors using a
> (in your case potentially fairly large) subset of the data. You can
> even do this several times to get a sense of how much the estimates
> depend on the subset used to create the frma vectors. This is
> implemented in the frmaTools package. There is also a paper that
> describes this:
> McCall MN and Irizarry RA (2011). Thawing Frozen Robust Multi-array
> Analysis (fRMA), BMC Bioinformatics, 12:369.
>
> Best,
> Matt

Hi Matt,

Yes, I did run with the default fRMA vectors, and I have considered creating custom vectors. I just haven't had the time yet to go back and work that out. I'm sure I'll have to try a number of subsets, both randomly selected and systematically compared. Rafa Irizarry is the one who suggested I try fRMA in the first place (I went to him with the RMA/GCRMA segfaults before coming here), and I have seen that citation.

Also, I don't have a reason to dislike the fRMA results, other than that they "felt" slightly different, especially in some specific comparisons I had already made between certain subsets.

I was concerned about the segfault because I never seemed to approach my total physical RAM, which made me think there had to be some kind of limitation or bug in the code. My goal here is more to help improve the package than to get my data analyzed; there are several methods I can use to accomplish the analysis, and I've already started with fRMA.

Thanks,
Brian

--
Brian D. Peyser PhD
Special Assistant to the Associate Director
Office of the Associate Director
Developmental Therapeutics Program
Division of Cancer Treatment and Diagnosis
National Cancer Institute
National Institutes of Health
301-524-5587 (mobile)
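One possible reason for a crash well below physical RAM, offered here purely as an assumption and not confirmed anywhere in this thread, is a 32-bit index limit: if the intensity matrix holds more cells than a signed 32-bit integer can count, C code that indexes it with `int` could run past the end of the allocation no matter how much memory is free. The arithmetic below assumes HG-U133 Plus 2.0 arrays (1164 x 1164 features per array) and uses made-up array counts.

```r
# Assumption: HG-U133 Plus 2.0 layout, 1164 x 1164 features per CEL file.
cells_per_array <- 1164 * 1164

# Largest value a signed 32-bit integer can hold.
int_max <- .Machine$integer.max

# Largest number of arrays whose combined cell count still fits in an int.
max_arrays <- floor(int_max / cells_per_array)

# Hypothetical array counts to compare against that limit.
n_arrays <- c(1000, 1585, 5000)
total_cells <- cells_per_array * n_arrays   # computed in double precision

data.frame(
  n_arrays      = n_arrays,
  total_cells   = total_cells,
  exceeds_int   = total_cells > int_max,
  gib_as_double = total_cells * 8 / 2^30    # size if stored as 8-byte doubles
)
```

If the arrays in question are of this type, a few thousand of them would cross the threshold while still needing far less than 96 GB, which would be consistent with the observation that memory use never came close to the physical limit.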