farms package fails on estrogen data
2
0
Entering edit mode
@javier-perez-florido-3121
Last seen 6.8 years ago
Dear list, I've tried to use l.farms (farms package) http://www.bioinf.jku.at/software/farms/farms.html for preprocessing the estrogen data set available at http://www.bioconductor.org/packages/release/data/experiment/html/estr ogen.html When trying to preprocess using l.farms, the following error comes up: background correction: none normalization: loess PM/MM correction : pmonly expression values: farms background correcting...done. normalizing...Error: NA/NaN/Inf in call to external function (arg 1) q.farms works for the same data set and l.farms works for the Dilution data set, so, I don't know what is going on here. Any suggestions? Javier sessionInfo() R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C [5] LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 loaded via a namespace (and not attached): [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 splines_2.10.1 [5] tools_2.10.1
Preprocessing Preprocessing • 1.4k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States
On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: > Dear list, > I've tried to use l.farms (farms package) > http://www.bioinf.jku.at/software/farms/farms.html > for preprocessing the estrogen data set available at > http://www.bioconductor.org/packages/release/data/experiment/html/es trogen.html > > > When trying to preprocess using l.farms, the following error comes up: > > background correction: none > normalization: loess > PM/MM correction : pmonly > expression values: farms > background correcting...done. > normalizing...Error: NA/NaN/Inf in call to external function (arg 1) > > q.farms works for the same data set and l.farms works for the Dilution > data set, so, I don't know what is going on here. > > Any suggestions? Hi Javier -- If I library(farms) library(estrogen) setwd(system.file("extdata", package="estrogen")) ab = ReadAffy() l.farms(ab) I see the error you report. It is because > sum(intensity(ab) == 0) [1] 5 so that log transformation generates -Inf and then the error. It doesn't seem too bad to intensity(ab) = 1 + intensity(ab) [I don't know enough about the inner workings of affyio, but I'm surprised that I sometimes see > range(intensity(ReadAffy())) [1] 0 46131 > range(intensity(ReadAffy)) [1] 1.563578e-316 4.613100e+04 suggesting either initialized memory or unanticipated 'empty' intensity grid coordinates)] Martin > Javier > > sessionInfo() > R version 2.10.1 (2009-12-14) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 > [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C > [5] LC_TIME=Spanish_Spain.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 > [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 > [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 > > loaded via a namespace (and not attached): > [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 splines_2.10.1 > [5] tools_2.10.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States
On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: > Dear list, > I've tried to use l.farms (farms package) > http://www.bioinf.jku.at/software/farms/farms.html > for preprocessing the estrogen data set available at > http://www.bioconductor.org/packages/release/data/experiment/html/es trogen.html > > > When trying to preprocess using l.farms, the following error comes up: > > background correction: none > normalization: loess > PM/MM correction : pmonly > expression values: farms > background correcting...done. > normalizing...Error: NA/NaN/Inf in call to external function (arg 1) > > q.farms works for the same data set and l.farms works for the Dilution > data set, so, I don't know what is going on here. > > Any suggestions? Hi Javier -- If I library(farms) library(estrogen) setwd(system.file("extdata", package="estrogen")) ab = ReadAffy() l.farms(ab) I see the error you report. It is because > sum(intensity(ab) == 0) [1] 5 so that log transformation generates -Inf and then the error. It doesn't seem too bad to intensity(ab) = 1 + intensity(ab) [I don't know enough about the inner workings of affyio, but I'm surprised that I sometimes see > range(intensity(ReadAffy())) [1] 0 46131 > range(intensity(ReadAffy)) [1] 1.563578e-316 4.613100e+04 suggesting either initialized memory or unanticipated 'empty' intensity grid coordinates)] Martin > Javier > > sessionInfo() > R version 2.10.1 (2009-12-14) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 > [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C > [5] LC_TIME=Spanish_Spain.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 > [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 > [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 > > loaded via a namespace (and not attached): > [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 splines_2.10.1 > [5] tools_2.10.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
ReadAffy uses allocMatrix() to allocate the memory into which it intends to place probe intensities read from the file. Based on information in the CEL file header it determines grid dimensions to determine how many probe intensities are expected. Exact parsing depends on the CEL file format of which there are a few, but generally speaking it is expected that there is an intensity to be read for every location in the grid. Zeros can arise in truncated and corrupted CEL files. But many of those types of situations should be detected and an error thrown. There may be other mechanisms for generation of such probe intensities which make them valid. > On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: >> Dear list, >> I've tried to use l.farms (farms package) >> http://www.bioinf.jku.at/software/farms/farms.html >> for preprocessing the estrogen data set available at >> http://www.bioconductor.org/packages/release/data/experiment/html/e strogen.html >> >> >> When trying to preprocess using l.farms, the following error comes up: >> >> background correction: none >> normalization: loess >> PM/MM correction : pmonly >> expression values: farms >> background correcting...done. >> normalizing...Error: NA/NaN/Inf in call to external function (arg 1) >> >> q.farms works for the same data set and l.farms works for the Dilution >> data set, so, I don't know what is going on here. >> >> Any suggestions? > > Hi Javier -- > > If I > > library(farms) > library(estrogen) > setwd(system.file("extdata", package="estrogen")) > ab = ReadAffy() > l.farms(ab) > > I see the error you report. It is because > >> sum(intensity(ab) == 0) > [1] 5 > > so that log transformation generates -Inf and then the error. It doesn't > seem too bad to > > intensity(ab) = 1 + intensity(ab) > > [I don't know enough about the inner workings of affyio, but I'm > surprised that I sometimes see > >> range(intensity(ReadAffy())) > [1] 0 46131 > >> range(intensity(ReadAffy)) > [1] 1.563578e-316 4.613100e+04 > > suggesting either initialized memory or unanticipated 'empty' intensity > grid coordinates)] > > Martin > > >> Javier >> >> sessionInfo() >> R version 2.10.1 (2009-12-14) >> i386-pc-mingw32 >> >> locale: >> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 >> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C >> [5] LC_TIME=Spanish_Spain.1252 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 >> [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 >> [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 >> splines_2.10.1 >> [5] tools_2.10.1 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Martin Morgan > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
On 3/16/10 2:37 PM, bmb at bmbolstad.com wrote: > ReadAffy uses allocMatrix() to allocate the memory into which it intends > to place probe intensities read from the file. Not sure there is an issue here or not based on your comments, but just so we are all on the same page, allocMatrix allocates but does not initialize memory for a matrix. + seth Based on information in the > CEL file header it determines grid dimensions to determine how many probe > intensities are expected. Exact parsing depends on the CEL file format of > which there are a few, but generally speaking it is expected that there is > an intensity to be read for every location in the grid. Zeros can arise in > truncated and corrupted CEL files. But many of those types of situations > should be detected and an error thrown. There may be other mechanisms for > generation of such probe intensities which make them valid. > > > >> On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: >>> Dear list, >>> I've tried to use l.farms (farms package) >>> http://www.bioinf.jku.at/software/farms/farms.html >>> for preprocessing the estrogen data set available at >>> http://www.bioconductor.org/packages/release/data/experiment/html/ estrogen.html >>> >>> >>> When trying to preprocess using l.farms, the following error comes up: >>> >>> background correction: none >>> normalization: loess >>> PM/MM correction : pmonly >>> expression values: farms >>> background correcting...done. >>> normalizing...Error: NA/NaN/Inf in call to external function (arg 1) >>> >>> q.farms works for the same data set and l.farms works for the Dilution >>> data set, so, I don't know what is going on here. >>> >>> Any suggestions? >> >> Hi Javier -- >> >> If I >> >> library(farms) >> library(estrogen) >> setwd(system.file("extdata", package="estrogen")) >> ab = ReadAffy() >> l.farms(ab) >> >> I see the error you report. It is because >> >>> sum(intensity(ab) == 0) >> [1] 5 >> >> so that log transformation generates -Inf and then the error. It doesn't >> seem too bad to >> >> intensity(ab) = 1 + intensity(ab) >> >> [I don't know enough about the inner workings of affyio, but I'm >> surprised that I sometimes see >> >>> range(intensity(ReadAffy())) >> [1] 0 46131 >> >>> range(intensity(ReadAffy)) >> [1] 1.563578e-316 4.613100e+04 >> >> suggesting either initialized memory or unanticipated 'empty' intensity >> grid coordinates)] >> >> Martin >> >> >>> Javier >>> >>> sessionInfo() >>> R version 2.10.1 (2009-12-14) >>> i386-pc-mingw32 >>> >>> locale: >>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 >>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C >>> [5] LC_TIME=Spanish_Spain.1252 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 >>> [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 >>> [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 >>> >>> loaded via a namespace (and not attached): >>> [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 >>> splines_2.10.1 >>> [5] tools_2.10.1 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Martin Morgan >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Seth Falcon Bioconductor Core Team | FHCRC
ADD REPLY
0
Entering edit mode
Yes I am aware that allocMatrix does not initialize. However, it is expected that a data point will be read from the CEL file for every location in this matrix. In the case of the GCOS/XDA and Calvin/CommandConsole format CEL file the order in which things are in the file tells you where to place the data in the matrix. This should be such that for a non-corrupt file every entry in the matrix would get assigned an intensity value. In the case of text format CEL file there are actually x and y coordinates given to each probe intensity. No specific ordering is specified in the file documentation, so these x and y coordinates are used to determine where the relevant intensity value should be placed in the matrix. not much checking is being done on the validity of these currently. Note of course that text format CEL files are starting to become more rare. Now it turns out that in estrogen there are 9 CEL files. All 9 are text format. > library(affy) > setwd(system.file("extdata", package="estrogen")) > list.celfiles() [1] "bad.cel" "high10-1.cel" "high10-2.cel" "high48-1.cel" "high48-2.cel" [6] "low10-1.cel" "low10-2.cel" "low48-1.cel" "low48-2.cel" > ab = ReadAffy() > range(intensity(ab)) [1] 0 46131 > apply(intensity(ab),2,range) bad.cel high10-1.cel high10-2.cel high48-1.cel high48-2.cel low10-1.cel [1,] 0 32.5 49.5 420.5 425 0 [2,] 46131 18308.0 45794.0 46111.0 46108 16457 low10-2.cel low48-1.cel low48-2.cel [1,] 0 360.5 395 [2,] 20763 46102.0 46115 > which(intensity(ab)==0) [1] 140224 2143523 2202236 2337976 2841090 > 140224%%640 [1] 64 > 140224%/%640 [1] 219 because things are zero based want to look for entry at 63 219 in bad.cel Low and behold: 61 219 129.0 25.0 25 62 219 84.0 14.1 25 6219 133.3 32.4 20 64 219 4241.0 634.0 25 65 219 3215.0 565.7 25 so corrupt x/y location. What about in "low10-1.cel" > which(intensity(ab)[,6] == 0)%%640 95523 154236 289976 163 636 56 > which(intensity(ab)[,6] == 0)%/%640 95523 154236 289976 149 240 453 And again 161 149 58.8 7.4 20 162 049 100.0 11.4 25 163 149 156.5 31.9 20 and again 634 240 63.3 7.5 20 63240 57.3 8.8 20 636 240 70.0 10.3 25 and here too 54 453 132.0 21.5 25 55 053 94.0 11.5 25 56 453 145.8 19.5 16 In theory at least some these cases should have been caught by the parser, since there is meant to be a check that ensures that no of those coordinates are larger the the chip dimensions (which 6219 and 63240 most certainly are). The parser should most certainly be throwing its hands up in error at this point. In any case I would think it bad form to be distributing corrupt CEL files as part of a package. Going into debugger land and looking at that first cel gdb) p buffer $6 = " 63\b219\t133.3\t32.4\t 20\n\000 .......... and it turns out that it takes x location as 63, y location as 133 and the intensity value as 32.4 (so in other words it is putting an incorrect intensity in the wrong place). Now what the heck a "\b" (aka ^H terminal code) is doing here I can't fathom, but you live with what you get. Now I could enforce a check that there are five distinct entries (as separated by tabs) on each row (which is something I avoided in the past for speed since typically npixels and stddev are ignored) and that would catch two of those cases. There is not much that can be done about the catching the other two unless one were to enforce that x/y locations are sequential. The file documentation used to implement the parser does not specify that this has to be the case for text format files. On Tue, 2010-03-16 at 14:55 -0700, Seth Falcon wrote: > On 3/16/10 2:37 PM, bmb at bmbolstad.com wrote: > > ReadAffy uses allocMatrix() to allocate the memory into which it intends > > to place probe intensities read from the file. > > Not sure there is an issue here or not based on your comments, but just > so we are all on the same page, allocMatrix allocates but does not > initialize memory for a matrix. > > + seth > > > Based on information in the > > CEL file header it determines grid dimensions to determine how many probe > > intensities are expected. Exact parsing depends on the CEL file format of > > which there are a few, but generally speaking it is expected that there is > > an intensity to be read for every location in the grid. Zeros can arise in > > truncated and corrupted CEL files. But many of those types of situations > > should be detected and an error thrown. There may be other mechanisms for > > generation of such probe intensities which make them valid. > > > > > > > >> On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: > >>> Dear list, > >>> I've tried to use l.farms (farms package) > >>> http://www.bioinf.jku.at/software/farms/farms.html > >>> for preprocessing the estrogen data set available at > >>> http://www.bioconductor.org/packages/release/data/experiment/htm l/estrogen.html > >>> > >>> > >>> When trying to preprocess using l.farms, the following error comes up: > >>> > >>> background correction: none > >>> normalization: loess > >>> PM/MM correction : pmonly > >>> expression values: farms > >>> background correcting...done. > >>> normalizing...Error: NA/NaN/Inf in call to external function (arg 1) > >>> > >>> q.farms works for the same data set and l.farms works for the Dilution > >>> data set, so, I don't know what is going on here. > >>> > >>> Any suggestions? > >> > >> Hi Javier -- > >> > >> If I > >> > >> library(farms) > >> library(estrogen) > >> setwd(system.file("extdata", package="estrogen")) > >> ab = ReadAffy() > >> l.farms(ab) > >> > >> I see the error you report. It is because > >> > >>> sum(intensity(ab) == 0) > >> [1] 5 > >> > >> so that log transformation generates -Inf and then the error. It doesn't > >> seem too bad to > >> > >> intensity(ab) = 1 + intensity(ab) > >> > >> [I don't know enough about the inner workings of affyio, but I'm > >> surprised that I sometimes see > >> > >>> range(intensity(ReadAffy())) > >> [1] 0 46131 > >> > >>> range(intensity(ReadAffy)) > >> [1] 1.563578e-316 4.613100e+04 > >> > >> suggesting either initialized memory or unanticipated 'empty' intensity > >> grid coordinates)] > >> > >> Martin > >> > >> > >>> Javier > >>> > >>> sessionInfo() > >>> R version 2.10.1 (2009-12-14) > >>> i386-pc-mingw32 > >>> > >>> locale: > >>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 > >>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C > >>> [5] LC_TIME=Spanish_Spain.1252 > >>> > >>> attached base packages: > >>> [1] stats graphics grDevices utils datasets methods base > >>> > >>> other attached packages: > >>> [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 > >>> [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 > >>> [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 > >>> > >>> loaded via a namespace (and not attached): > >>> [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 > >>> splines_2.10.1 > >>> [5] tools_2.10.1 > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor at stat.math.ethz.ch > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> > >> -- > >> Martin Morgan > >> Computational Biology / Fred Hutchinson Cancer Research Center > >> 1100 Fairview Ave. N. > >> PO Box 19024 Seattle, WA 98109 > >> > >> Location: Arnold Building M1 B861 > >> Phone: (206) 667-2793 > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at stat.math.ethz.ch > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Seth Falcon > Bioconductor Core Team | FHCRC > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Hi Ben -- Yes the estrogen files seem to have some minor indiscretions; these seem to be quite 'ancient' (e.g., in the 1.8 experiment data repository) and perhaps a clean version is not recoverable (this is still being investigated). I guess a relatively quick way to 'validate' the files on input is to set intensity values to NA_REAL after allocMatrix and then confirm !anyis.na()) on exit. Martin On 03/16/2010 05:51 PM, Ben Bolstad wrote: > Yes I am aware that allocMatrix does not initialize. > > However, it is expected that a data point will be read from the CEL file > for every location in this matrix. > > In the case of the GCOS/XDA and Calvin/CommandConsole format CEL file > the order in which things are in the file tells you where to place the > data in the matrix. This should be such that for a non-corrupt file > every entry in the matrix would get assigned an intensity value. > > In the case of text format CEL file there are actually x and y > coordinates given to each probe intensity. No specific ordering is > specified in the file documentation, so these x and y coordinates are > used to determine where the relevant intensity value should be placed in > the matrix. not much checking is being done on the validity of these > currently. Note of course that text format CEL files are starting to > become more rare. > > Now it turns out that in estrogen there are 9 CEL files. All 9 are text > format. > >> library(affy) >> setwd(system.file("extdata", package="estrogen")) >> list.celfiles() > [1] "bad.cel" "high10-1.cel" "high10-2.cel" "high48-1.cel" > "high48-2.cel" > [6] "low10-1.cel" "low10-2.cel" "low48-1.cel" "low48-2.cel" >> ab = ReadAffy() >> range(intensity(ab)) > [1] 0 46131 >> apply(intensity(ab),2,range) > bad.cel high10-1.cel high10-2.cel high48-1.cel high48-2.cel low10-1.cel > [1,] 0 32.5 49.5 420.5 425 0 > [2,] 46131 18308.0 45794.0 46111.0 46108 16457 > low10-2.cel low48-1.cel low48-2.cel > [1,] 0 360.5 395 > [2,] 20763 46102.0 46115 > >> which(intensity(ab)==0) > [1] 140224 2143523 2202236 2337976 2841090 >> 140224%%640 > [1] 64 >> 140224%/%640 > [1] 219 > > because things are zero based want to look for entry at 63 219 in > bad.cel > > Low and behold: > > 61 219 129.0 25.0 25 > 62 219 84.0 14.1 25 > 6219 133.3 32.4 20 > 64 219 4241.0 634.0 25 > 65 219 3215.0 565.7 25 > > so corrupt x/y location. What about in "low10-1.cel" > > >> which(intensity(ab)[,6] == 0)%%640 > 95523 154236 289976 > 163 636 56 >> which(intensity(ab)[,6] == 0)%/%640 > 95523 154236 289976 > 149 240 453 > > And again > > 161 149 58.8 7.4 20 > 162 049 100.0 11.4 25 > 163 149 156.5 31.9 20 > > and again > > 634 240 63.3 7.5 20 > 63240 57.3 8.8 20 > 636 240 70.0 10.3 25 > > and here too > > 54 453 132.0 21.5 25 > 55 053 94.0 11.5 25 > 56 453 145.8 19.5 16 > > In theory at least some these cases should have been caught by the > parser, since there is meant to be a check that ensures that no of those > coordinates are larger the the chip dimensions (which 6219 and 63240 > most certainly are). The parser should most certainly be throwing its > hands up in error at this point. > > In any case I would think it bad form to be distributing corrupt CEL > files as part of a package. > > Going into debugger land and looking at that first cel > > gdb) p buffer > $6 = " 63\b219\t133.3\t32.4\t 20\n\000 .......... > > and it turns out that it takes x location as 63, y location as 133 and > the intensity value as 32.4 (so in other words it is putting an > incorrect intensity in the wrong place). Now what the heck a "\b" (aka > ^H terminal code) is doing here I can't fathom, but you live with what > you get. > > Now I could enforce a check that there are five distinct entries (as > separated by tabs) on each row (which is something I avoided in the past > for speed since typically npixels and stddev are ignored) and that would > catch two of those cases. There is not much that can be done about the > catching the other two unless one were to enforce that x/y locations are > sequential. The file documentation used to implement the parser does not > specify that this has to be the case for text format files. > > > > > > > > On Tue, 2010-03-16 at 14:55 -0700, Seth Falcon wrote: >> On 3/16/10 2:37 PM, bmb at bmbolstad.com wrote: >>> ReadAffy uses allocMatrix() to allocate the memory into which it intends >>> to place probe intensities read from the file. >> >> Not sure there is an issue here or not based on your comments, but just >> so we are all on the same page, allocMatrix allocates but does not >> initialize memory for a matrix. >> >> + seth >> >> >> Based on information in the >>> CEL file header it determines grid dimensions to determine how many probe >>> intensities are expected. Exact parsing depends on the CEL file format of >>> which there are a few, but generally speaking it is expected that there is >>> an intensity to be read for every location in the grid. Zeros can arise in >>> truncated and corrupted CEL files. But many of those types of situations >>> should be detected and an error thrown. There may be other mechanisms for >>> generation of such probe intensities which make them valid. >>> >>> >>> >>>> On 03/16/2010 08:25 AM, Javier P?rez Florido wrote: >>>>> Dear list, >>>>> I've tried to use l.farms (farms package) >>>>> http://www.bioinf.jku.at/software/farms/farms.html >>>>> for preprocessing the estrogen data set available at >>>>> http://www.bioconductor.org/packages/release/data/experiment/htm l/estrogen.html >>>>> >>>>> >>>>> When trying to preprocess using l.farms, the following error comes up: >>>>> >>>>> background correction: none >>>>> normalization: loess >>>>> PM/MM correction : pmonly >>>>> expression values: farms >>>>> background correcting...done. >>>>> normalizing...Error: NA/NaN/Inf in call to external function (arg 1) >>>>> >>>>> q.farms works for the same data set and l.farms works for the Dilution >>>>> data set, so, I don't know what is going on here. >>>>> >>>>> Any suggestions? >>>> >>>> Hi Javier -- >>>> >>>> If I >>>> >>>> library(farms) >>>> library(estrogen) >>>> setwd(system.file("extdata", package="estrogen")) >>>> ab = ReadAffy() >>>> l.farms(ab) >>>> >>>> I see the error you report. It is because >>>> >>>>> sum(intensity(ab) == 0) >>>> [1] 5 >>>> >>>> so that log transformation generates -Inf and then the error. It doesn't >>>> seem too bad to >>>> >>>> intensity(ab) = 1 + intensity(ab) >>>> >>>> [I don't know enough about the inner workings of affyio, but I'm >>>> surprised that I sometimes see >>>> >>>>> range(intensity(ReadAffy())) >>>> [1] 0 46131 >>>> >>>>> range(intensity(ReadAffy)) >>>> [1] 1.563578e-316 4.613100e+04 >>>> >>>> suggesting either initialized memory or unanticipated 'empty' intensity >>>> grid coordinates)] >>>> >>>> Martin >>>> >>>> >>>>> Javier >>>>> >>>>> sessionInfo() >>>>> R version 2.10.1 (2009-12-14) >>>>> i386-pc-mingw32 >>>>> >>>>> locale: >>>>> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 >>>>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C >>>>> [5] LC_TIME=Spanish_Spain.1252 >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] affyPLM_1.22.0 preprocessCore_1.8.0 gcrma_2.18.1 >>>>> [4] affydata_1.11.10 farms_1.4.0 MASS_7.3-4 >>>>> [7] hgu95av2cdf_2.5.0 affy_1.24.2 Biobase_2.6.1 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] affyio_1.14.0 Biostrings_2.14.12 IRanges_1.4.11 >>>>> splines_2.10.1 >>>>> [5] tools_2.10.1 >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>>> -- >>>> Martin Morgan >>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. >>>> PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: Arnold Building M1 B861 >>>> Phone: (206) 667-2793 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Seth Falcon >> Bioconductor Core Team | FHCRC >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY

Login before adding your answer.

Traffic: 626 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6