RMA and justRMA error


Aedin Culhane ▴ 510

@aedin-culhane-1526

Last seen 5.3 years ago

United States

Dear BioC,

I know that this error has been reported a few times on the Bioc mailing list; however, no resolution to it is available in the archives (or at least none that Google and I could find). I get the same error whether I use R 2.3.1 or the devel version. I enclose the devel version error.

The CEL files are read in by ReadAffy and are processed OK by gcrma, but they fall over when I try to run rma or justRMA.

Thanks for your help,
Aedin

> df = justRMA(filenames=filenam[125:130])
Background correcting
Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
        need at least 2 points to select a bandwidth automatically

> df = ReadAffy(filenames=filenam[125:130])
> df
AffyBatch object
size of arrays=1164x1164 features (63518 kb)
cdf=HG-U133_Plus_2 (54675 affyids)
number of samples=6
number of genes=54675
annotation=hgu133plus2

> df.rma= rma(df)
Background correcting
Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
        need at least 2 points to select a bandwidth automatically

> library(gcrma)
> df.gcrma= gcrma(df)
Adjusting for optical effect......Done.
Computing affinities.Done.
Adjusting for non-specific binding......Done.
Normalizing
Calculating Expression

> sessionInfo()
R version 2.4.0 Under development (unstable) (2006-08-06 r38809)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "splines"   "tools"     "methods"   "stats"     "graphics"  "grDevices"
[7] "utils"     "datasets"  "base"

other attached packages:
hgu133plus2probe   hgu133plus2cdf            gcrma      matchprobes
        "1.12.0"         "1.12.0"          "2.5.1"          "1.5.0"
            affy           affyio          Biobase            made4
        "1.11.6"          "1.1.5"        "1.11.24"          "1.7.1"
   scatterplot3d             ade4
        "0.3-24"          "1.4-1"

--
Aedín Culhane
Research Associate in Prof. J Quackenbush Lab
Harvard School of Public Health, Dana-Farber Cancer Institute

44 Binney Street, Mayer 232
Department of Biostatistics
Dana-Farber Cancer Institute
Boston, MA 02115
USA

Cancer cdf Biobase affy gcrma affyio • 2.1k views

18.4 years ago • Aedin Culhane ▴ 510


Ben Bolstad ★ 1.2k

@ben-bolstad-1494

Last seen 7.3 years ago

Typically, when I have encountered others who have had this error occur, it is because they have corrupted data. For instance, this piece of demonstration code will generate the same error:

library(affy);library(affydata)
data(Dilution)
Dilution.Corrupted <- Dilution
pm(Dilution.Corrupted)[1,1] <- 30000000
# that is an extreme value outside the
# range of normal raw probe intensities

eset <- rma(Dilution.Corrupted)

My suggestion would be to examine things along those lines.

Best,

Ben

On Tue, 2006-08-15 at 15:01 -0400, aedin wrote:
> Dear BioC
> I know that this error is reported a few times on the Bioc mailing list,
> however no resolution to it is available in the archives (or at least
> none that google and I could find).
> [...]

--
Ben Bolstad <bmb at bmbolstad.com>
http://bmbolstad.com
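One way to look for this kind of corruption is to inspect the per-array range of the raw intensities, because a single absurd value barely moves the median or quartiles but shows up immediately in the maximum. The following is a minimal base-R sketch, not part of affy: the helper name `flag_extreme_arrays` and its default `cutoff` of 2^16 are assumptions made for illustration, and the simulated matrix is only a stand-in for real probe intensities.

```r
# Hypothetical helper (not from affy): report arrays (columns) whose largest
# raw intensity exceeds a cutoff. The default cutoff of 2^16 is an assumption
# chosen for illustration; adjust it to your own data.
flag_extreme_arrays <- function(x, cutoff = 2^16) {
  stopifnot(is.matrix(x), is.numeric(x))
  col_max <- apply(x, 2, max, na.rm = TRUE)
  col_med <- apply(x, 2, median, na.rm = TRUE)
  data.frame(array   = if (is.null(colnames(x))) seq_len(ncol(x)) else colnames(x),
             median  = col_med,
             max     = col_max,
             extreme = col_max > cutoff,
             stringsAsFactors = FALSE)
}

# Simulated stand-in for raw probe intensities (rows = probes, columns = arrays).
set.seed(1)
raw <- matrix(runif(4000, min = 50, max = 20000), ncol = 4,
              dimnames = list(NULL, paste0("array", 1:4)))
raw[1, 3] <- 30000000   # one corrupt value, same magnitude as the demonstration

report <- flag_extreme_arrays(raw)
print(report)
```

With a real AffyBatch, the matrix passed to the helper would presumably come from `intensity()` on the object returned by ReadAffy.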

18.4 years ago • Ben Bolstad ★ 1.2k


Aedin Culhane ▴ 510

@aedin-culhane-1526

Last seen 5.3 years ago

United States

Thanks Ben,

Sorry, I thought the same parser would apply to each method. I found the culprit file using the approach you list below.

It was not obvious in any of the normal plots (hist, boxplot etc.) as only one probeset had a ridiculous value (it was 5.6 x 10^14). This would completely skew a mean but not a median.

Should I be wary of this CEL file and dump it, or if it looks OK in the hist and boxplot should I try to keep it? Do you know what would cause this? How frequently does this occur?

Thanks for your help,
Aedin

Ben Bolstad wrote:
> The parsing code does not necessarily detect all potential corruptions.
> And you will find that gcrma() will quite happily process the "corrupt"
> data I show below.
>
> The error itself is from the density() function. If you could isolate
> the array that is causing trouble using, say, something like this:
>
> for (i in 1:4){
>   cat(i,"\n")
>   blah <- bg.correct.rma(Dilution.Corrupted[,i])
> }
>
> Then perhaps we could look at it a little closer.
>
> best,
>
> Ben
>
> On Tue, 2006-08-15 at 18:13 -0400, aedin wrote:
>> Dear Ben
>> Thanks for your reply. However, if the data were corrupted, surely they
>> would not be read by ReadAffy and gcrma?
>> Aedin
>>
>> [...]

--
Aedín Culhane
Research Associate in Prof. J Quackenbush Lab
Harvard School of Public Health, Dana-Farber Cancer Institute

44 Binney Street, Mayer 232
Department of Biostatistics
Dana-Farber Cancer Institute
Boston, MA 02115
USA

Phone: +1 (617) 632 2468
Fax: +1 (617) 632 5444
Email: aedin at jimmy.harvard.edu
Web URL: http://www.hsph.harvard.edu/researchers/aculhane.html
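The loop quoted above stops at the first array that fails. A variant that keeps going and records every failure can be sketched with tryCatch. This is only an illustration: `find_failing` is a hypothetical helper, and the list of simulated vectors with a plain density() call stands in for the per-array background correction.

```r
# Hypothetical helper: call f(i) for each index and collect the error
# messages of the indices that fail, instead of stopping at the first error.
find_failing <- function(indices, f) {
  failures <- list()
  for (i in indices) {
    msg <- tryCatch({ f(i); NULL },
                    error = function(e) conditionMessage(e))
    if (!is.null(msg)) failures[[as.character(i)]] <- msg
  }
  failures
}

# Stand-in data: the third "array" has a single point, so density() cannot
# select a bandwidth and raises an error, much like the one in this thread.
set.seed(2)
arrays <- list(rnorm(100), rnorm(100), numeric(1), rnorm(100))
failed <- find_failing(seq_along(arrays),
                       function(i) density(arrays[[i]], kernel = "epanechnikov"))
print(failed)
```

With affy loaded, the function passed in would presumably be something like `function(i) bg.correct.rma(df[, i])`, mirroring the loop in the quoted message.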

18.4 years ago • Aedin Culhane ▴ 510


Aedin Culhane ▴ 510

@aedin-culhane-1526

Last seen 5.3 years ago

United States

Hi Ben,

I traced my problem down a bit more. I FTP the CEL files as a .ZIP archive. If I uncompress them using WinZip on Windows, the files are OK. However, I was using unzip on Linux, and this seems to do some weird and wonderful things. Although the 1st quartile, median and 3rd quartile appear to be consistent (from the files I have checked), the min and max values seem to be different. So unzip is extracting the files without error (gzip and gunzip don't appear to be WinZip .ZIP archive friendly), but it is clearly doing some character re-shuffling.

Sorry, this is not a BioC problem. But do you know if this is a known problem, or if there is a parameter that I should specify?

Thanks so much for all of your help.

Regards,
Aedin

***unzip details. I am using FC4***
UnZip 5.51 of 22 May 2004, by Info-ZIP. Maintained by C. Spieler.
Compiled with gcc 4.0.2 20051125 (Red Hat 4.0.2-8) for Unix (Linux ELF) on Feb 6 2006.

Ben Bolstad wrote:
> If you can send me the original CEL file, I can take a look to see if it
> is something that I consider should be a detectable parsing error.
>
> Ben
>
> On Tue, 2006-08-15 at 19:48 -0400, aedin wrote:
>> Thanks Ben
>> Sorry I thought the same parser would apply to each method. I found the
>> culprit file using the approach you list below.
>> [...]

--
Aedín Culhane
Research Associate in Prof. J Quackenbush Lab
Harvard School of Public Health, Dana-Farber Cancer Institute

44 Binney Street, Mayer 232
Department of Biostatistics
Dana-Farber Cancer Institute
Boston, MA 02115
USA

Phone: +1 (617) 632 2468
Fax: +1 (617) 632 5444
Email: aedin at jimmy.harvard.edu
Web URL: http://www.hsph.harvard.edu/researchers/aculhane.html
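To confirm that two extraction tools really produce different bytes for the same archive member, the extracted files can be compared by checksum. The sketch below uses `tools::md5sum` from base R; `compare_extractions`, the directory names and the tiny demo files are all hypothetical and chosen for illustration. It only detects that files differ, it does not explain why.

```r
# Hypothetical helper: compare files in two directories by MD5 checksum and
# return the base names whose contents differ (or are missing in dir_b).
compare_extractions <- function(dir_a, dir_b) {
  files_a <- list.files(dir_a, full.names = TRUE)
  files_b <- file.path(dir_b, basename(files_a))
  sums_a <- unname(tools::md5sum(files_a))
  sums_b <- unname(tools::md5sum(files_b))   # NA when a file is missing
  differs <- is.na(sums_b) | sums_a != sums_b
  basename(files_a)[differs]
}

# Demo with small stand-in files; the directory names are labels only.
dir_a <- file.path(tempdir(), "winzip")
dir_b <- file.path(tempdir(), "unzip")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)
writeBin(as.raw(c(1, 2, 13, 10, 3)), file.path(dir_a, "a.CEL"))
writeBin(as.raw(c(1, 2, 13, 10, 3)), file.path(dir_b, "a.CEL"))
writeBin(as.raw(c(1, 2, 13, 10, 3)), file.path(dir_a, "b.CEL"))
writeBin(as.raw(c(1, 2, 10, 3)),     file.path(dir_b, "b.CEL"))  # altered bytes

changed <- compare_extractions(dir_a, dir_b)
print(changed)
```

Any file name returned here would be a candidate for re-extraction before running rma or justRMA on it.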

18.4 years ago • Aedin Culhane ▴ 510


Dear list!

I am working on some Affymetrix data on chicken and am using R for analysis of the data. I would like to add annotation data, and as far as I can see there is no annotation package for chicken at Bioconductor. When reading the vignette for AnnBuilder I got the impression that packages can only be built for the organisms human, rat and mouse so far. It also said: "See section Extend AnnBuilder if you have an organism other than the three species." But the Extend AnnBuilder section is empty. My question is if I can build an annotation package for chicken using AnnBuilder and how I would do that?

Thank you!

Best regards,

Lina Hultin Rosenberg

________________________________
Lina Hultin Rosenberg
Msc Molecular Biotechnology
Evolutionary Biology Department
Uppsala University
Norbyvägen 18
752 36 Uppsala
Phone: +46-18-4716444
Email: lina.hultin.rosenberg at ebc.uu.se

18.4 years ago • Lina Hultin-Rosenberg ▴ 180


Hi Lina,

Lina Hultin-Rosenberg wrote:
> [...] My question is if I can build an
> annotation package for chicken using AnnBuilder and how I would do that?

I am sure Nianhua will respond with some ideas of how to build an annotation package, but if this isn't possible (or is very difficult), you might consider using the biomaRt package to annotate your probesets. You can get quite a bit of annotation this way, and you can even use your affy probeset IDs to annotate, rather than having to convert to Entrez Gene or UniGene IDs first.

If your goal is to output a list of the top probesets along with annotation, you might look at the affycoretools package, which has some wrapper functions intended to simplify the output of HTML tables using a set of probe IDs (or output from limma). There will also be a vignette that explains some of these functions in more detail as soon as I get it to quit erroring out on the BioC build servers ;-D.

Note that these functions are in the devel version of affycoretools, so if you want to use them, you will need the devel version of R as well.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

18.4 years ago • James W. MacDonald 67k
