GCRMA-induced correlations?


Jenny Drnevich ★ 2.2k


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080219/c658b0d1/attachment.pl


ADD COMMENT • link updated 17.1 years ago by Zhijin Wu ▴ 260 • written 17.1 years ago by Jenny Drnevich ★ 2.2k


Zhijin Wu ▴ 260


Yes. To eliminate this artifact, the truncated values will no longer be adjusted in the next release of GCRMA.

Jenny Drnevich wrote:
> Hi Zhijin,
>
> A client pointed out a July 2007 article by Lim et al. testing different normalization/pre-processing methods for their effects on pairwise correlations between probesets (Bioinformatics 2007 23(13):i282-i288; doi:10.1093/bioinformatics/btm201; full link below). They reported that GCRMA introduced severe artificial correlations between probesets; they looked for a cause and think it is due to truncation of low-intensity values after the Non-Specific Binding adjustment, followed by the Gene-Specific Binding adjustment on these truncated values. They also tested a specific correction to the GCRMA algorithm that appears to prevent the artificial correlation, and they suggest that it become an option or even a default in the R implementation of GCRMA.
>
> What do you think of this article? Are there any plans to implement their suggestion?
>
> Thanks,
> Jenny
>
> Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks
>
> http://bioinformatics.oxfordjournals.org/cgi/content/full/23/13/i282?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=1&andorexacttitle=and&andorexacttitleabs=and&andorexactfulltext=and&searchid=1&FIRSTINDEX=0&sortspec=relevance&volume=23&firstpage=i282&resourcetype=HWCIT&eaf
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at uiuc.edu

--
Zhijin (Jean) Wu
Assistant Professor of Biostatistics
Brown University, Box G-S121
Providence, RI 02912
Tel: 401 863 1230
Fax: 401 863 9182
http://stat.brown.edu/~zwu

ADD COMMENT • link 17.1 years ago Zhijin Wu ▴ 260


Zhijin,

Is there a mechanism to let us know when the version with this change has been released and that it contains the change? For example, let us say that I update GCRMA from the current version automatically. Will it say in the vignette or somewhere else that the change has indeed been made?

Please excuse my ignorance if there is some standard mechanism of which I am not aware.

Thanks and best wishes,
Rich

Richard A. Friedman, PhD
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer, Department of Biomedical Informatics (DBMI)
Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

"Sure I am willing to stop watching television to get a better education."
-Rose Friedman, age 11

On Feb 19, 2008, at 3:36 PM, Zhijin Wu wrote:
> Yes. To eliminate this artifact, the truncated values will no longer be adjusted in the next release of GCRMA.

ADD REPLY • link 17.1 years ago Richard Friedman ★ 2.0k


Package developers are encouraged to include a NEWS file, either at the top level or in the inst directory (not both), that contains such details. The format should be similar to that of the NEWS file that comes with R. The Biostrings package also has a nice example of such a file.

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

ADD REPLY • link 17.1 years ago rgentleman ★ 5.5k


Hi Zhijin,

In Lim's paper they also suggest adding some noise to the truncated probes. I believe (and this is my experience as well) that otherwise the truncated probes would all have exactly the same signal value, and correlations between low-intensity probes would remain.

Quoting Lim's paper:

"To test our speculations, we reimplemented the GCRMA procedure without adjusting GSB for uninformative probes - i.e. probes that are truncated to m after NSB adjustment. To ensure the lowest intensity rank of these probes, any other probes with GSB-adjusted value less than m were also truncated at m. Finally, an infinitesimal amount of uniformly distributed noise was added to truncated probes to avoid rank-order correlation issues."

Do you plan to add this "noise" as well? If so, how should the noise level be chosen? And what about the reproducibility of GCRMA results? I think this particular issue is related to the recent thread about set.seed() in GCRMA.

Best wishes,

Pierre.
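
For readers who want to see the quoted procedure concretely, here is a minimal sketch in Python. It is not the GCRMA code and not the reimplementation by Lim et al.; the function name, the `gsb_adjust` stand-in, the noise width and the fixed seed are all assumptions made for illustration. It only follows the three steps in the quotation: skip the GSB adjustment for probes already truncated at m, truncate any other probe that falls below m after adjustment, and add a very small amount of uniform noise to the truncated probes.

```python
import random


def adjust_without_truncated(nsb_adjusted, gsb_adjust, m, noise_width=1e-9, seed=0):
    """Illustrative version of the modified procedure quoted from Lim et al.

    nsb_adjusted: probe values after NSB adjustment, already truncated at m
    gsb_adjust:   hypothetical stand-in for the GSB adjustment of one value
    m:            truncation floor
    noise_width:  assumed upper bound of the uniform noise (not from the paper)
    seed:         fixed seed so that repeated runs give identical results
    """
    rng = random.Random(seed)
    result = []
    for value in nsb_adjusted:
        if value <= m:
            # Uninformative probe: keep it at m and skip the GSB adjustment.
            adjusted = m
        else:
            # Informative probe: apply GSB, then truncate at m if it fell below.
            adjusted = max(gsb_adjust(value), m)
        if adjusted == m:
            # A tiny amount of uniform noise breaks exact ties at m.
            adjusted = m + rng.uniform(0.0, noise_width)
        result.append(adjusted)
    return result


# Toy input: two probes truncated at m, one clearly expressed, one just above m.
values = [1.0, 1.0, 5.0, 1.2]
adjusted = adjust_without_truncated(values, lambda v: v - 0.5, m=1.0)
```

Seeding a private generator is one possible answer to the reproducibility question: the same input and seed always give the same output, at the cost of making the result depend on an extra, arbitrary parameter.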

ADD REPLY • link 17.1 years ago Pierre Neuvial ▴ 80


Hi,

another reason for adding "some noise" is to help the estimation algorithm converge when the discreteness of the data dominates at lower intensities.

Details: By default, Affymetrix takes the 75% quantile of the pixel intensities to be the probe signal, which means that if you have 9 pixels (common with new chip types), the signal becomes *exactly* the 7th pixel value. In other words, the pixel intensities observed in a CEL file are often "integers" (although they are stored as floats). At low intensities this discreteness dominates, which you can see as a "peacock tail" if you do a log-ratio versus log-intensity plot.

We observed convergence problems for the RMA norm+exp background model for some data sets (exon arrays; 9 pixels/probe, low intensities) because of the above. To help with this, we have the option to add "jitter" before fitting the model (in the 'RmaBackgroundCorrection' of aroma.affymetrix), which seems to help.

Cheers

Henrik

On Feb 19, 2008 11:56 PM, Pierre Neuvial <pierre.neuvial at curie.fr> wrote:
> In Lim's paper they also suggest to add some noise to truncated probes [...]
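
The "exactly the 7th pixel value" statement can be checked with a few lines of Python. This is only a sketch: it assumes the quantile is taken by linear interpolation at position p*(n-1) in the sorted values (the convention used by R's default `quantile()`), and the pixel values are made up.

```python
def quantile(values, p):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)   # zero-based fractional position
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        # The position falls exactly on an observed value: no interpolation.
        return ordered[lower]
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])


# Nine made-up integer pixel intensities for one probe.
pixels = [112, 98, 105, 101, 120, 99, 103, 110, 107]
probe_signal = quantile(pixels, 0.75)   # position 0.75 * 8 = 6, the 7th sorted value
```

With 9 pixels the position is a whole number, so the probe signal is one of the integer pixel values rather than an interpolated one, which is where the discreteness at low intensities comes from.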

ADD REPLY • link 17.1 years ago Henrik Bengtsson ★ 2.4k


Hi Henrik,

a similar phenomenon (discreteness of low intensity values) occurs with image analysis software for other microarray platforms as well, see e.g. the data in the CCl4 package. Just adding 'random noise' to the data seems pretty unsatisfactory. Results may or may not be fine in practice, but for a "model-based" method I'd think the right thing to do is to either fix your model, or the estimation algorithm, or to diagnose lack of model fit; but perhaps not to surreptitiously tweak the data to make it look like you think it ought to.

I realize that this can be a difficult goal in practice, but I prefer functions/packages that avoid ad hoc heuristics only apparent when reading the code, and that do what the label (e.g. the accompanying methods paper) says.

Best wishes
Wolfgang

--
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

ADD REPLY • link 17.1 years ago Wolfgang Huber ★ 13k


On Wed, Feb 20, 2008 at 5:20 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
> Hi Henrik,
>
> a similar phenomenon (discreteness of low intensity values) occurs with image analysis software for other microarray platforms as well, see e.g. the data in the CCl4 package. Just adding 'random noise' to the data seems pretty unsatisfactory. Results may or may not be fine in practice, but for a "model-based" method I'd think the right thing to do is to either fix your model, or the estimation algorithm, or to diagnose lack of model fit; but perhaps not surreptitiously tweaking the data to make it look like you think it ought to.

I second this.

> I realize that this can be a difficult goal in practice, but I like functions/packages better that avoid ad hoc heuristics only apparent when reading the code, and do what the label (e.g. the accompanying methods paper) says.

I second this one too. In our case with the RMA norm+exp correction, the fix was done to "get this done", and the person who added it was happy enough with the results (without it the model fit fails). As a protection, the user has to set 'addJitter=TRUE' to turn it on. I can see how the real fix falls between the chairs: the original RMA norm+exp was written for a chip type that did not produce discrete low-intensity signals, and everything worked fine; now someone comes along and wants to use it for a new chip type with slightly different properties.

There is also a catch-22 for aroma.affymetrix here: people ask for perfect reproducibility of the existing RMA bg correction (and it delivers that), cf. this thread, but that then also perfectly reproduces the discreteness problem. Not providing the bg correction method is not an option. The 'addJitter' option provides an ad hoc patch for the problem until someone(?) has time to come up with a better solution. It all comes down to time, ehe.

...and that is how yet another strange option was born.

Cheers

/Henrik
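
The trade-off described here, jitter that is off by default so existing results are reproduced exactly and on only when the user asks for it, might look roughly like the Python sketch below. It is not the aroma.affymetrix implementation: `prepare_intensities`, `jitter_width` and `seed` are hypothetical names, and only the `add_jitter` flag mirrors the 'addJitter' option mentioned above.

```python
import random


def prepare_intensities(intensities, add_jitter=False, jitter_width=0.5, seed=None):
    """Return the values that would be handed to the background-model fit.

    add_jitter:   opt-in flag; the default leaves the data untouched
    jitter_width: assumed half-width of the uniform jitter (hypothetical)
    seed:         optional seed so that a jittered run can be repeated exactly
    """
    if not add_jitter:
        # Default path: unchanged data, hence unchanged downstream results.
        return list(intensities)
    rng = random.Random(seed)
    return [x + rng.uniform(-jitter_width, jitter_width) for x in intensities]


raw = [7, 7, 8, 8, 8, 9]   # made-up discrete low intensities
untouched = prepare_intensities(raw)
jittered = prepare_intensities(raw, add_jitter=True, seed=42)
```

Keeping the flag explicit also makes the heuristic visible in the call rather than hidden in the code, which goes some way toward the concern about ad hoc steps that are only apparent when reading the source.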

ADD REPLY • link 17.1 years ago Henrik Bengtsson ★ 2.4k
