checking multi-modalities in histograms
@javier-perez-florido-3121
Last seen 6.6 years ago
Dear list,

Histograms are often used to check the quality of microarray experiments. If a particular array shows a bimodal distribution, it is a candidate for exclusion from the experiment. Bimodality or multimodality is easy to spot visually, but I would like to know whether there is a way (a statistical test or similar) to check for multimodality using the data returned by the hist function.

For an AffyBatch object, the hist function returns only the X and Y values; it does not return breaks, counts, etc., as described in the help page for hist. So I have two questions:

* Is there a test to check for multimodality in histograms?
* Is there a way to get the cells and the number of values per cell used by hist, so that multimodality can be checked in a rudimentary way?

Thanks again,
Javier
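On the second question, base R's hist() does return its cells when it is called with plot = FALSE on a plain numeric vector. The following is a minimal sketch, assuming the intensities have first been extracted from the AffyBatch as a numeric vector; the simulated vector is only a stand-in for one array, and count_peaks is a hypothetical helper written for this example, not part of any package.

```r
# Stand-in for the log2 intensities of one array. With real data this might be
#   x <- log2(intensity(abatch)[, 1])
# which assumes the affy package and an AffyBatch called 'abatch'.
set.seed(1)
x <- c(rnorm(5000, mean = 7, sd = 0.6), rnorm(5000, mean = 11, sd = 0.6))

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(x, breaks = 30, plot = FALSE)
h$breaks   # cell boundaries
h$counts   # number of values per cell

# Hypothetical helper: count interior cells that are strictly higher than both
# neighbours, ignoring cells below a minimum fraction of the tallest cell
count_peaks <- function(counts, min_frac = 0.1) {
  n <- length(counts)
  if (n < 3) return(0L)
  mid <- counts[2:(n - 1)]
  is_peak <- mid > counts[1:(n - 2)] & mid > counts[3:n] &
    mid >= min_frac * max(counts)
  sum(is_peak)
}

n_peaks <- count_peaks(h$counts)
```

A raw peak count like this is sensitive to the number of cells: narrow cells produce spurious local maxima from sampling noise, so the result should be read as a rough screen and not as a test.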
Kevin Coombes ▴ 430
@kevin-coombes-3935
Last seen 2.0 years ago
United States
The mclust R package (function Mclust) has one set of tools to do this.

I would also advise you to take a look at the bimodality index that we defined in

  Wang J, Wen S, Symmans WF, Pusztai L, Coombes KR. The bimodality index: a criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data. Cancer Inform. 2009 Aug 5;7:199-216. PMID: 19718451

along with the editorial

  Ertel A. Bimodal gene expression and biomarker discovery. Cancer Inform. 2010 Feb 4;9:11-4. PMID: 20234772

An R package (ClassDiscovery) that includes a function to compute the bimodality index can be obtained from http://bioinformatics.mdanderson.org/Software/OOMPA/

Best,
Kevin
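For reference, the bimodality index of Wang et al. (2009) is defined from a two-component normal mixture with a shared standard deviation. The symbols below are chosen here for illustration: \pi is the proportion of observations in one component, \mu_1 and \mu_2 are the component means, and \sigma is the common standard deviation.

```latex
\delta = \frac{|\mu_1 - \mu_2|}{\sigma}, \qquad
\mathrm{BI} = \sqrt{\pi (1 - \pi)} \; \delta
```

The index grows with the standardized distance between the two means and, for a fixed distance, is largest when the two components are equally sized (\pi = 0.5).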
Thanks Kevin,

I had a look at the Mclust package and I don't see any function that could help me with this. Could you please be more specific? I also had a look at the bimodalIndex function in the ClassDiscovery package, and I don't know which component of the BI result is relevant for me. Moreover, I would like to check for bimodality or multimodality in AffyBatch objects (the hist function accepts AffyBatch objects), which, as far as I know, cannot be handled by the bimodalIndex function.

Any suggestions?

Thanks,
Javier
@wolfgang-huber-3550
Last seen 10 weeks ago
EMBL European Molecular Biology Laborat…
Dear Javier,

note that the number of modes of a distribution

- can depend on the normalisation (before or after log-transformation; whether background correction was done, and how)
- is impossible to determine from a finite sample without further assumptions (essentially a smoothing bandwidth)

Besides these (significant) practical difficulties, I am also doubtful of the usefulness, in terms of sensitivity and specificity, of this criterion for array quality diagnostics. If you see two modes, they would most likely be associated with a covariate, such as row, column, or spatial position on the array. If you find that this covariate is quality-relevant, I would advise checking for significant effects of that covariate even on arrays where the distribution looks unimodal.

Best wishes
Wolfgang
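The dependence on a smoothing bandwidth can be seen directly with base R's density(). The sketch below counts the local maxima of a Gaussian kernel density estimate of one simulated, truly unimodal sample at several bandwidths; count_modes is a hypothetical helper and the bandwidth values are arbitrary choices for illustration.

```r
# Hypothetical helper: number of local maxima of a kernel density estimate
count_modes <- function(x, bw) {
  d <- density(x, bw = bw, n = 1024)
  s <- sign(diff(d$y))
  s <- s[s != 0]        # drop flat steps
  sum(diff(s) < 0)      # a rise followed by a fall marks a local maximum
}

set.seed(42)
x <- rnorm(2000)        # a single normal sample, so one true mode

bws <- c(0.02, 0.05, 0.1, 0.3, 1)
modes <- sapply(bws, function(b) count_modes(x, b))
data.frame(bandwidth = bws, modes = modes)
```

The same sample looks multimodal at small bandwidths and unimodal at large ones, so any statement about the number of modes is implicitly a statement about the chosen amount of smoothing.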
Dear Wolfgang,

Thanks for your reply. The data I am going to test for bimodality are raw data, without preprocessing. For this purpose I think the bimodalIndex function from the ClassDiscovery package is ideal. It tests for bimodality using the BIC (Bayesian information criterion).

I know that there are other quality metrics such as boxplots, MA plots, NUSE, etc. The use of histograms is complementary to all of them, and all I need is something that flags that a CEL file may not be good because of such bimodality, taking the rest of the quality metrics into account.

Thanks again,
Javier
Raw? Log-transformed? It almost certainly matters.

The underlying model used by bimodalIndex (and by Mclust) is a mixture of normal distributions. On the raw linear scale of most microarrays, the distributions are skewed. (In fact, the usual RMA background correction models the observed intensity as normally distributed background noise plus an exponentially distributed signal.) I would expect that fitting a mixture-of-normals model to such data would almost always conclude that it is (at least) bimodal, with the long exponential tail representing one of the modes.

Kevin
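This expectation can be checked on simulated data. The sketch below is a hand-written EM fit in base R, not the code used by bimodalIndex or Mclust; the simulation parameters, the starting values and the function names are all assumptions made for illustration. It draws skewed but unimodal "raw-scale" values (normal noise plus exponential signal), then compares a single normal with a two-component normal mixture that shares one standard deviation, using BIC in the convention where smaller is better.

```r
set.seed(7)
n <- 2000
x <- rnorm(n, mean = 100, sd = 15) + rexp(n, rate = 1 / 100)  # skewed, one mode

# Log-likelihood of a single normal fitted by maximum likelihood
loglik_one <- function(x) {
  s <- sqrt(mean((x - mean(x))^2))
  sum(dnorm(x, mean(x), s, log = TRUE))
}

# Basic EM for a two-component normal mixture with a common standard deviation
fit_two <- function(x, iters = 200) {
  mu <- unname(quantile(x, c(0.25, 0.75)))   # crude starting values
  s <- sd(x)
  p <- 0.5
  for (i in seq_len(iters)) {
    # E-step: posterior probability that each point belongs to component 1
    d1 <- p * dnorm(x, mu[1], s)
    d2 <- (1 - p) * dnorm(x, mu[2], s)
    r <- d1 / (d1 + d2)
    # M-step: update the proportion, the means and the shared standard deviation
    p <- mean(r)
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    s <- sqrt(mean(r * (x - mu[1])^2 + (1 - r) * (x - mu[2])^2))
  }
  ll <- sum(log(p * dnorm(x, mu[1], s) + (1 - p) * dnorm(x, mu[2], s)))
  list(p = p, mu = mu, sd = s, loglik = ll)
}

fit <- fit_two(x)

# BIC = -2 * logLik + k * log(n); k = 2 for one normal, k = 4 for the mixture
bic_one <- -2 * loglik_one(x) + 2 * log(n)
bic_two <- -2 * fit$loglik + 4 * log(n)

# Bimodality index computed from the fitted mixture parameters
bi <- sqrt(fit$p * (1 - fit$p)) * abs(diff(fit$mu)) / fit$sd

c(bic_one = bic_one, bic_two = bic_two, BI = bi)
```

The two-component model wins on BIC even though the simulated distribution has a single mode, because the second component absorbs the long right tail. Note that mclust reports BIC with the opposite sign, so larger is better there.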