Get over it / wake up and smell the coffee


Stephen Henderson ★ 1.0k

@stephen-henderson-71

Last seen 7.6 years ago

I agree with some of what you say, Chad. The problem is that most multivariate methods are built on top of the marginal tests. For instance, machine learning methods are based on gene subsets chosen within each of k cross-validations. Use of the appropriate test (fold change, t, F, Cyber-T, etc.) for subset selection is, IMHO, the most important choice.

Stephen
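To make the dependence on the marginal test concrete, here is a minimal sketch of per-fold gene selection, assuming a Welch t-statistic as the marginal test and a simple interleaved k-fold split. The function names (`welch_t`, `select_genes`, `cv_gene_subsets`) and the toy data are hypothetical and only illustrate the idea; any of the other tests mentioned (fold change, F, Cyber-T) could be swapped in for `welch_t`.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for one gene measured in two classes."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, rows, n_keep):
    """Rank genes by |t| using only the samples listed in `rows`."""
    scores = []
    for g in range(len(X[0])):
        a = [X[i][g] for i in rows if y[i] == 0]
        b = [X[i][g] for i in rows if y[i] == 1]
        scores.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_keep]]


def cv_gene_subsets(X, y, k, n_keep):
    """Return the gene subset chosen on the training part of each of k folds."""
    idx = list(range(len(X)))
    folds = [idx[f::k] for f in range(k)]  # interleaved split (an assumption)
    subsets = []
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        # Selection sees the training samples only, never the held-out fold.
        subsets.append(select_genes(X, y, train, n_keep))
    return subsets


# Toy data (assumed): 20 samples x 50 genes; only genes 0-4 differ by class.
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(5.0 * y[i] if g < 5 else 0.0, 1.0) for g in range(50)]
     for i in range(20)]
subsets = cv_gene_subsets(X, y, k=5, n_keep=5)
for fold, genes in enumerate(subsets):
    print(fold, sorted(genes))
```

Because the subset is re-selected inside every fold, whatever classifier is trained afterwards inherits the behaviour of the marginal test, which is the point being made above.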


ADD COMMENT • link updated 21.0 years ago by A.J. Rossini ▴ 810 • written 21.0 years ago by Stephen Henderson ★ 1.0k


Ann Loraine ▴ 30

@ann-loraine-576

Last seen 10.2 years ago

My apologies if this is off-topic. If you like, feel free to respond directly rather than posting to the list.

Affymetrix made this announcement recently:

http://www.corporate-ir.net/ireye/ir_site.zhtml?ticker=AFFX&script=410&layout=-6&item_id=476496

In a nutshell, they are announcing that they want to develop target preparation protocols that amplify the full length of target mRNAs, with the ultimate goal of identifying splice variants.

Many splice variants produced by the same gene (most notably variants of genes involved in apoptosis) exhibit different, sometimes even antagonistic, biological functions. An expression array that could reliably produce information about how splicing varies from cell type to cell type would certainly be useful.

Chad argues that focusing on individual genes using microarray data is problematic. Would a chip designed to measure the relative abundance of individual mRNAs produced by a single gene merely magnify these problems? Are the statistics of microarray expression analysis 'strong' enough to allow analysis of individual mRNAs from the same gene?

I would be very interested to learn what members of this list think about this development.

Sincerely,
Ann Loraine

On Dec 18, 2003, at 6:09 AM, Stephen Henderson wrote:

> I agree with some of what you say, Chad. The problem is that most
> multivariate methods are built on top of the marginal tests. For
> instance, machine learning methods are based on gene subsets chosen
> within each of k cross-validations. Use of the appropriate test
> (fold change, t, F, Cyber-T, etc.) for subset selection is, IMHO,
> the most important choice.
>
> Stephen

ADD COMMENT • link 21.0 years ago Ann Loraine ▴ 30


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.6 years ago

United States

Currently I am telling the biologists to consider microarrays as screening experiments. Mostly, they use the results for second-stage analyses, which may be, e.g.:

statistical analyses such as clustering
bioinformatics analyses such as GO, BLAST or sequence analyses
lab analyses such as Northern blots, in situs, etc.

Given the huge number of genes on most arrays, I do want a reasonably reliable method of screening. On the other hand, I sometimes just rank the genes by test score, rather than attempt to determine some suitable alpha level, FDR or FNR.

Incidentally, distinguishing between technical replicates and biological replicates can make a huge difference to ANOVA test scores, so I think we should insist that our analyses account for this.

--Naomi
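A small worked example may help show why the replicate distinction matters. The sketch below uses made-up log2 intensities for a single gene and a Welch t-statistic as a simple stand-in for the ANOVA mentioned above; averaging technical replicates within each biological sample is only one crude way to account for the design (a mixed model would be another), and all names and numbers are assumptions for illustration.

```python
import statistics


def welch_t(a, b):
    """Welch t-statistic for two lists of values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical log2 intensities for ONE gene: 3 biological samples per group,
# each hybridized 4 times (technical replicates).
group_a = [[7.9, 8.0, 8.0, 8.1], [8.9, 9.0, 9.0, 9.1], [9.9, 10.0, 10.0, 10.1]]
group_b = [[8.9, 9.0, 9.0, 9.1], [9.9, 10.0, 10.0, 10.1], [10.9, 11.0, 11.0, 11.1]]

# Naive analysis: pretend all 12 measurements per group are independent.
flat_a = [v for sample in group_a for v in sample]
flat_b = [v for sample in group_b for v in sample]
t_naive = welch_t(flat_a, flat_b)

# Replicate-aware analysis: average the technical replicates first, so that
# the test is based on n = 3 biological units per group.
mean_a = [statistics.mean(sample) for sample in group_a]
mean_b = [statistics.mean(sample) for sample in group_b]
t_bio = welch_t(mean_a, mean_b)

print(f"naive t = {t_naive:.2f}, biological-unit t = {t_bio:.2f}")
```

The mean difference is identical in both analyses; only the standard error changes. Counting technical replicates as independent shrinks it and more than doubles the magnitude of the test score here, which would move this gene well up a ranked list.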

ADD COMMENT • link 21.0 years ago Naomi Altman ★ 6.0k


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.6 years ago

United States

One thing that makes me very cautious about over-interpreting tests, however, is the following:

We have tried several options for normalizing arrays, and found that the resulting expression values (for the methods we used) were correlated 98-99%. But if we then test for differential expression, we find that the overlap in the lists of "top genes" is only 50-60%.

--Naomi

ADD COMMENT • link 21.0 years ago Naomi Altman ★ 6.0k


This is likely to be a consequence of the probe effect. Consider that the log-scale correlation of two Affy replicates is higher than 0.99 (with RMA). However, if for each gene you define two probesets at random (half the probes to one probeset and half to the other) and recompute RMA, the correlation between "half probesets" for the same genes (within an array) drops to around 0.5. This is consistent with your findings. The log ratio cancels out the probe effect, so tests based on log ratios will have correlations smaller than 0.99.

This goes to show that the correlation of log expression values is not useful as a measure of agreement. A much better measure is the spread (IQR, SD, ...) of the log ratios. Similarly, scatterplots aren't as useful as MA plots.
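The argument can be illustrated with a toy simulation. This is a sketch under stated assumptions, not RMA and not real data: each gene gets a large baseline (standing in for the gene/probe effect), each array gets a small signal shared by both processing methods, and each method adds its own small noise. All variance components, the seed, and the names (`run_method`, `top`) are hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n_genes = 2000

# Per-gene baseline: dominates the variance of log expression (assumed SD 2).
baseline = [rng.gauss(8.0, 2.0) for _ in range(n_genes)]
# Signal shared by both methods on each of two arrays (control, treated).
shared = [(rng.gauss(0, 0.25), rng.gauss(0, 0.25)) for _ in range(n_genes)]


def run_method():
    """Log2 expression on both arrays as one hypothetical method reports it."""
    ctrl = [baseline[g] + shared[g][0] + rng.gauss(0, 0.25) for g in range(n_genes)]
    trt = [baseline[g] + shared[g][1] + rng.gauss(0, 0.25) for g in range(n_genes)]
    return ctrl, trt


ctrl_a, trt_a = run_method()
ctrl_b, trt_b = run_method()

# M (log ratio) and A (average log intensity): the coordinates of an MA plot.
m_a = [t - c for c, t in zip(ctrl_a, trt_a)]
m_b = [t - c for c, t in zip(ctrl_b, trt_b)]
a_a = [(t + c) / 2 for c, t in zip(ctrl_a, trt_a)]

r_expr = pearson(ctrl_a, ctrl_b)   # agreement of log expression values
r_ratio = pearson(m_a, m_b)        # agreement of log ratios (baseline cancels)

q1, _, q3 = statistics.quantiles(m_a, n=4)
iqr_m = q3 - q1                    # spread of the log ratios


def top(m, k=100):
    """Indices of the k genes with the largest absolute log ratio."""
    return set(sorted(range(len(m)), key=lambda g: abs(m[g]), reverse=True)[:k])


overlap = len(top(m_a) & top(m_b)) / 100
print(f"r(expression) = {r_expr:.3f}, r(log ratio) = {r_ratio:.3f}")
print(f"IQR of M = {iqr_m:.2f}, top-100 overlap = {overlap:.0%}")
```

With these assumed variances the two methods agree almost perfectly on log expression, because the shared baseline swamps everything else, yet their log ratios correlate only moderately once the baseline cancels. That is why the spread of M is the more informative summary.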

ADD REPLY • link 21.0 years ago Rafael A. Irizarry ★ 2.3k


A.J. Rossini ▴ 810

@aj-rossini-209

Last seen 10.2 years ago

Naomi Altman <naomi@stat.psu.edu> writes:

> Currently I am telling the biologists to consider microarrays as
> screening experiments. Mostly, they use the results for second-stage
> analyses, which may be, e.g.:
>
> statistical analyses such as clustering
> bioinformatics analyses such as GO, BLAST or sequence analyses
> lab analyses such as Northern blots, in situs, etc.
>
> Given the huge number of genes on most arrays, I do want a reasonably
> reliable method of screening. On the other hand, I sometimes just
> rank the genes by test score, rather than attempt to determine some
> suitable alpha level, FDR or FNR.
>
> Incidentally, distinguishing between technical replicates and
> biological replicates can make a huge difference to ANOVA test scores,
> so I think we should insist that our analyses account for this.

Slightly off topic, but one thing that I'm beginning to suspect is that, regardless of the statistical evidence, there are probably publishable and plausible stories for approximately 20-40% of the genes in any experiment. The goal is to ensure more (weak but useful) screening evidence for finding those genes, not that the experiment can strongly support the results regardless.

It's the story-telling and plausibility (which are related to the rationale behind the experiment) that drive the results. That isn't (and never really was) primarily a statistical issue; it relates to the scientific issues surrounding the experiment we are working with. Hence the reason to incorporate and use metadata as primary (and I'm happy that Robert and others have been pushing this issue a good deal in the design of Bioconductor tools).

best,
-tony

ADD COMMENT • link 21.0 years ago A.J. Rossini ▴ 810
