Separate Normalizations and expression plotting
Lana Schaffer (@lana-schaffer-1056)
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061005/985bca96/attachment.pl
Sean Davis (@sean-davis-490)
On Thursday 05 October 2006 21:14, Lana Schaffer wrote:
> Hi,
> This experiment involves expression analysis of 2 batches of samples
> that vary in age. The batches were normalized separately because they
> did not cluster together. The age groups of the second set fall in
> between the ages of the first set, and the goal is to analyze all the
> data together. The lab researchers then combined the separately
> normalized expression values and plotted expression vs. age. In the
> diseased samples, some genes showed a "trend" with age, with R-squared
> values between .16 and .4. From my training, I understand that such a
> trend explains only 16-40% of the variation in the data and would not
> be considered significant. However, Prism reports these R-squared
> values as significantly different from zero. The researchers tell me
> that this is how data are presented in their field and that an
> R-squared of .16-.4 is considered an excellent result. Indeed, in
> non-diseased individuals the R-squared values are zero for these genes.
> I understand that in their field "any" trend is better than no trend,
> especially since the samples are heterogeneous. However, this is not
> what is taught in statistics. These graphs will be submitted to
> journals under my authorship, and I am a bit shaken up.
> Would you please comment on the combination of the 2 sets of
> expression values and on the significance of the R-squared values?
> Thanks,
> Lana

Lana,

There are two issues, it seems. The first is the normalization of two separate batches, which is appropriate. What isn't clear is whether doing so introduces bias in downstream analyses; this you will need to judge for yourself.

The second is the significance of the R-squared values. To convince yourself and your collaborators of the significance, or lack thereof, of the computed values, you can test whether each R-squared is significantly different from zero. I would suggest a permutation-based analysis, but the method is up to you. Judging an R-squared value by looking at the raw number is probably not a valid way to determine significance.

Sean
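One way the permutation-based analysis suggested above might look is sketched below in plain Python. This is only an illustration, not the method anyone in the thread actually used: the function names `r_squared` and `permutation_pvalue`, the number of permutations, and the `ages`/`expr` values are all made up for the example. The idea is to shuffle the expression values relative to the ages many times and count how often the shuffled data give an R-squared at least as large as the observed one.

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in x or y, so no trend can be fitted
    return (sxy * sxy) / (sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Return (observed R-squared, permutation p-value).

    The null hypothesis is that y is unrelated to x, so the y values
    can be shuffled freely against x.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so floating-point ties count as "at least as large"
        if r_squared(x, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction: the estimate can never be exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data for one gene: sample ages and normalized expression values.
ages = [22, 25, 31, 34, 40, 43, 47, 52, 58, 61, 66, 70]
expr = [7.9, 8.4, 8.1, 8.8, 8.3, 9.0, 8.6, 9.4, 8.9, 9.1, 9.8, 9.3]

observed, p = permutation_pvalue(ages, expr)
print(f"R-squared = {observed:.3f}, permutation p-value = {p:.4f}")
```

Because the two batches were normalized separately, one possible variation is to shuffle expression values only within each batch, so that a batch offset cannot masquerade as an age trend. Testing many genes this way would also call for a multiple-testing adjustment.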
It is important to remember that statistical significance refers to the investigator's ability to reproduce the result. A result can be statistically significant without having biological significance. The reported R-squared might be statistically significant but biologically insignificant.

--Naomi

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
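To make the distinction between statistical and biological significance concrete, here is a small sketch, assuming a simple linear regression of expression on age. For such a fit the slope's test statistic depends on both R-squared and the number of samples n, through |t| = sqrt(R² (n - 2) / (1 - R²)). The sample sizes below are hypothetical, since the thread does not state how many samples were used, and `t_statistic_from_r2` is a name made up for this example.

```python
import math


def t_statistic_from_r2(r2, n):
    """|t| for the slope of a simple linear regression with n samples."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    if n < 3:
        raise ValueError("need at least 3 samples")
    return math.sqrt(r2 * (n - 2) / (1 - r2))


# Same R-squared, different (hypothetical) sample sizes.
for n in (10, 30, 100):
    print(f"R-squared = 0.16, n = {n:3d}, |t| = {t_statistic_from_r2(0.16, n):.2f}")
```

The statistic grows with n while R-squared stays fixed, so the same R-squared of .16 can be far from or well past a conventional significance threshold depending on sample size. Whether a trend of that strength matters biologically is a separate question that the test does not answer.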