Hierarchical clustering of RMA data
@ryan-kirkbride-3006
Last seen 10.2 years ago
Hello all! I have a basic conceptual question. I have a set of RMA-normalized data on which I would like to carry out hierarchical clustering. In the past we have usually worked with MAS5 data, which we import into dChip to carry out the clustering. I'm now looking to do the same with RMA data, and I'm wondering whether I should transform it to a linear scale or leave it on the typical log2 scale. dChip does a per-gene normalization (it subtracts the mean and then divides by the standard deviation), and it appears that the choice of linear or log2 scale affects the results. I'm assuming most people just leave it on the log2 scale; am I overthinking the whole issue? Thanks,

Ryan Kirkbride, Plant Biology Graduate Student, Harada Lab, UC Davis
Normalization Clustering • 1.2k views
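To make the question concrete, here is a minimal sketch of the per-gene normalization described above (subtract the mean, divide by the standard deviation), applied to the same gene on the log2 scale and on the linear scale. The expression values are made up for illustration, and the use of the sample standard deviation is an assumption; the thread does not say which form dChip uses.

```python
import statistics


def standardize(values):
    """Per-gene standardization: subtract the mean, divide by the SD."""
    mean = statistics.mean(values)
    # Sample SD (n - 1 denominator) is an assumption, not dChip's documented choice.
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical log2 expression values for one gene across four arrays.
log2_expr = [6.0, 8.0, 10.0, 12.0]
# The same measurements back-transformed to the linear scale.
linear_expr = [2.0 ** v for v in log2_expr]

z_log2 = standardize(log2_expr)
z_linear = standardize(linear_expr)

print("z-scores from log2 scale:  ", [round(z, 3) for z in z_log2])
print("z-scores from linear scale:", [round(z, 3) for z in z_linear])
```

The log2 z-scores are evenly spaced around zero, while the linear z-scores are dominated by the largest value, so any distance computed from the standardized profiles will generally differ between the two scales.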
@deanne-taylor-2380
Last seen 10.2 years ago
Ryan: This might be a naive question, as I'm not sure how dChip does the normalization, but is there a setting in dChip to tell it that the data are on a log2 scale? Otherwise the arithmetic on the log and linear scales would be quite different, and that might be the source of the difference: subtracting on the log2 scale is akin to dividing on the linear scale.

Deanne Taylor PhD, Executive Director, Bioinformatics Core, Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, dtaylor at hsph.harvard.edu
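The point about subtraction versus division can be checked numerically. The sketch below, using made-up values, shows that centering a gene on the log2 scale (subtracting the mean of the log2 values) is the same as dividing the linear values by their geometric mean. It says nothing about what dChip does internally.

```python
import math
import statistics

# Hypothetical linear-scale intensities for one gene across four arrays.
linear = [64.0, 256.0, 1024.0, 4096.0]
log2_vals = [math.log2(v) for v in linear]

# Mean-centering on the log2 scale: a subtraction.
log2_mean = statistics.mean(log2_vals)
centered_log2 = [v - log2_mean for v in log2_vals]

# The equivalent operation on the linear scale: division by the geometric mean.
geo_mean = statistics.geometric_mean(linear)
ratio_to_geo_mean = [v / geo_mean for v in linear]

# Back-transforming the centered log2 values recovers the linear ratios.
back_transformed = [2.0 ** c for c in centered_log2]

for ratio, back in zip(ratio_to_geo_mean, back_transformed):
    print(f"linear ratio = {ratio:.4f}   2**(centered log2) = {back:.4f}")
```

If a tool instead subtracts the arithmetic mean on the linear scale, it is performing a different operation, which is one way the two scales can lead to different clustering input.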
RESENDING WITHOUT ATTACHMENT: Scaling is well known to cause different hierarchical (and non-hierarchical) clustering results. The decision to transform the data has to be considered in terms of how the transformation will affect the distance calculations. We are very comfortable with transforming data to induce properties such as normality or homoscedasticity; however, that is not necessarily why we would do it in a clustering problem. I attached a review article (in a previous post) on clustering microarray data that shows a simple example of how scaling results in different clusters, and why one scale would be used over the other (Pharmacogenomics, 2003, Vol 4(1), pp. 41-52).

Bill Shannon, Associate Professor of Biostatistics in Medicine, Washington University School of Medicine, St Louis; President-Elect, Classification Society
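As a rough illustration of how the scale feeds into the distance calculations, the sketch below takes three hypothetical genes and finds the closest pair under Euclidean distance, which is the first merge an agglomerative clustering would make. The gene names and values are invented, no per-gene standardization is applied, and this is not the example from the cited review.

```python
import math
from itertools import combinations


def closest_pair(profiles):
    """Return the two gene names with the smallest Euclidean distance,
    i.e. the first pair an agglomerative clustering would merge."""
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: math.dist(profiles[pair[0]], profiles[pair[1]]),
    )


# Hypothetical log2 expression profiles over two arrays.
log2_profiles = {
    "geneA": [10.0, 10.0],
    "geneB": [11.0, 11.0],
    "geneC": [8.0, 8.0],
}
# The same profiles back-transformed to the linear scale.
linear_profiles = {
    gene: [2.0 ** v for v in values] for gene, values in log2_profiles.items()
}

first_merge_log2 = closest_pair(log2_profiles)
first_merge_linear = closest_pair(linear_profiles)

print("first merge on log2 scale:  ", first_merge_log2)
print("first merge on linear scale:", first_merge_linear)
```

On the log2 scale geneA is closest to geneB (a two-fold difference), whereas on the linear scale it is closest to geneC, because linear distances grow with absolute intensity. Which behaviour is wanted depends on whether fold changes or absolute differences should drive the clusters.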