clustering methods for gene expression
0
0
Entering edit mode
Donna Toleno ▴ 90
@donna-toleno-2383
Last seen 10.2 years ago
Hello list, I have a question about using the Bioconductor tools to reproduce a certain illustration I have seen in publications. I took some sample data and used hcluster on my microarray arrays as in the following example. I subset on the genes and then cluster the arrays based on the genes of interest. I use Euclidean distance and average linking. Is this the same as using PHYLIP to make a UPGMA tree with distances calculated based on pairwise "mean of the squared difference in expression intensities"? If I am not mistaken, the Euclidean distance would be the square root of the sum of the squared differences. It seems it is a similar distance measure. But what if we took the mean probe signal intensities for each of two previously defined groups and then took a mean squared difference between the group means? Then within one of the groups the expression variation is defined in the literature as "mean probe signal intensity variation". These seem like unusual ways to measure the with in and between group variation. Wouldn't it be more acceptable to use a linear model in limma to quantify the variance components within and between groups?Also, is there any reason to use PHYLIP to display the relationships of expression profiles instead of using hcluster or some other clustering algorithm? Any feedback at all would be appreciated. Feel free to criticize my R code appearing below. I am somewhat new to microarray data analysis and Bioconductor and I recently pieced together this code to carry out clustering. mydata <- ReadAffy() eset <- rma(mydata) #transpose the expression set so that the rows are the genes data_frame_expression <- t(as.data.frame(eset)) vars <- apply(data_frame_expression,1, var) means <- apply(data_frame_expression,1, mean) stdevs <- apply(data_frame_expression,1, sd) CV <- stdevs/means #subset the data on the rows (genes) # an example:subset <- vars > quantile(vars,(nrow(data_frame_expression -200)/nrow(data_frame_expression)) subset <- CV > 0.15 expression_subset <- data_frame_expression[subset, ] #cluster on the columns (samples) hc <- hcluster(t(expression_subset), link = "ave") plot (hc) write(hc2Newick(hc),file='hclust_subset.newick') subset_distances <- as.matrix(dist(t(expression_subset), method = "euclidean", diag = FALSE, upper = FALSE)) write (subset_distances, file='subset_of_genes_dist_file') # the problem here is that the distance matrix didn't print to a file properly. Thank you for reading!
Microarray Clustering probe limma Microarray Clustering probe limma • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 626 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6