Importing .cdt files generated by Cluster3.0 into R
Guest User ★ 13k
Dear All,

I have used Cluster3.0 to cluster gene expression data, and I would like to import the resulting files (.cdt, .gtr) into R to generate silhouette plots. Basically, I would like to check the robustness of the clusters. Can the files from Cluster3.0 be imported into R? I would appreciate any other suggestions.

Thanks.

-- output of sessionInfo():

none
Hello,

For importing these files into R, see ?read.table and ?read.delim. Reading the .gtr file should be fairly straightforward with read.table. The .cdt file may be easier to read with read.delim, setting fill=TRUE.

Valerie

On 10/18/2011 07:58 AM, Sohail [guest] wrote:
> Dear All,
>
> I have used Cluster3.0 to cluster gene expression data, and I would like to import the resulting files (.cdt, .gtr) into R to generate silhouette plots.
> Basically, I would like to check the robustness of the clusters. Can the files from Cluster3.0 be imported into R? I would appreciate any other suggestions.
> Thanks.
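The suggestion above could be sketched roughly as follows. This assumes tab-delimited files; the two tiny files written here are made up for illustration and only imitate the general layout of Cluster3.0 output, so real files may need different options.

```r
# Write tiny, made-up stand-ins for Cluster3.0 output to a temp directory.
tmp <- tempdir()
gtr_file <- file.path(tmp, "example.gtr")
cdt_file <- file.path(tmp, "example.cdt")

# .gtr: node id, left branch, right branch, similarity (no header line)
writeLines(c("NODE1X\tGENE0X\tGENE2X\t0.95",
             "NODE2X\tGENE1X\tNODE1X\t0.40"), gtr_file)

# .cdt: header line, an EWEIGHT row, then one row per gene
writeLines(c("GID\tORF\tNAME\tGWEIGHT\ts1\ts2",
             "EWEIGHT\t\t\t\t1\t1",
             "GENE0X\tg1\tg1\t1\t0.5\t-0.2",
             "GENE2X\tg3\tg3\t1\t0.4\t-0.1",
             "GENE1X\tg2\tg2\t1\t-1.0\t0.9"), cdt_file)

# The tree file has no header, so columns are named V1..V4.
gtr <- read.table(gtr_file, sep = "\t", header = FALSE, as.is = TRUE)

# read.delim assumes a header and tab separators; fill = TRUE pads short rows.
cdt <- read.delim(cdt_file, fill = TRUE, as.is = TRUE)

str(gtr)
head(cdt)
```

Note that fill=TRUE is already the default for read.delim, so passing it explicitly only documents the intent.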
readCDT <- function(filename) {
  fname <- sub('\\.cdt$', '', filename) # get rid of the extension
  # as.is=TRUE is an assumption here; the end of these two calls is
  # missing from the post
  atr <- read.table(paste(fname, 'atr', sep='.'), sep='\t', header=FALSE,
                    as.is=TRUE)
  gtr <- read.table(paste(fname, 'gtr', sep='.'), sep='\t', header=FALSE,
                    as.is=TRUE)
  cdt <- read.table(paste(fname, 'cdt', sep='.'), sep='\t', header=TRUE,
                    row.names=NULL)
  # we only need the first column of the CDT file, which contains
  # the order information, and the third column, which contains the
  # labels. The rest of the file contains the data matrix.
  rown <- as.character(cdt[,"GID"])
  coln <- colnames(cdt)
  firstRow <- 1 + which(rown=="EWEIGHT")
  firstCol <- 1 + which(coln=="GWEIGHT")
  gid <- as.character(cdt[,"GID"])[firstRow:nrow(cdt)]
  aid <- cdt[rown=="AID",][firstCol:ncol(cdt)]
  aid <- as.character(as.matrix(aid))
  # names all start with 'GENE' or 'NODE' (or 'ARRY') and end with 'X'
  gene.order <- 1 + as.numeric(substring(gid, 5, nchar(gid)-1))
  arry.order <- 1 + as.numeric(substring(aid, 5, nchar(aid)-1))
  # Because Cluster reorders things and because hclust and plclust want
  # to do the same, we have to reinvert the ordering during passage from
  # one to the other
  gene.labels <- as.character(cdt$NAME)[firstRow:nrow(cdt)][order(gene.order)]
  arry.labels <- coln[firstCol:ncol(cdt)][order(arry.order)]
  temp <- as.matrix(cdt[firstRow:nrow(cdt), firstCol:ncol(cdt)])
  temp <- temp[order(gene.order), order(arry.order)]
  data <- matrix(as.numeric(temp), ncol=ncol(temp))
  dimnames(data) <- list(gene.labels, arry.labels)
  # The gtr file contains the "distances" in column 4. Actually,
  # Eisen's Cluster program reports similarities instead of
  # distances. This fix assumes that some kind of correlation was
  # the measure of similarity....
  gene.height <- 1 - gtr$V4
  arry.height <- 1 - atr$V4
  # Columns 2 and 3 describe the two branches below each node.
  # Nodes are listed from bottom to top since clustering is
  # agglomerative.
  #
  foo <- function(alt) {
    # Again, we get the numeric part of the label
    base <- as.numeric(substring(alt, 5, nchar(alt)-1))
    # We also need to know whether the label is a "GENE" or a "NODE".
    # The 'hclust' objects use negative integers to indicate leaves
    # (single genes or arrays) and positive integers for merged nodes.
    type1 <- rep(1, length(base))
    type1[substring(alt, 1, 4) %in% c('GENE', "ARRY")] <- -1
    base <- base*type1    # make leaves negative
    adder <- (type1-1)/2  # offset the negatives to change from starting
                          # at 0 to starting at 1.
    base + adder
  }
  gene.merge1 <- foo(gtr$V3)
  gene.merge2 <- foo(gtr$V2)
  arry.merge1 <- foo(atr$V3)
  arry.merge2 <- foo(atr$V2)
  # put everything together into a list and make it an hclust object
  gene <- list(merge=as.matrix(cbind(gene.merge1, gene.merge2)),
               height=gene.height,
               order=gene.order,
               labels=gene.labels,
               method='modified centroid',
               call=NULL,
               dist.method='Pearson correlation')
  class(gene) <- 'hclust'
  arry <- list(merge=as.matrix(cbind(arry.merge1, arry.merge2)),
               height=arry.height,
               order=arry.order,
               labels=arry.labels,
               method='modified centroid',
               call=NULL,
               dist.method='Pearson correlation')
  class(arry) <- 'hclust'
  list(gene=gene, arry=arry, data=data)
}

# Example usage (not run)
if (FALSE) {
  library(ClassDiscovery)
  filename <- "eacdata2.cdt"
  cdt <- readCDT(filename)
  g <- cdt$gene
  plot(g)  # the post used plclust(g), which is defunct in current R
  d <- cdt$data
  # 'rg' is not defined in the post; this green-black-red palette is a stand-in
  rg <- colorRampPalette(c('green', 'black', 'red'))(64)
  image(d, col=rg)
  a <- cdt$arry
  classes3 <- cutree(a, k=3)
  colset <- c('red', 'orange', 'magenta')
  plotColoredClusters(a, lab=a$labels, col=colset[classes3])
}
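Since the original question was about silhouette plots, here is one possible way to get from a tree to a silhouette plot. It is only a sketch under several assumptions: it uses the `cluster` package (a recommended package shipped with R), a 1 - Pearson correlation distance to match the height conversion in readCDT, synthetic data standing in for `cdt$data`, a tree built with `hclust` standing in for `cdt$gene`, and an arbitrary choice of k = 2.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Synthetic stand-in for cdt$data: 20 "genes" x 6 "arrays", in two groups
# with opposite expression patterns.
pattern <- c(1, 2, 3, 4, 5, 6)
data <- rbind(
  t(replicate(10, pattern + rnorm(6, sd = 0.3))),
  t(replicate(10, rev(pattern) + rnorm(6, sd = 0.3)))
)
rownames(data) <- paste0("gene", 1:20)

# Gene-by-gene distance: 1 - Pearson correlation between rows.
d <- as.dist(1 - cor(t(data)))

# Stand-in for the imported tree.
hc <- hclust(d, method = "average")

# Cut the tree into k groups; k = 2 is an arbitrary choice for this example.
classes <- cutree(hc, k = 2)

# Silhouette widths for each gene, given its group and the distances.
sil <- silhouette(classes, d)
summary(sil)$avg.width
plot(sil)
```

With the imported object, `cutree(cdt$gene, k)` and a distance computed from `cdt$data` would presumably take the place of `hc` and the synthetic matrix; whether `cutree` accepts the hand-built tree without complaint is worth checking on real files.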
