[Fwd: batch info for cellHTS]
0
0
Entering edit mode
Florian Hahne ▴ 540
@florian-hahne-2471
Last seen 10.2 years ago
Hi Yan, the idea was that the user constructs the batch array manually and assigns it to the batch slot using the accessor method, e.g. bt <- array(rep(1:2, each=5, dim=c(5,2,1)) batch(x) <- bt I agree that it would be useful to have that functionality directly in the import functions. The only natural place where the batch information could go is in the platelist file. In the plateconf file, we don't have the notion of plate replicates or samples any more. Attached you find a slightly modified readPlateList function which evaluates an (optional) "Batch" column in the platelist file. Hope that is the functionality you where looking for. Please let me know if you have further input. Thanks, Florian PS: I forwarded this conversation to the mailing list. Others might benefit from it... readPlateList <- function(filename, path=dirname(filename), name, importFun, verbose=interactive()) { file <- basename(filename) dfiles <- dir(path) if(!(is.character(path)&&length(path)==1)) stop("'path' must be character of length 1") pd <- read.table(file.path(path, file), sep="\t", header=TRUE, as.is=TRUE) checkColumns(pd, file, mandatory=c("Filename", "Plate", "Replicate"), numeric=c("Plate", "Replicate", "Channel", "Batch")) ## consistency check for "importFun" if (!missing(importFun)) { if (!is(importFun, "function")) stop("'importFun' should be a function to use to read the raw data files") } else { ## default function (compatible with the file format of the plate reader) importFun <- function(f) { txt <- readLines(f, warn=FALSE) sp <- strsplit(txt, "\t") well <- sapply(sp, "[", 2) val <- sapply(sp, "[", 3) out <- list(data.frame(well=I(well), val=as.numeric(val)), txt=I(txt)) return(out) } } ## check if the data files are in the given directory a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles, ignore.case=TRUE))) if (length(a)==0) stop(sprintf("None of the files were found in the given 'path': %s", path)) f <- file.path(path, dfiles[a]) ## check if 'importFun' gives the output in the desired form aux <- importFun(f[1]) if (which(unlist(lapply(aux, is, "data.frame"))) != 1 | !all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2) stop("The output of 'importFun' must be a list with 2 components;\n", "the first component should be a 'data.frame' with slots 'well' and 'val'.") ## auto-determine the plate format well <- as.character(importFun(f[1])[[1]]$well) let <- substr(well, 1, 1) num <- substr(well, 2, 3) let <- match(let, LETTERS) num <- as.integer(num) ifanyis.na(let))||anyis.na(num))) stop(sprintf("Malformated column 'well' in input file %s", f[1])) dimPlate <- c(nrow=max(let), ncol=max(num)) nrWell <- prod(dimPlate) if(verbose) cat(sprintf("%s: found data in %d x %d (%d well) format.\n", name, dimPlate[1], dimPlate[2], nrWell)) ## Should we check whether these are true? ## "96" = c(nrow=8, ncol=12), ## "384" = c(nrow=16, ncol=24), nrRep <- max(pd$Replicate) nrPlate <- max(pd$Plate) combo <- paste(pd$Plate, pd$Replicate) ## Channel: if not given, this implies that there is just one if("Channel" %in% colnames(pd)) { nrChannel <- max(pd$Channel) channel <- pd$Channel combo <- paste(combo, pd$Channel) } else { nrChannel <- 1L channel <- rep(1L, nrow(pd)) pd$Channel <- channel } whDup <- which(duplicated(combo)) if(length(whDup)>0L) { idx <- whDup[1:min(5L, length(whDup))] msg <- paste("The following rows are duplicated in the plateList table:\n", "\tPlate Replicate Channel\n", "\t", paste(idx, combo[idx], sep="\t", collapse="\n\t"), if(length(whDup)>5) sprintf("\n\t...and %d more.\n", length(whDup)-5), "\n", sep="") stop(msg) } xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel)) intensityFiles <- vector(mode="list", length=nrow(pd)) names(intensityFiles) <- pd[, "Filename"] status <- character(nrow(pd)) for(i in seq_len(nrow(pd))) { if(verbose) cat("\rReading ", i, ": ", pd$Filename[i], sep="") ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE) if (length(ff)!=1) { f <- file.path(path, pd[i, "Filename"]) status[i] <- sprintf("File not found: %s", f) } else { f <- file.path(path, dfiles[ff]) names(intensityFiles)[i] <- dfiles[ff] status[i] <- tryCatch({ out <- importFun(f) pos <- convertWellCoordinates(out[[1]]$well, dimPlate)$num intensityFiles[[i]] <- out[[2]] xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <- out[[1]]$val "OK" }, warning=function(e) paste(class(e)[1], e$message, sep=": "), error=function(e) paste(class(e)[1], e$message, sep=": ") ) ## tryCatch } ## else } ## for if(verbose) cat("\rRead", nrow(pd), "plates. \n\n") ## ---- Store the data as a "cellHTS" object ---- ## arrange the assayData slot: dat <- lapply(seq_len(nrChannel), function(ch) matrix(xraw[,,,ch], ncol=nrRep, nrow=nrWell*nrPlate)) names(dat) <- paste("ch", seq_len(nrChannel), sep="") adata <- do.call("assayDataNew", c(storage.mode="lockedEnvironment", dat)) ## arrange the phenoData slot: pdata <- new("AnnotatedDataFrame", data <- data.frame(replicate=seq_len(nrRep), assay=rep(name, nrRep), stringsAsFactors=FALSE), varMetadata=data.frame(labelDescription=c("Replicate number", "Biological assay"), channel=factor(rep("_ALL_", 2L), levels=c(names(dat), "_ALL_")), row.names=c("replicate", "assay"), stringsAsFactors=FALSE)) ## arrange the featureData slot: well <- convertWellCoordinates(seq_len(nrWell), pdim=dimPlate)$letnum fdata <- new("AnnotatedDataFrame", data <- data.frame(plate=rep(seq_len(nrPlate), each=nrWell), well=rep(well, nrPlate), controlStatus=factor(rep("unknown", nrWell*nrPlate)), stringsAsFactors=FALSE), varMetadata=data.frame(labelDescription=c("Plate number", "Well ID", "Well annotation"), row.names=c("plate", "well", "controlStatus"), stringsAsFactors=FALSE)) res <- new("cellHTS", assayData=adata, phenoData=pdata, featureData=fdata, plateList=cbind(pd[,1L,drop=FALSE], status=I(status), pd[,-1L,drop=FALSE]), intensityFiles=intensityFiles) ## if there is a batch column in the platelist file we want to import it if("Batch" %in% colnames(pd)){ bat <- pd$Batch[order(pd$Replicate, pd$Channel)] dim(bat) <- c(max(plate(res)), ncol(res), length(channelNames(res))) res at batch <- bat } ## output the possible errors that were encountered along the way: whHadProbs <- which(status!="OK") if(length(whHadProbs)>0 & verbose) { idx <- whHadProbs[1:min(5, length(whHadProbs))] msg <- paste("Please check the following problems encountered while reading the data:\n", "\tFilename \t Error\n", "\t", paste(plateList(res)$Filename[idx], status[idx], sep="\t", collapse="\n\t"), if(length(whHadProbs)>5) sprintf("\n\t...and %d more.\n", length(whHadProbs)-5), "\n", sep="") stop(msg) } return(res) } Yan Zhou wrote: > Dear Florian, > > I understand the different meaning of batch from the 2 cellHTS > packages. I just don't know how to add the "batch" slot.(we have a > large screen with screen done on different day,we want to use the > variance adjustment by batch function). I tried to add a "batch" > column in the plate configuration file, but doesn't seem to be taken > into cellHTS2 object. then I tried to add "batch" column in the plate > list file , still didn't do anything. In another word, I couldn't > figure out how to add the batch slot to the cellHTS object. please > give more detail. Thank you so much! > > yan > > Florian Hahne wrote: > >> Hi Yan, >> Ligia forwarded your mail to me. >> >> The batch concept is a little bit different in cellHTS2. Basically, >> we separated the ability to included several plate configurations and >> the batch-specific parameter estimation (e.g. in the normalizePlate >> function). For the former, you can use a regular expression syntax in >> your plate configuration file, while the latter has to be added >> manually into the new 'batch' slot. All of this is explained in much >> more detail in the package vignette: cellHTS2 - Main vignette >> (complete version): End-to-end analysis of cell-based screens >> >> Let me know if this doesn't get you anywhere. >> >> Florian >> >> ligia at ebi.ac.uk wrote: >> >>> Hi Florian, >>> >>> Hope you're doing fine. >>> Could you take care of this for me? >>> Cheers, >>> Ligia >>> >>> >>> ------------------------- Original Message ---------------------------- >>> Subject: batch info for cellHTS >>> From: "Yan Zhou" <yan.zhou at="" fccc.edu=""> >>> Date: Fri, October 17, 2008 19:14 >>> To: ligia at ebi.ac.uk >>> ------------------------------------------------------------------ -------- >>> >>> >>> Dear Ligia, >>> >>> I'm using the cellHTS2 package for HTS data analysis. For the old >>> cellHTS package, I knew how to incorporate the batch information. But >>> for the new cellHTS2 package, I couldn't have it done right. I attached >>> my plate configuration file and also plate list file with this email. I >>> was wondering whether you would have time to help me take a look, and >>> point me to the right directions. Thanks a lot for your time and >>> kind help. >>> >>> yan >>> >> >> >> > -- Florian Hahne, PhD Computational Biology Program Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-3148 fhahne at fhcrc.org
GO Cancer cellHTS cellHTS2 GO Cancer cellHTS cellHTS2 • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 995 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6