limma reading genepix files
@bryce-maciver-715
Last seen 10.4 years ago
Hello all limma experts,

I have started using limma to analyse some spotted microarray data but have run into problems reading the data files in. Following the limma guide, having specified the "files", I used the command:

    RG <- read.maimages(files, source="genepix")

and got:

    Read 7863scan3.gpr
    Error in read.table(fullname, skip = skip, header = TRUE, sep = sep, as.is = TRUE, :
      more columns than column names

I have 4 .gpr files that were generated from genepix version 3, with 7863scan3 being the first.

I also tried specifying columns as mentioned in the limma guide:

    RG <- read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 Median", Rb="B635 Median", Gb ="532 Median"))

and got:

    Error in "[.data.frame"(obj, , columns$Gb) :
      undefined columns selected.

I have no idea how to interpret these error messages, and have to say that my forays into BioConductor have been a frequent exercise in frustration because of constant unintelligible error messages. Could someone please help me solve these issues?

I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?).

Thanks,
Bryce
Microarray limma • 1.8k views
Heidi Dvinge ▴ 30
@heidi-dvinge-717
Last seen 10.4 years ago
> I also tried specifying columns as mentioned in the limma guide:
>
> RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532
> Median", Rb="B635 Median", Gb ="532 Median"))

Don't you need Gb ="B532 Median"?

\Heidi

> and got
>
> Error in "[.data.frame"(obj, , columns$Gb) :
> undefined columns selected.
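To illustrate the point about the Gb entry, here is a minimal base-R sketch that compares a requested column list against a file's column headings. The helper name `missing_columns` and the `header_names` vector are made up for illustration (they are not part of limma), and the headings shown are only a plausible subset of what a GenePix file contains.

```r
# Hypothetical helper (not a limma function): return the requested
# column names that do not occur among the file's column headings.
missing_columns <- function(header_names, columns) {
  wanted <- unlist(columns)
  wanted[!(wanted %in% header_names)]
}

# Illustrative subset of GenePix column headings (an assumption).
header_names <- c("Block", "Column", "Row", "Name", "ID",
                  "F635 Median", "B635 Median",
                  "F532 Median", "B532 Median")

# The column list from the original post, with "532 Median" for Gb.
columns <- list(Rf = "F635 Median", Gf = "F532 Median",
                Rb = "B635 Median", Gb = "532 Median")

# Reports the entries that would trigger "undefined columns selected".
print(missing_columns(header_names, columns))
```

With Gb changed to "B532 Median" the helper returns an empty vector, which suggests the column list would then match the headings.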
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
> hello all limma exprts,
>
> I have started using limma to analyse some spotted microarray data but
> have run into problems trying to read the data files in.
> From the limma guide, having specified the "files" , then using the
> command:
> RG<-read.maimages(files, source="genepix")
>
> I get:
>
> Read 7863scan3.gpr
> Error in read.table(fullname, skip = skip, header = TRUE, sep = sep,
> as.is = TRUE, :
> more columns than column names

Something is wrong with your second data file. Have you looked at it to check?

> I have 4 .gpr files that were generated from genepix version 3 with
> 7863scan3 being the first
>
> I also tried specifying columns as mentioned in the limma guide:
>
> RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532
> Median", Rb="B635 Median", Gb ="532 Median"))
>
> and got
>
> Error in "[.data.frame"(obj, , columns$Gb) :
> undefined columns selected.

The error message seems to me to be not impossible to interpret. It tells you that you've tried to specify a column that doesn't exist and that the offending column is Gb. If you look at your own code you'll see an obvious typo.

Gordon
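One way to look at the files, sketched in base R only: report how many lines precede the column-heading line in each file. The helper name `header_skip` is hypothetical, and the default heading "F635 Median" is an assumption about what the files contain.

```r
# Hypothetical helper (not part of limma): number of lines that come
# before the column-heading line, or NA if the heading is not found
# within the first n lines.
header_skip <- function(file, heading = "F635 Median", n = 80) {
  lines <- readLines(file, n = n, warn = FALSE)
  hit <- grep(heading, lines, fixed = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  hit[1] - 1L
}

# Intended use, with `files` being the vector of .gpr file names:
#   sapply(files, header_skip)
```

If the values differ between files, or one of them is NA, that file is the one to open and inspect by hand.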
Hello all,

I've just started plowing through some two-color arrays and ran across the same bug.

The problem I found is due to an efficiency hack that doesn't always work: read.maimages assumes that all the genepix files it reads have the same number of header records. That is, rather than taking the number of header records directly from each .gpr file, it counts the number of lines until the header line in the *first file only*, and then assumes that all the rest of the files have the same number of records.

If not, then the skip= parameter is incorrect, read.table() starts in the wrong place, and results are, as they say, unpredictable.

The "right" way to do this is to read the first lines of each .gpr file, get the number of header records, and then use read.table with the right number of header records. A quick hack is just to search for the header in each file.

I've attached a modification to read.maimages() that does exactly that, at least the three times I've tried it :-)

Cheers,

Dave Nelson

-------------- next part --------------
###
### A hack to read genepix .gpr files with a different number of header lines in each file.
###
### Does not work with other formats. SHOULD rewrite to take the header count from the second
### line of the file.
###
my.read.maimages <- function (files, source = "spot", path = NULL, ext = NULL,
    names = NULL, columns = NULL, wt.fun = NULL, verbose = TRUE,
    sep = "\t", quote = "\"", ...)
{
    # If no files are given, list those in 'path' that have extension 'ext'.
    if (missing(files)) {
        if (missing(ext))
            stop("Must specify input files")
        else {
            extregex <- paste("\\.", ext, "$", sep = "")
            files <- dir(path = ifelse(is.null(path), ".", path), pattern = extregex)
            files <- sub(extregex, "", files)
        }
    }
    if (!missing(source) && !missing(columns))
        stop("Cannot specify both source and columns")
    source <- match.arg(source, c("arrayvision", "genepix", "imagene",
        "quantarray", "smd", "spot", "spot.close.open"))
    if (source == "imagene")
        return(read.imagene(files = files, path = path, ext = ext,
            names = names, columns = columns, wt.fun = wt.fun,
            verbose = verbose, sep = sep, quote = quote, ...))
    slides <- as.vector(as.character(files))
    if (!is.null(ext))
        slides <- paste(slides, ext, sep = ".")
    nslides <- length(slides)
    if (is.null(names))
        names <- removeExt(files)
    # Default foreground/background column names for each image analysis program.
    if (is.null(columns))
        columns <- switch(source,
            smd = list(Gf = "CH1I_MEAN", Gb = "CH1B_MEDIAN",
                Rf = "CH2I_MEAN", Rb = "CH2B_MEDIAN"),
            spot = list(Rf = "Rmean", Gf = "Gmean",
                Rb = "morphR", Gb = "morphG"),
            spot.close.open = list(Rf = "Rmean", Gf = "Gmean",
                Rb = "morphR.close.open", Gb = "morphG.close.open"),
            genepix = list(Rf = "F635 Mean", Gf = "F532 Mean",
                Rb = "B635 Median", Gb = "B532 Median"),
            quantarray = list(Rf = "ch2 Intensity", Gf = "ch1 Intensity",
                Rb = "ch2 Background", Gb = "ch1 Background"))

    # Read the first file to get the number of spots.
    fullname <- slides[1]
    if (!is.null(path))
        fullname <- file.path(path, fullname)
    if (source == "quantarray") {
        firstfield <- scan(fullname, what = "", sep = "\t", flush = TRUE,
            quiet = TRUE, blank.lines.skip = FALSE, multi.line = FALSE)
        skip <- grep("Begin Data", firstfield)
        if (length(skip) == 0)
            stop("Cannot find \"Begin Data\" in image output file")
        nspots <- grep("End Data", firstfield) - skip - 2
        obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep,
            quote = quote, as.is = TRUE, check.names = FALSE,
            comment.char = "", nrows = nspots, ...)
    }
    else if (source == "arrayvision") {
        skip <- 1
        cn <- scan(fullname, what = "", sep = sep, quote = quote,
            skip = 1, nlines = 1, quiet = TRUE)
        fg <- grep("^Median Dens - RFU", cn)
        if (length(fg) != 2)
            stop(paste("Cannot find foreground columns in", fullname))
        bg <- grep("Bkgd", cn)
        # Check the background match count (the posted code tested fg twice).
        if (length(bg) != 2)
            stop(paste("Cannot find background columns in", fullname))
        columns <- list(Rf = fg[1], Rb = bg[1], Gf = fg[2], Gb = bg[2])
        obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep,
            quote = quote, as.is = TRUE, check.names = FALSE,
            comment.char = "", ...)
        nspots <- nrow(obj)
    }
    else {
        # Find the column-heading line within the first 80 lines of the file.
        skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
        if (length(skip) == 0)
            stop(paste("Cannot find column heading in image output file", fullname))
        else skip <- skip[1]
        if (verbose)
            cat("Reading", fullname, "after skipping", skip, "...")
        obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep,
            quote = quote, as.is = TRUE, check.names = FALSE,
            comment.char = "", ...)
        if (verbose)
            cat("\tDone.\n")
        nspots <- nrow(obj)
    }

    Y <- matrix(0, nspots, nslides)
    colnames(Y) <- names
    RG <- list(R = Y, G = Y, Rb = Y, Gb = Y)
    if (source == "smd") {
        anncol <- grep(columns$Gf, colnames(obj)) - 1
        if (anncol > 0)
            RG$genes <- data.frame(obj[, 1:anncol])
    }
    if (!is.null(wt.fun))
        RG$weights <- Y

    for (i in 1:nslides) {
        if (i > 1) {
            fullname <- slides[i]
            if (!is.null(path))
                fullname <- file.path(path, fullname)
            ##
            ## HACK HACK. Works for genepix files, but others???
            ## Search for the column-heading line again in every file.
            ##
            skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
            if (length(skip) == 0)
                stop(paste("Cannot find column heading in image output file", fullname))
            else skip <- skip[1]
            if (verbose)
                cat("Reading", fullname, "after skipping", skip, "...")
            obj <- read.table(fullname, skip = skip, header = TRUE, sep = sep,
                as.is = TRUE, quote = quote, check.names = FALSE,
                comment.char = "", nrows = nspots, ...)
            if (verbose)
                cat("\tDone.\n")
        }
        RG$R[, i] <- obj[, columns$Rf]
        RG$G[, i] <- obj[, columns$Gf]
        RG$Rb[, i] <- obj[, columns$Rb]
        RG$Gb[, i] <- obj[, columns$Gb]
        if (!is.null(wt.fun))
            RG$weights[, i] <- wt.fun(obj)
        # if (verbose)
        #     cat(paste("Read", fullname, "\n"))
    }
    new("RGList", RG)
}
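The "right" way mentioned above (taking the header count from the second line of the file) might look roughly like the following base-R sketch. It assumes the files follow the Axon Text File (ATF) layout, where line 1 starts with "ATF" and line 2 holds the number of optional header records followed by the number of data columns. The function name `read_gpr_table` is hypothetical and this is not how limma itself is implemented.

```r
# Sketch only: read a single ATF-style .gpr file, using the header
# record count stored on its second line to decide how much to skip.
read_gpr_table <- function(file, sep = "\t", quote = "\"") {
  top <- readLines(file, n = 2, warn = FALSE)
  if (length(top) < 2 || !grepl("^\"?ATF", top[1]))
    stop("Not an ATF file: ", file)
  # Line 2: <number of optional header records> <number of data columns>
  counts <- scan(text = top[2], what = integer(), sep = sep, quiet = TRUE)
  n_header <- counts[1]
  if (is.na(n_header))
    stop("Cannot read header record count from line 2 of ", file)
  # Skip the two ATF lines plus the optional header records; the next
  # line holds the column headings.
  read.table(file, skip = 2 + n_header, header = TRUE, sep = sep,
             quote = quote, as.is = TRUE, check.names = FALSE,
             comment.char = "")
}
```

Calling such a function once per file would avoid assuming that every file has the same number of header records, at the cost of only working for ATF-style input.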
Dear Dave,

Thanks for the diagnosis. You are quite correct that read.maimages() does assume that all the gpr files in a batch have headers of the same number of lines, and this assumption should be relaxed. As it happens, the assumption has always been true for genepix data sets that I have seen.

Gordon

At 03:53 AM 14/04/2004, David Nelson wrote:
>Hello all,
>
>I've just started plowing thru some two-color arrays and ran across the
>same bug.
>
>The problem I found is due to an efficiency hack that doesn't always work:
>read.maimages assumes that all the genepix files it reads have the same
>number of header records. That is, rather than taking the number of header
>records directly from each .gpr file, it counts the number of lines until
>the header line in the *first file only*, and then assumes that all the
>rest of the files have the same number of records.
>
>If not, then the skip= parameter is incorrect, read.table() starts in the
>wrong place, and results are, as they say, unpredictable.
>
>The "right" way to do this is to read the first lines of each .gpr file,
>get the number of header records, and then use read.table with the right
>number of header records. A quick hack is just to search for the header in
>each file.
>
>I've attached a modification to read.maimages() that does exactly that, at
>least the three times I've tried it :-)
>
>
>Cheers,
>
>Dave Nelson