Hello all limma experts,
I have started using limma to analyse some spotted microarray data
but have run into problems trying to read the data files in.
From the limma guide, having specified the "files", then using the
command:
RG<-read.maimages(files, source="genepix")
I get:
Read 7863scan3.gpr
Error in read.table(fullname, skip = skip, header = TRUE, sep = sep,
as.is = TRUE, :
more columns than column names
I have 4 .gpr files that were generated from GenePix version 3, with
7863scan3 being the first.
I also tried specifying columns as mentioned in the limma guide:
RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 Median",
                  Rb="B635 Median", Gb ="532 Median"))
and got
Error in "[.data.frame"(obj, , columns$Gb) :
undefined columns selected.
I have no idea how to interpret these error messages and have to say
that my forays into Bioconductor have been a frequent exercise in
frustration because of constant unintelligible error messages. Could
someone please help me in solving these issues?
I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?)
thanks
Bryce
> I also tried specifying columns as mentioned in the limma guide:
>
> RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 Median",
>                   Rb="B635 Median", Gb ="532 Median"))
>
Don't you need Gb ="B532 Median"?
\Heidi
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
> Hello all limma experts,
>
> I have started using limma to analyse some spotted microarray data but
> have run into problems trying to read the data files in.
> From the limma guide, having specified the "files", then using the
> command:
> RG<-read.maimages(files, source="genepix")
>
> I get:
>
> Read 7863scan3.gpr
> Error in read.table(fullname, skip = skip, header = TRUE, sep = sep,
> as.is = TRUE, :
> more columns than column names
Something is wrong with your second data file. Have you looked at it to
check?
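
One way to do that check from within R, as a rough sketch and not something limma provides: compare how many lines come before the column heading in each file. The helper name header_line and the two small files written to a temporary directory are made up for illustration; the real .gpr file names would be passed instead.

```r
# Hypothetical helper: line number of the first line containing `pattern`
# (a column heading expected in every file), or NA if it is not found.
header_line <- function(file, pattern = "F635 Median", n = 80) {
  hits <- grep(pattern, readLines(file, n = n), fixed = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Two toy files standing in for real .gpr files (made-up contents).
cols <- paste(c("Block", "F635 Median", "B635 Median"), collapse = "\t")
f1 <- tempfile(fileext = ".gpr")
f2 <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0", "1\t3", "Type=Toy", cols, "1\t100\t10"), f1)
writeLines(c("ATF\t1.0", "2\t3", "Type=Toy", "Extra=1", cols, "1\t200\t20"), f2)

# Position of the column heading in each file.
where <- sapply(c(f1, f2), header_line, USE.NAMES = FALSE)
print(where)
```

Any file whose value differs from that of the first file is worth opening in a text editor.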
> I have 4 .gpr files that were generated from genepix version 3 with
> 7863scan3 being the first
>
> I also tried specifying columns as mentioned in the limma guide:
>
> RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532 Median",
>                   Rb="B635 Median", Gb ="532 Median"))
>
> and got
>
> Error in "[.data.frame"(obj, , columns$Gb) :
> undefined columns selected.
The error message seems to me to be not impossible to interpret. It tells
you that you've tried to specify a column that doesn't exist and that the
offending column is Gb. If you look at your own code you'll see an
obvious typo.
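
A quick way to confirm this kind of mismatch, sketched here with a made-up stand-in for the table read from a .gpr file (the object obj and its four headings are assumptions for illustration, not taken from the actual file):

```r
# Column names requested in the call quoted above (note the Gb entry).
columns <- list(Rf = "F635 Median", Gf = "F532 Median",
                Rb = "B635 Median", Gb = "532 Median")

# Made-up stand-in for the table read from one .gpr file.
obj <- data.frame(1, 2, 3, 4, check.names = FALSE)
colnames(obj) <- c("F635 Median", "F532 Median", "B635 Median", "B532 Median")

# Requested names that do not match any column heading in the table.
missing_cols <- setdiff(unlist(columns), colnames(obj))
print(missing_cols)
```

Anything listed in missing_cols is a name that "[.data.frame" cannot find, which is what "undefined columns selected" is reporting.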
Gordon
Hello all,
I've just started plowing thru some two-color arrays and ran across the
same bug.
The problem I found is due to an efficiency hack that doesn't always
work: read.maimages assumes that all the genepix files it reads have the
same number of header records. That is, rather than taking the number of
header records directly from each .gpr file, it counts the number of
lines until the header line in the *first file only*, and then assumes
that all the rest of the files have the same number of records.
If not, then the skip= parameter is incorrect, read.table() starts in
the wrong place, and results are, as they say, unpredictable.
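
To illustrate, here is a small sketch of how a skip value that is off by one header record can produce the error reported at the top of this thread. The file contents are made up and only mimic the layout of a .gpr file; the read.table() arguments mirror those in the error message.

```r
# Toy file laid out like a .gpr file: two ATF lines, two header records,
# then the column headings and the data (contents are made up).
f <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0",
             "2\t3",
             "Type=Toy",
             "Scanner=Toy",
             "Block\tF635 Median\tB635 Median",
             "1\t100\t10",
             "1\t120\t12"), f)

# Correct skip: the four lines before the column headings.
ok <- read.table(f, skip = 4, header = TRUE, sep = "\t",
                 as.is = TRUE, check.names = FALSE, comment.char = "")

# Skip taken from a file with one header record fewer: read.table() now
# treats the line "Scanner=Toy" as the heading line.
msg <- tryCatch(
  read.table(f, skip = 3, header = TRUE, sep = "\t",
             as.is = TRUE, check.names = FALSE, comment.char = ""),
  error = function(e) conditionMessage(e))
print(msg)
```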
The "right" way to do this is to read the first lines of each .gpr file,
get the number of header records, and then use read.table with the right
number of header records. A quick hack is just to search for the header
in each file.
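
Both approaches can be sketched in a few lines. This assumes the ATF layout in which line 2 of the file begins with the number of header records; the helper names atf_skip and grep_skip are hypothetical and not part of limma.

```r
# Hypothetical helper: number of lines to skip before the column headings
# of an ATF-style file, assuming line 2 starts with the header record count.
atf_skip <- function(file, sep = "\t") {
  top <- readLines(file, n = 2)
  if (length(top) < 2 || !grepl("^\"?ATF", top[1]))
    stop("Not an ATF file: ", file)
  nrec <- as.integer(strsplit(top[2], sep, fixed = TRUE)[[1]][1])
  if (is.na(nrec))
    stop("Cannot read header record count from ", file)
  2L + nrec  # the two ATF lines plus the header records
}

# Hypothetical helper for the quick hack: search for a column heading.
grep_skip <- function(file, heading = "F635 Median", n = 80) {
  hits <- grep(heading, readLines(file, n = n), fixed = TRUE)
  if (length(hits) == 0)
    stop("Cannot find column heading in ", file)
  hits[1] - 1L
}

# Toy file with made-up contents.
f <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0", "2\t3", "Type=Toy", "Scanner=Toy",
             "Block\tF635 Median\tB635 Median", "1\t100\t10"), f)
print(c(atf = atf_skip(f), grep = grep_skip(f)))
```

Either value can then be passed as skip= to read.table() for that particular file.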
I've attached a modification to read.maimages() that does exactly that,
at least the three times I've tried it :-)
Cheers,
Dave Nelson
-------------- next part --------------
###
### A hack to read genepix .gpr files with a different number of header
### lines in each file.
###
### Does not work with other formats. SHOULD rewrite to take the header
### count from the second line of the file.
###
my.read.maimages <- function (files, source = "spot", path = NULL, ext = NULL,
                              names = NULL, columns = NULL, wt.fun = NULL,
                              verbose = TRUE, sep = "\t", quote = "\"", ...)
{
    if (missing(files)) {
        if (missing(ext))
            stop("Must specify input files")
        else {
            extregex <- paste("\\.", ext, "$", sep = "")
            files <- dir(path = ifelse(is.null(path), ".", path),
                         pattern = extregex)
            files <- sub(extregex, "", files)
        }
    }
    if (!missing(source) && !missing(columns))
        stop("Cannot specify both source and columns")
    source <- match.arg(source, c("arrayvision", "genepix", "imagene",
                                  "quantarray", "smd", "spot",
                                  "spot.close.open"))
    if (source == "imagene")
        return(read.imagene(files = files, path = path, ext = ext,
                            names = names, columns = columns, wt.fun = wt.fun,
                            verbose = verbose, sep = sep, quote = quote, ...))
    slides <- as.vector(as.character(files))
    if (!is.null(ext))
        slides <- paste(slides, ext, sep = ".")
    nslides <- length(slides)
    if (is.null(names))
        names <- removeExt(files)
    ## Default column headings for each image analysis program
    if (is.null(columns))
        columns <- switch(source,
            smd = list(Gf = "CH1I_MEAN", Gb = "CH1B_MEDIAN",
                       Rf = "CH2I_MEAN", Rb = "CH2B_MEDIAN"),
            spot = list(Rf = "Rmean", Gf = "Gmean",
                        Rb = "morphR", Gb = "morphG"),
            spot.close.open = list(Rf = "Rmean", Gf = "Gmean",
                                   Rb = "morphR.close.open",
                                   Gb = "morphG.close.open"),
            genepix = list(Rf = "F635 Mean", Gf = "F532 Mean",
                           Rb = "B635 Median", Gb = "B532 Median"),
            quantarray = list(Rf = "ch2 Intensity", Gf = "ch1 Intensity",
                              Rb = "ch2 Background", Gb = "ch1 Background"))
    ## Read the first file to find the number of spots
    fullname <- slides[1]
    if (!is.null(path))
        fullname <- file.path(path, fullname)
    if (source == "quantarray") {
        firstfield <- scan(fullname, what = "", sep = "\t", flush = TRUE,
                           quiet = TRUE, blank.lines.skip = FALSE,
                           multi.line = FALSE)
        skip <- grep("Begin Data", firstfield)
        if (length(skip) == 0)
            stop("Cannot find \"Begin Data\" in image output file")
        nspots <- grep("End Data", firstfield) - skip - 2
        obj <- read.table(fullname, skip = skip, header = TRUE,
                          sep = sep, quote = quote, as.is = TRUE,
                          check.names = FALSE, comment.char = "",
                          nrows = nspots, ...)
    }
    else if (source == "arrayvision") {
        skip <- 1
        cn <- scan(fullname, what = "", sep = sep, quote = quote,
                   skip = 1, nlines = 1, quiet = TRUE)
        fg <- grep("^Median Dens - RFU", cn)
        if (length(fg) != 2)
            stop(paste("Cannot find foreground columns in", fullname))
        bg <- grep("Bkgd", cn)
        ## Check bg here (the original tested fg a second time)
        if (length(bg) != 2)
            stop(paste("Cannot find background columns in", fullname))
        columns <- list(Rf = fg[1], Rb = bg[1], Gf = fg[2], Gb = bg[2])
        obj <- read.table(fullname, skip = skip, header = TRUE,
                          sep = sep, quote = quote, as.is = TRUE,
                          check.names = FALSE, comment.char = "", ...)
        nspots <- nrow(obj)
    }
    else {
        ## Locate the column heading line in the first 80 lines of the file
        skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
        if (length(skip) == 0)
            stop(paste("Cannot find column heading in image output file",
                       fullname))
        else skip <- skip[1]
        if (verbose)
            cat("Reading", fullname, "after skipping", skip, "...")
        obj <- read.table(fullname, skip = skip, header = TRUE,
                          sep = sep, quote = quote, as.is = TRUE,
                          check.names = FALSE, comment.char = "", ...)
        if (verbose)
            cat("\tDone.\n")
        nspots <- nrow(obj)
    }
    Y <- matrix(0, nspots, nslides)
    colnames(Y) <- names
    RG <- list(R = Y, G = Y, Rb = Y, Gb = Y)
    if (source == "smd") {
        anncol <- grep(columns$Gf, colnames(obj)) - 1
        if (anncol > 0)
            RG$genes <- data.frame(obj[, 1:anncol])
    }
    if (!is.null(wt.fun))
        RG$weights <- Y
    for (i in 1:nslides) {
        if (i > 1) {
            fullname <- slides[i]
            if (!is.null(path))
                fullname <- file.path(path, fullname)
            ##
            ## HACK HACK. Works for genepix files, but others???
            ## Recompute skip for every file instead of reusing the
            ## value found for the first file.
            ##
            skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
            if (length(skip) == 0)
                stop(paste("Cannot find column heading in image output file",
                           fullname))
            else skip <- skip[1]
            if (verbose)
                cat("Reading", fullname, "after skipping", skip, "...")
            obj <- read.table(fullname, skip = skip, header = TRUE,
                              sep = sep, as.is = TRUE, quote = quote,
                              check.names = FALSE, comment.char = "",
                              nrows = nspots, ...)
            if (verbose)
                cat("\tDone.\n")
        }
        RG$R[, i] <- obj[, columns$Rf]
        RG$G[, i] <- obj[, columns$Gf]
        RG$Rb[, i] <- obj[, columns$Rb]
        RG$Gb[, i] <- obj[, columns$Gb]
        if (!is.null(wt.fun))
            RG$weights[, i] <- wt.fun(obj)
        # if (verbose)
        #     cat(paste("Read", fullname, "\n"))
    }
    new("RGList", RG)
}
Dear Dave,
Thanks for the diagnosis. You are quite correct that read.maimages() does
assume that all the gpr files in a batch have headers of the same number
of lines, and this assumption should be relaxed. As it happens, the
assumption has always been true for genepix data sets that I have seen.
Gordon
At 03:53 AM 14/04/2004, David Nelson wrote:
>Hello all,
>
>I've just started plowing thru some two-color arrays and ran across the
>same bug.
>
>The problem I found is due to an efficiency hack that doesn't always
>work: read.maimages assumes that all the genepix files it reads have the
>same number of header records. That is, rather than taking the number of
>header records directly from each .gpr file, it counts the number of
>lines until the header line in the *first file only*, and then assumes
>that all the rest of the files have the same number of records.