Entering edit mode
Hi,
sorry for the late response on this. Yes, if there is a bug, it could
to be related updateCel() of affxparser. Though, I cross my fingers
that there is another reason for all this - something that is not a
bug but a simple mistake.
I am aware of the note in help("seek") saying "Use of seek on Windows
is discouraged. We have found so many errors in the Windows
implementation of file positioning that users are advised to use it
only at their own risk, and asked not to waste the R developers' time
with bug reports on Windows' deficiencies.". Unfortunately, there is
no other option than using seek(), but of course, it should not give
errors.
FYI, updateCel() has been used a lots on Windows for many years
without any issues, although I acknowledge that issues as those you
are reporting could indeed be missed. One way to know that the
problem is with updateCel() you could add the following, e.g.
for (j in 1:9) {
filename <- filenames[j];
y <- exp(output[,j]);
# Update CEL file
updateCel(filename, indices=indices, intensities=y);
# Assert correctness
y2 <- readCel(filename, indices=indices, readIntensities=TRUE,
readHeader=FALSE, readStdvs=FALSE, readPixels=FALSE, readXY=FALSE,
readOutliers=FALSE, readMasked=FALSE)$intensities;
if (!all.equal(y2, y)) {
# Report the data we tried to write
str(list(filename=filename, indices=indices, intensities=y,
rereadIntensities=y2));
stop("Error in updateCel()!").
}
} # for (j ...)
One could do an even fancier version where one compare the complete
CEL file before and after, which then would detect incorrectly written
values outside the file areas that 'indices' points to.
Also before troubleshooting more, what is your sessionInfo()?
/Henrik
(co-author of affxparser)
On Fri, Jul 29, 2011 at 1:27 AM, Steve Pederson
<stephen.pederson at="" adelaide.edu.au=""> wrote:
> Hi,
>
>
>
> I'm building an array analysis package using an MCMC process for
each gene &
> am writing the key values (posterior quantiles, mean, sd etc) for
each gene
> to disk using CEL files and the aroma.affymetrix architecture.
However I've
> noticed that on rare occasion, the incorrect values are written to
the CEL
> files.
>
>
>
> As an example of what goes wrong, if I have a vector of 8 values
(e.g.
> intensities=1:8) and am updating 8 cells (e.g.
indices=221632:221639) with
> these 8 values, occasionally the first value (i.e. intensities[1])
would be
> incorrectly given to all 8 cells on the CEL file. This has happened
in an
> obvious manner only to 2 different genes in 2 of 3 datasets & I
can't spot a
> connection in the data structures that may cause the misfire. If I
can see
> these 2 obvious problems, I'm unsure as to what less obvious errors
there
> may be lurking in the dataset & hence my confidence in the results
is
> somewhat diminished.
>
>
>
> The code from the relevant section of my function (where I write
batches of
> genes at a time) is:
>
> monoUgcMap <- getUnitGroupCellMap(monoCdf, unit=batchUnits) # Get
the
> UnitGroupCellMap from a monocell cdf for the current batch units
>
> indices <- monoUgcMap$cell
>
> for (j in 1:9) { # The output after the MCMC process will always
have 9
> columns, which are written to separate CEL files
>
> ? ? ?updateCel(filenames[j], indices=indices,
intensities=exp(output[,j]))
> #update the CEL file for the current batch
>
> }
>
>
>
> I am wondering if the issue arises from the usage of 'seek' within
the
> function 'updateCel', as I am on a Windows 7 x64 system? The line
I'm
> referring to ?from the 'updateCel' source code is:
>
>
>
> seek(con, origin="start", where=offset, rw="read")
>
>
>
> Reading the 'seek' help page didn't really shed much light (to my
flawed
> brain) on what this step does, but I did spot the warning about this
> function under Windows. ?I'm still a bit unclear as to what this
step does,
> so am looking for either a clarification as to the point of this
line of
> code, or any suggestions for a simple workaround.
>
>
>
> As it appears to be a random & rare misfire, I'll have to make
changes & run
> a few datasets (over multiple days) just to check it so may be a bit
slow
> with any responses...
>
>
>
> Thanks in advance,
>
>
>
> Steve
>
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>