Hi,
I have been trying to reproduce the results from the Naef and Magnasco
(2003) paper where 95%
of the MM>PM intensities held a purine at the 13th residue. Using
bioconductor, two CEL files from
the hgu133a chip contained 71% and 72% purines where MM>PM. A CEL
file from the hgu95av2
chip contained 48% purines where MM>PM, but perhaps I need to override
the i2xy() function in case
there still exists an error.
Have you found the same discrepancy? Do you know why my data is not
corresponding to the data
in Naef and Magnasco (2003)?
This is my R code:
pmIndexes <- unlist(indexProbes(abatch,"pm"))
mmIndexes <- unlist(indexProbes(abatch,"mm"))
L <- length(pmIndexes)
B13 <- "N"
for (i in 1:L)
{
pm<- intensity(abatch)[pmIndexes[i],]
mm <- intensity(abatch)[mmIndexes[i],]
if (mm > pm) {
probeSequence <- unlist(strsplit(hgu133aprobe[i,]$sequence,""))
B13 <- c(B13,probeSequence[13])
}
}
Thank you for your consideration.
Lana
[[alternative HTML version deleted]]
Hi Lana,
I did explain to you in a previous message (not on bioC mailing list)
in
quite some detail, with a working code example, how to map between the
rows of AffyBatches and those of the probe-table. I also did tell you
that
code like that you show below will not work. The mapping you presume
is
not correct, and the results that you get are in error.
Please do read the documentation.
I do not know why you chose to ignore the documentation and my
previous
message. But please do not insinuate that packages are buggy if in
fact
you have not read the documentation and choose to ignore package
authors'
advice.
For the record, here is the previous message (14 Jan 2005):
-------------------------------------------------------------
Hi Lana,
Start with the hgu95av2probe table.
> print.data.frame(hgu95av2probe[1:5,])
You get the names of probesets by
> psnames <- unique(hgu95av2probe$Probe.Set.Name)
> length(psnames)
[1] 12453
You get the row numbers in this table of the probes for an individual
probe set by
> n <- nrow(hgu95av2probe)
> probes <- split(1:n, hgu95av2probe$Probe.Set.Name)
> p <- probes[["1323_at"]]
> p
You get the row numbers of these probes in the AffyBatch by
> ind <- with(hgu95av2probe, xy2i( x[p], y[p]))
and the expression data:
> library(affydata)
> data(Dilution)
> exprs(Dilution)[ind,]
> matplot(t(exprs(Dilution)[ind,]), type="b")
Remarks:
1. Note that if you have multiple CDF packages around, there are
multiple
xy2i functions which clash (You get a warning when loading for example
the
hgu133acdf package as well). You will then need to use the somewhat
more
verbose expression:
ind2 <- with(hgu95av2probe,
get("xy2i", "package:hgu95av2cdf")( x[p], y[p]))
Alternatively, you can use the function xy2indices from the affy which
also explicitely asks for the array type it is called for.
2. I know that some of this could be easier - some of the reasons why
it
is as it is that the probe packages are also intended to work with
chips
other than Affymetrix genechips. Also, Rafael Irizarry is working on a
"new" and better affy package.
Notice that "ind" is identical to what you would get from the CDF
environment:
hgu95av2cdf$"1323_at"
more excatly, to the "PM" column there:
> ind == hgu95av2cdf$"1323_at"["pm", ]
The "MM" are obtained by adding
> ind+640 == hgu95av2cdf$"1323_at"[, "mm"]
3. I realize that this is not very well documented currently neither
in
probe packages man pages nor the matchprobes or affy vignettes. The
problem is that I am already completely over-committted with other
projects, most Bioconductor-related. So if you succeed with this,
would
you be willing to write up 1/2 page or however much it takes to
explain
this? It could go in the affy vignette, or matchprobes, or the probe
packages man pages (which take a little longer to propagate to the
users
though). And/or you could post a summary to the bioC mailing list.
Best wishes
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Http: www.ebi.ac.uk/huber