Matthew Hannah
Thanks,
I've investigated this some more and found that the ATH1-121501_probe_tab.zip file from the Affymetrix website contains 251,121 sequences, whilst the CEL files and ATH1-121501_probe_fasta.zip contain only 251,078 probes. It therefore seems that the errors were already in the tab file before the BioC ath1121501probe package was made. I've emailed Affymetrix about it but don't expect a quick response, judging from past queries.
So does anyone know how to find the extra values in the tab file? It doesn't look like they were simply added at the start or finish. Does anyone familiar with R know how to obtain a list of Affy ID vs. number of probes, either from the ath1121501probe package or by reading in the ATH1-121501_probe_tab file? This would be easy to cross-reference with the Affy ID vs. probe number that you get from the CEL file during MAS5 analysis.
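To make the question concrete, here is a rough base-R sketch of the cross-reference I have in mind. It is untested against the real data: the helper names count_probes and compare_counts are made up for illustration, the toy tables stand in for the real ones, and I am assuming the probe table has a Probe.Set.Name column.

```r
# Count probes per probe set in a probe table.
# `id_col` is the column holding the Affy ID; "Probe.Set.Name" is an assumption.
count_probes <- function(probe_tab, id_col = "Probe.Set.Name") {
  counts <- table(probe_tab[[id_col]])
  data.frame(probe_set = names(counts),
             n_probes  = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Return the probe sets whose probe counts differ between the two sources,
# including probe sets that appear in only one of them.
compare_counts <- function(tab_counts, cel_counts) {
  merged <- merge(tab_counts, cel_counts, by = "probe_set", all = TRUE,
                  suffixes = c("_tab", "_cel"))
  differs <- is.na(merged$n_probes_tab) | is.na(merged$n_probes_cel) |
    merged$n_probes_tab != merged$n_probes_cel
  merged[differs, ]
}

# Toy stand-ins (hypothetical IDs): the "tab" side has one extra probe for A_at.
toy_tab <- data.frame(
  Probe.Set.Name = c("A_at", "A_at", "A_at", "B_at", "B_at"),
  stringsAsFactors = FALSE
)
toy_cel <- data.frame(probe_set = c("A_at", "B_at"),
                      n_probes  = c(2L, 2L),
                      stringsAsFactors = FALSE)

mismatches <- compare_counts(count_probes(toy_tab), toy_cel)
print(mismatches)

# With the real data the inputs might be built like this (untested assumption):
#   library(ath1121501probe)
#   tab_counts <- count_probes(ath1121501probe)
#   n_pm       <- sapply(indexProbes(object, "pm"), length)
#   cel_counts <- data.frame(probe_set = names(n_pm), n_probes = as.integer(n_pm),
#                            stringsAsFactors = FALSE)
```

If the 43 extra sequences are concentrated in a few probe sets, they should show up as rows in the mismatch table.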
Has this been an issue for any other chips? Are we just trusting Affymetrix to provide the correct sequence data? I've seen some data showing that ~700 ATH1 probesets don't match their intended target when an independent BLAST was done.
Thanks
Matt
>Hi,
> there seems to be a disagreement on how many PM probes there are on the
>chip. This is causing problems in matching the PM intensities with
>sequences. I am not sure if this is true for all ATH1 chips...
>
> After reading in your Cel file into "object",
>###########
> pmIndex <- unlist(indexProbes(object, "pm"))
> length(pmIndex)
> #[1] 251078
> # however, the probe package gives 251121 PM probe sequences.
> length(get("ath1121501probe")$sequence)
> #[1] 251121
> Right now I am not sure which should be fixed: does the probe
>package have some redundant sequences that are not PM probes, or did
>indexProbes miss some PM probes?