Jeff Sorenson
I would like to thank all of the contributors to the Bioconductor project for putting their tools into the public domain. I'm embarking on a project using Affymetrix U133A/B chips and have been setting up a MySQL database of probe/sequence information and other annotation, and learning to use the various R packages. Looking over the probe sequences and putative gene sequences that Affymetrix provides on their website, it is clear that many of the probes are nonspecific, i.e., they perfectly match portions of gene sequences that are different from the one they were derived from. In some cases, it appears that Affymetrix has simply generated multiple probe sets for transcriptional variants of the same gene. In other cases, it appears that some probes are simply nonspecific. Affymetrix does warn us that some probe sets are less specific than others, and this is indeed incorporated into their probe set nomenclature, but I have found no downloadable file that lists the specifics. My computer should be done testing the half million probes for perfect matches against the ~45000 sequences some time later this week. After that, I will probably test the mismatch probes.
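For anyone wanting to reproduce this kind of screen, here is a minimal sketch of one way it could be done; it is not necessarily how the run above was set up. It assumes probes and target sequences are given in the same orientation and that every probe has the same length (25 bases on these chips). The function names `find_perfect_matches` and `nonspecific_probes` and the input dictionaries are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

PROBE_LEN = 25  # Affymetrix GeneChip probes are 25-mers


def find_perfect_matches(probes, sequences, probe_len=PROBE_LEN):
    """Map each probe id to the set of sequence ids that contain it exactly.

    probes:    dict of probe id -> probe sequence (all of length probe_len)
    sequences: dict of sequence id -> target sequence
    Assumption: probes and targets are in the same orientation.
    """
    # Index probes by their sequence; several probe ids may share one sequence.
    by_seq = defaultdict(list)
    for probe_id, seq in probes.items():
        by_seq[seq.upper()].append(probe_id)

    hits = defaultdict(set)
    for seq_id, target in sequences.items():
        target = target.upper()
        # Slide a probe-length window along the target and look it up.
        for i in range(len(target) - probe_len + 1):
            window = target[i:i + probe_len]
            for probe_id in by_seq.get(window, ()):
                hits[probe_id].add(seq_id)
    return hits


def nonspecific_probes(hits, source_of):
    """Keep only probes that also match a sequence other than their source.

    source_of: dict of probe id -> id of the sequence the probe was derived from
    """
    result = {}
    for probe_id, seq_ids in hits.items():
        others = seq_ids - {source_of[probe_id]}
        if others:
            result[probe_id] = others
    return result
```

Indexing the probes and scanning each sequence once keeps the work roughly proportional to the total sequence length, rather than comparing every probe against every sequence.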
My question to this community is this: is there already an annotation file or package that takes this consideration into account? If so, can this information be readily adapted into the R packages for probe-level analysis and gene expression estimation?
In a related question, can anyone point me to an algorithm for accurately estimating the hybridization probability of an arbitrary probe against an arbitrary mRNA? Would it correlate closely with the BLAST score? Has anyone done theoretical studies on the nature of the mismatch probes and their usefulness in measuring "nonspecific" binding? It would be nice to be able to predict how strongly a particular mRNA should bind to each of the probes on a chip (both PM and MM). If this is feasible, has anyone done in silico chip hybridization experiments to see how close the estimated expression levels are to the actual input?
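To make the PM/MM part of the question concrete, here is a rough sketch, under stated assumptions, of how a mismatch probe relates to its perfect-match partner and of a very crude similarity measure between a probe and a target. An MM probe is the PM probe with its central (13th) base replaced by its complement. The mismatch count below is only a stand-in: it is not a hybridization probability and ignores thermodynamics, base position, and strand orientation. The names `mismatch_probe` and `min_mismatches` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm):
    """Derive the MM probe by complementing the central base of the PM probe."""
    mid = len(pm) // 2  # index 12, i.e. the 13th base, for a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def min_mismatches(probe, target):
    """Smallest number of mismatched bases between the probe and any
    same-length window of the target (ungapped comparison).

    Returns None if the target is shorter than the probe.
    Assumption: probe and target are in the same orientation.
    """
    n = len(probe)
    if len(target) < n:
        return None
    return min(
        sum(a != b for a, b in zip(probe, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )
```

A real answer to the binding question would need a model of duplex stability rather than a simple mismatch count; this only shows the bookkeeping.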
Thanks,
Jeff Sorenson