Just a couple of comments to add.
To filter or not to filter will depend on the application. If your
problem is one of classification and you don't care what fragments are
used in your classifier function, there is no harm in filtering absent
genes. You will have plenty of genes to select from among the more
reproducible, high intensity fragments that are called present all the
time. If you are looking for marker genes, on the other hand,
filtering on presence calls may very well hide some relevant markers.
What is of interest to me is a thorough characterization of probe sets
with respect to the relationship between PM and MM and the effect on
detection of the mRNA molecule. When is the call even worthy of its name?
To me "call=P" means the PMs are greater than the MMs by some reasonable
measure. When is "call=P" equivalent to "the mRNA molecule is
present"? I would publish such a report (if I were editor, that is).
-f
Eric <emblal@uky.edu> wrote:
Hi,
As a user I can second that- we definitely use P/A (as well as scaling
factor) to see if a chip within a treatment group has gone awry.
Regarding P/A calls as a filter: I'm fairly certain most users would
agree that, among the probe sets found to be present in 100% of the
chips in the study, there is a greater proportion of statistically
significant findings than would be found among the probe sets that were
100% absent. Of course, this is with MAS5 as the probe level algorithm.
Interestingly, our own lab's observations are that, among probe sets
in which >80% of the chips show presence calls in one treatment group
and <20% of the chips show presence in the other treatment group (a
relatively small group of 20 genes in the example I'm using- 10 chips
per group), the significance proportion was actually worse than in the
'fully present cadre'. I've seen this in at least three other data
sets with 7 or more chips per group. Although my initial assumption was
that selecting for presence in one group and absence in the other
would bias me towards finding significant results, my interpretation
after seeing that this selection actually reduces the 'significance
proportion' is that dividing up the data by P/A call like this
isolates probe sets for which the data is noisier, but not necessarily
smaller, in one group than the other.
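To make the grouping above concrete, here is a minimal Python sketch of how probe sets could be binned by P/A call before comparing 'significance proportions'. It is only an illustration under stated assumptions: the function names, the treatment of marginal ("M") calls as not present, and the alpha = 0.05 cutoff are all hypothetical and are not taken from the analysis described above.

```python
def present_fraction(calls):
    """Fraction of chips in one group whose call is 'P' for one probe set.

    Assumption: marginal ('M') calls are counted as not present.
    """
    return sum(1 for c in calls if c == "P") / len(calls)


def classify_probe_set(calls_a, calls_b, hi=0.8, lo=0.2):
    """Bin one probe set by its P/A calls in treatment groups A and B."""
    fa = present_fraction(calls_a)
    fb = present_fraction(calls_b)
    if fa == 1.0 and fb == 1.0:
        return "fully present"
    if fa == 0.0 and fb == 0.0:
        return "fully absent"
    # >80% present in one group and <20% present in the other
    if (fa > hi and fb < lo) or (fb > hi and fa < lo):
        return "split"
    return "other"


def significance_proportion(p_values, alpha=0.05):
    """Share of probe sets in a bin whose test p-value falls below alpha.

    Assumption: alpha = 0.05 is a placeholder, not the cutoff used above.
    """
    return sum(1 for p in p_values if p < alpha) / len(p_values)
```

With 10 chips per group, a probe set called present on 9 chips in one group and on none in the other lands in the "split" bin; comparing `significance_proportion` across bins is then a one-line calculation per bin.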
Regarding MAS4 algorithm going to MAS5, I think the greatest tragedy
was the eradication of the negative values by artificially altering
the MM values. If the fragment was not present in the mix, then PM and
MM should both be randomly hybridizing, and I would expect that about
half of the time PM < MM, so those values, while they may not make
biological sense, are exactly what you would expect from the probe set
design. Further, if, as you mention and we've seen in our own data,
some negative values are good discriminators, then some negative
values may be there for other reasons: 'cross hybridization', for
example, or Affy's assumptions regarding how probe sets behave not
holding true in every case. However, the MAS5 algorithm blinds
users to such changes.
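The 'about half of the time' argument can be checked with a toy simulation. The Python sketch below assumes, purely for illustration, that when the target is absent PM and MM intensities are independent draws from the same log-normal background distribution; the distribution and its parameters are made up and are not fitted to any chip data.

```python
import random


def simulate_absent_probe_pairs(n_pairs, seed=0):
    """Draw (PM, MM) pairs from one shared background distribution.

    Assumption: with no target in the mix, PM and MM are independent and
    identically distributed. The log-normal parameters are arbitrary.
    """
    rng = random.Random(seed)
    return [
        (rng.lognormvariate(5.0, 0.5), rng.lognormvariate(5.0, 0.5))
        for _ in range(n_pairs)
    ]


def fraction_pm_below_mm(pairs):
    """Share of probe pairs in which PM is smaller than MM."""
    return sum(1 for pm, mm in pairs if pm < mm) / len(pairs)


pairs = simulate_absent_probe_pairs(10000)
frac_negative = fraction_pm_below_mm(pairs)
```

Under that assumption `frac_negative` comes out close to 0.5, so negative PM - MM differences are the expected behavior for an absent fragment rather than an anomaly.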
I've also gone in and looked at the dichotomy between MAS5 and RMA
(different probe level algorithms, same test; 1-way ANOVA). As I've
said before, there is no shortage of discrepancies between the two
(RMA finds 508 significant, MAS5 finds 409, and 146 are significant
in both). We specifically looked at feature values in probe
sets that were:
1) RMA: very significant (p < .001) with RMA and very non-significant
(p > .9) with MAS5
2) MAS5: very significant (p < .001) with MAS5 and very non-significant
(p > .9) with RMA
3) RMA & MAS5: high concordance (p < .001 in both)
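The three categories above amount to a simple rule on the pair of ANOVA p-values for each probe set. The Python sketch below is one way to write that rule down; the function names are hypothetical, and the overlap arithmetic assumes the 508, 409 and 146 counts all refer to the same significance cutoff.

```python
def concordance_category(p_rma, p_mas5, sig=0.001, nonsig=0.9):
    """Assign a probe set to one of the three categories, or None."""
    if p_rma < sig and p_mas5 < sig:
        return "both"        # category 3: high concordance
    if p_rma < sig and p_mas5 > nonsig:
        return "RMA only"    # category 1
    if p_mas5 < sig and p_rma > nonsig:
        return "MAS5 only"   # category 2
    return None              # everything in between is not examined


def overlap_summary(n_rma, n_mas5, n_both):
    """Split two significant-list sizes into exclusive and combined counts."""
    return {
        "rma_only": n_rma - n_both,
        "mas5_only": n_mas5 - n_both,
        "either": n_rma + n_mas5 - n_both,
    }


summary = overlap_summary(508, 409, 146)
```

With the counts quoted above, this gives 362 probe sets significant with RMA only, 263 with MAS5 only, and 771 significant with at least one of the two.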
We isolated images and extracted PM and MM values for the top 10 probe
sets in each of the three categories. The plotted PM and MM values
reveal different phenomena that go into the 'failure to agree'. First,
in cases where tests performed with RMA found significant differences
and MAS5 did not, this was often because of some large variations in
the behavior of the MM features. Where tests performed with MAS5 found
significant differences and RMA did not, this was often because the MM
subtraction amplified a difference that already existed in the PM, or
because there was no difference in the PM and the entire result was due
to MM differences between the two treatment groups. Of course the probe
sets that showed concordance between the two probe level algorithms
were well-behaved. I presented some of this at the 3rd annual Virtual
Conference on Genomics and Bioinformatics, but I never thought it was
really worth pursuing as a publication.
Do you think that there is enough interest out there to publish this,
and if so, where?
-E
At 12:01 PM 10/9/2003 +0200, you wrote:
Date: Wed, 8 Oct 2003 11:11:08 -0700 (PDT)
From: Francois Collin <fcollin@sbcglobal.net>
Subject: RE: [BioC] Affy: Present calls in an eset
To: Crispin Miller <cmiller@picr.man.ac.uk>,
bioconductor@stat.math.ethz.ch
Message-ID: <20031008181108.47125.qmail@web80406.mail.yahoo.com>
Content-Type: text/plain
Indeed, %present is arguably the best of all the data quality
indicators suggested by Affymetrix. If you rehybe the same
hybe mix to chips under different conditions - change scanner,
hybridization time or temperature, hybe station - %present calls can
vary widely. Genes don't appear and disappear out of the hybe mix,
but probe affinities change under the different conditions. Making
sure that %present calls are consistent across a set of chips is a way
to check that the processing and experimental conditions that affect
hybridization kinetics were fairly consistent across a set of chips.
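A consistency check of this kind is easy to script. The Python sketch below computes %present per chip and flags chips that sit far from the median of the set; the function names and the 10 percentage point tolerance are assumptions chosen for illustration, not an Affymetrix recommendation.

```python
import statistics


def percent_present(calls):
    """Percentage of probe sets on one chip with a 'P' detection call."""
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)


def flag_outlier_chips(calls_by_chip, max_deviation=10.0):
    """Return chips whose %present differs from the set median by more
    than max_deviation percentage points (an arbitrary tolerance)."""
    pct = {chip: percent_present(calls) for chip, calls in calls_by_chip.items()}
    median = statistics.median(pct.values())
    return sorted(chip for chip, p in pct.items() if abs(p - median) > max_deviation)


example = {
    "chip_a": ["P"] * 50 + ["A"] * 50,
    "chip_b": ["P"] * 52 + ["A"] * 48,
    "chip_c": ["P"] * 20 + ["A"] * 80,
}
flagged = flag_outlier_chips(example)
```

In this made-up example chip_c, at 20% present against a median of 50%, is the only chip flagged.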
As for the Present calls' ability to discriminate between samples in
which a given mRNA fragment is present and samples in which it isn't,
it will vary from probe set to probe set. In an ideal probe set, in
which all PM/MM probe pairs have similar non-specific binding
affinities, the PM probe has good binding affinity to the target
mRNA fragment, and the target doesn't bind to too many other probes on
the chip, the calls will work well. It is not clear for what
proportion of probe sets the calls actually work as intended. You can
definitely find probe sets for which MM>>PM for several probe pairs in
the set and these fragments will never be called present. The reverse
is also true.
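One rough way to look for such probe sets is to count, per probe set, the probe pairs in which MM greatly exceeds PM. The Python sketch below does only that; the function names, the factor of 2 standing in for 'MM>>PM', and the minimum of 3 pairs standing in for 'several' are hypothetical thresholds, and this is not the MAS5 detection algorithm.

```python
def count_mm_dominant_pairs(pairs, factor=2.0):
    """Count probe pairs where MM exceeds PM by more than `factor` times.

    `pairs` is a list of (PM, MM) intensities for one probe set.
    """
    return sum(1 for pm, mm in pairs if mm > factor * pm)


def mm_dominant_probe_sets(pairs_by_probe_set, factor=2.0, min_pairs=3):
    """Return probe sets with at least `min_pairs` MM-dominant pairs."""
    return sorted(
        name
        for name, pairs in pairs_by_probe_set.items()
        if count_mm_dominant_pairs(pairs, factor) >= min_pairs
    )
```

Probe sets returned by `mm_dominant_probe_sets` would be candidates for fragments that are unlikely ever to be called present, whatever is in the sample.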
Very little has been published on the subject as far as I know. There
is the work by Ben Rubenstein mentioned earlier in this thread. More
work obviously needs to go into this question. I think that one should
be aware that by screening out absent calls, you may be losing many
interesting target fragments. In the days of MAS 4.0, I recall some
genes with negative expression being very good discriminators of tumor
class.
francois
Eric Blalock, PhD
Dept Pharmacology, UKMC
859 323-8033