Eric Blalock
Hi,
To add to what Rafael Irizarry said: when we had multiple subjects/chips per treatment group in our recent publication (Blalock et al., 2003, J Neurosci.), we used presence/absence (P/A) filtering to determine which probe sets were included in the 'final' statistical analysis. We did this on an entire-record basis; that is, a probe set was removed from further consideration altogether if it had 'too many' absence calls (the cutoff was arbitrarily set at 40% presence calls in at least one treatment group). Because whole probe sets were removed rather than individual values, the F-statistics for each gene that remained were unchanged.

However, this filtering has a huge effect on the multiple-testing error when using the 'MAS' algorithms, because part of what is being removed is the unexpressed probe set contingent: that fairly large group of probe sets (in our case nearly 50%) that are not detectable/not expressed in the tissue of interest. I'd guess that this will be an issue with any 'general purpose' array designed for genome-wide expression.

Affy is as much as telling you that they are not confident in the average difference score (ADS) and signal intensity (SI) numbers their algorithms produce if the probe sets are rated absent. My current understanding is that the MAS metrics are not 'stand alone'. Although Affy intends ADS and SI to be their quantitative measures of mRNA level, these measures go hand in glove with their respective absence calls.

As for what the absence calls mean, there appears to be a shell game (three-card monte) going on with the 'why' of absence calls. You are correct that many probe sets are called absent because they have insufficient signal, but many probe sets are also called 'absent' because, although there is sufficient signal intensity, there is too much disagreement among probe pairs. Thus there are two reasons probe sets get called absent: 1) the signal is too dim, and 2) the probe set is not working the way the algorithm expects. Oh, and add an interaction of those two as well.
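
The whole-probe-set filter described above can be sketched roughly as follows. This is a minimal illustration and not the code used in the publication: the group labels, probe set names, and calls are made up, the cutoff is assumed to be inclusive (at least 40% present), and Marginal ('M') calls are assumed to count as not present. The Bonferroni threshold at the end is only a simple way to show how removing probe sets changes the multiple-testing burden; the post does not say which correction was applied.

```python
from collections import defaultdict


def present_fraction_by_group(calls, groups):
    """Return the fraction of 'P' calls per treatment group for one probe set.

    calls  -- one detection call per chip ('P', 'M' or 'A')
    groups -- the treatment group label of each chip, in the same order
    """
    present = defaultdict(int)
    total = defaultdict(int)
    for call, group in zip(calls, groups):
        total[group] += 1
        if call == "P":  # assumption: Marginal calls are not counted as present
            present[group] += 1
    return {g: present[g] / total[g] for g in total}


def keep_probe_set(calls, groups, min_present=0.40):
    """Keep the probe set if at least one group reaches the presence cutoff."""
    fractions = present_fraction_by_group(calls, groups)
    return max(fractions.values()) >= min_present


def filter_probe_sets(call_table, groups, min_present=0.40):
    """Return the IDs of probe sets that survive the filter (whole records only)."""
    return [
        probe_id
        for probe_id, calls in call_table.items()
        if keep_probe_set(calls, groups, min_present)
    ]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical design: 5 chips per group, 4 probe sets.
groups = ["young"] * 5 + ["aged"] * 5
call_table = {
    "probe_a": list("PPPPPPPPPA"),  # clearly expressed in both groups
    "probe_b": list("AAAAAAAAAA"),  # never detected
    "probe_c": list("AAAPAAPPAA"),  # 1/5 present in young, 2/5 in aged
    "probe_d": list("AAMAAPAAAA"),  # 0/5 in young (M ignored), 1/5 in aged
}

kept = filter_probe_sets(call_table, groups)
print("kept:", kept)
print("threshold before filtering:", bonferroni_threshold(0.05, len(call_table)))
print("threshold after filtering: ", bonferroni_threshold(0.05, len(kept)))
```

Because a probe set is either kept with all of its chips or dropped entirely, the per-gene test statistic for anything that survives is computed on exactly the same values as before; only the number of tests changes.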

So if you are using another algorithm like RMA to look at your data, then the presence/absence calls could be dangerous, because they take out probe sets that didn't work well for MAS even though those probe sets may have done just fine with RMA.
Hope that helps,
-E
>Message: 4
>Date: Fri, 30 May 2003 17:28:45 +0100
>From: "Crispin Miller" <cmiller@picr.man.ac.uk>
>Subject: [BioC] replicates and low expression levels
>To: <bioconductor@stat.math.ethz.ch>
>Message-ID: <baa35444b19ad940997ed02a6996aae00b1448@sanmail.picr.man.ac.uk>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi,
>Just a quick question about low expression levels on Affy systems - I hope
>it's not too off-topic; it is about normalisation and data analysis...
>I've heard a lot of people advocating that it's a good idea to perform an
>initial filtering on either Present Marginal or Absent calls, or on
>gene-expression levels (so that only genes with an expression > 40, say,
>after scaling to a TGT of 100 using the MAS5.0 algorithm, are part of the
>further analysis). Firstly, am I right in thinking that this is to
>eliminate data that are too close to the background noise level of the system.
>
>I wanted to canvas opinion as to whether people feel we need to do this if
>we have replicates and are using statistical tests - rather than just
>fold-changes - to identify 'interesting' genes. Does the statistical
>testing do this job for us?
>
>Crispin
>