Dick Beyer
I'd like to get anyone's comments on how to select the best probeset
from a group of probesets that represent the same gene.
In the Broad Institute's GSEA software, it is important to only have
one entry per gene for the GSEA input files so as to avoid inflation
of the enrichment score. For GSEA analysis (Subramanian et al., PNAS
2005, and the GSEA user guide,
http://www.broad.mit.edu/gsea/doc/GSEAUserGuide.pdf), the user guide
recommends the following on page 35:
"Collapsing mode for probe sets => 1 gene. Select the value to use for
the single probe that will represent all probe sets for the gene:
max_probe (default) to use the highest expression value or
median_of_probes to use the median value."
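For illustration, the two collapsing modes quoted above can be sketched in a few lines of Python. This is only a rough sketch, not GSEA's actual implementation: it assumes the collapse is applied sample by sample (the quoted passage does not spell this out), and the function name `collapse_probesets` and the input layout are hypothetical.

```python
from statistics import median

def collapse_probesets(expr, probe_to_gene, mode="max_probe"):
    """Collapse probeset-level values to one row per gene.

    expr: dict mapping probeset ID -> list of per-sample values
    probe_to_gene: dict mapping probeset ID -> gene symbol
    mode: "max_probe" or "median_of_probes" (names taken from the user guide)
    """
    if mode == "max_probe":
        reducer = max
    elif mode == "median_of_probes":
        reducer = median
    else:
        raise ValueError(f"unknown collapsing mode: {mode}")

    # Group the expression rows of all probesets that map to the same gene.
    by_gene = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probeset without a gene assignment is dropped
        by_gene.setdefault(gene, []).append(values)

    # Assumption: reduce across probesets separately for each sample (column).
    return {
        gene: [reducer(column) for column in zip(*rows)]
        for gene, rows in by_gene.items()
    }
```

Note that neither mode looks at probeset quality: a cross-hybridizing probeset with a high signal wins under max_probe regardless of how specific it is.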
I think that this procedure may not yield the best results in all
cases. For example, I have several HG-U133A chips and on this chip
there are six probesets for gene FN1: 214702_at, 214701_s_at,
212464_s_at, 210495_x_at, 211719_x_at, 216442_x_at. If I use the
probeset with the highest expression value, then I would choose
212464_s_at. However, on closer inspection, only two of these
probesets, 214702_at and 214701_s_at, don't cross-hybridize, and only
one of them, 214702_at, was originally designed to hybridize uniquely.
It seems reasonable to me to use this cross-hybridization information
in the selection of the best probeset.
There is further information from Affy's probe transcript assignment,
which gives each probeset a Grade (A, B, C, E, R). The document I
received from Affy, "Transcript Assignment Whitepaper101205.doc",
describes this process. Perhaps that information should be used as
well.
Since the cross-hybridization information is available in electronic
form (HG-U133A_annot.csv from the Affy website), it seems relatively
easy to use this for some type of filtering.
In summary, in the context of GSEA, when you have to choose the best
single probeset from a set of probesets that all represent the same
gene, should those probesets that are known to cross-hybridize be
rejected? Of the remaining list of probesets, should the ones with a
higher Grade be preferred over those with a lower Grade?
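To make the question concrete, the selection rule proposed here could look something like the sketch below. Everything in it is an assumption made for illustration: the record fields (`cross_hyb`, `grade`, `mean_expr`), the function name `pick_probeset`, the ordering of the grades from best (A) to worst (R), and the fallback when every probeset cross-hybridizes. The cross-hybridization flags in the example follow the FN1 description above; the grades and expression values are placeholders, not real annotation or data.

```python
# Assumed ranking of the grades, best first. The letters come from the
# post; their order is an assumption.
GRADE_RANK = {"A": 0, "B": 1, "C": 2, "E": 3, "R": 4}

def pick_probeset(candidates):
    """Pick one probeset to represent a gene.

    candidates: list of dicts with hypothetical keys
        'id' (str), 'cross_hyb' (bool), 'grade' (str), 'mean_expr' (float)
    """
    if not candidates:
        raise ValueError("no candidate probesets")

    # Step 1: reject probesets known to cross-hybridize. If that leaves
    # nothing, fall back to the full list (a design choice, not a rule).
    clean = [c for c in candidates if not c["cross_hyb"]]
    pool = clean or candidates

    # Step 2: prefer the better grade; break ties by higher expression.
    # Unknown grades sort after all known ones.
    return min(
        pool,
        key=lambda c: (GRADE_RANK.get(c["grade"], len(GRADE_RANK)), -c["mean_expr"]),
    )

# FN1 on HG-U133A. Grades and mean_expr are placeholder values.
fn1 = [
    {"id": "214702_at",   "cross_hyb": False, "grade": "A", "mean_expr": 5.1},
    {"id": "214701_s_at", "cross_hyb": False, "grade": "B", "mean_expr": 6.0},
    {"id": "212464_s_at", "cross_hyb": True,  "grade": "A", "mean_expr": 11.2},
    {"id": "210495_x_at", "cross_hyb": True,  "grade": "A", "mean_expr": 10.8},
    {"id": "211719_x_at", "cross_hyb": True,  "grade": "A", "mean_expr": 10.9},
    {"id": "216442_x_at", "cross_hyb": True,  "grade": "A", "mean_expr": 10.5},
]
print(pick_probeset(fn1)["id"])
```

With these placeholder inputs the rule selects 214702_at, whereas a max_probe-style rule would select 212464_s_at.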
Generally speaking, I would probably want to live with more false
positives and discuss such cases with the investigator to see if
further validation is necessary. In my particular case of FN1 on
HG-U133A, the cross-hybridizing probesets give the opposite
differential expression to the non-cross-hybridizing probesets. I
suppose you could
also run more than one GSEA analysis using different criteria for
multiple probesets, but that seems a bit daunting.
If anyone has comments, I will collect them and report back.
Thanks very much,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D. University of Washington
Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer