Richard Friedman
Dear Bioconductor Users,
I am using genefilter to filter an ExpressionSet of 4 Mouse 430 2 chips
preprocessed with gcrma, prior to analysis with limma.
Here is a description of the ExpressionSet.
> xen2dataeset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 45101 features, 4 samples
element names: exprs
sampleNames: A_xen_1_21.cel, A_xen_2_22.cel, D_nodal_1_27.cel,
varLabels and varMetadata description:
sample: arbitrary numbering
featureNames: 1415670_at, 1415671_at, ..., AFFX-r2-P1-cre-5_at
(45101 total)
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: mouse4302
Here is my session information.
> sessionInfo()
R version 2.6.1 (2007-11-26)
attached base packages:
[1] splines stats graphics grDevices utils datasets
[8] base
other attached packages:
[1] mouse4302probe_2.0.0 mouse4302cdf_2.0.0 mouse4302.db_2.0.2
[4] limma_2.12.0 geneplotter_1.16.0 lattice_0.17-2
[7] annotate_1.16.1 AnnotationDbi_1.0.6 RSQLite_0.6-3
[10] DBI_0.2-3 RColorBrewer_1.0-1 affyPLM_1.14.0
[13] xtable_1.5-2 simpleaffy_2.14.05 gcrma_2.10.0
[16] matchprobes_1.10.0 genefilter_1.16.0 survival_2.34
[19] annaffy_1.10.1 KEGG_2.0.1 GO_2.0.1
[22] affy_1.16.0 preprocessCore_1.0.0 affyio_1.6.1
[25] Biobase_1.16.3
loaded via a namespace (and not attached):
[1] KernSmooth_2.22-21 grid_2.6.1 tools_2.6.1
I have tried the filtering parameters in the article by Scholtens and
von Heydebreck on p. 233 of the book by Gentleman et al.:
> f1<-pOverA(.25,log2(100))
> f2<-function(x)(IQR(x)>0.5)
> ff<-filterfun(f1,f2)
> selected <-genefilter(xen2dataeset,ff)
> sum(selected)
[1] 289
This seemed a bit small, so I tried the effect of each of the
parameters individually:
> selectedp025A <-genefilter(xen2dataeset,f1)
> sum(selectedp025A)
[1] 9681
> selectedIQRgtp5 <-genefilter(xen2dataeset,f2)
> sum(selectedIQRgtp5)
[1] 731
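To see how the two filters overlap, rather than only their totals, the individual results can be cross-tabulated. The following is a minimal sketch on simulated data: the matrix `x` is a made-up stand-in for the expression values in xen2dataeset, and f1 is assumed to be the pOverA(0.25, log2(100)) intensity filter.

```r
library(genefilter)

set.seed(1)
# Hypothetical stand-in for exprs(xen2dataeset): 1000 probesets x 4 arrays, log2 scale
x <- matrix(rnorm(4000, mean = 6, sd = 1.5), ncol = 4,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:4)))

f1 <- pOverA(0.25, log2(100))      # assumed intensity filter: > log2(100) in at least 25% of arrays
f2 <- function(x) (IQR(x) > 0.5)   # variability filter from the post
ff <- filterfun(f1, f2)            # a probeset passes only if both f1 and f2 pass

keep_f1   <- genefilter(x, f1)
keep_f2   <- genefilter(x, f2)
keep_both <- genefilter(x, ff)

# Cross-tabulate the two filters: the TRUE/TRUE cell equals sum(keep_both)
table(intensity = keep_f1, IQR = keep_f2)
```

With four arrays, a proportion of 0.25 means a probeset passes the intensity filter as soon as it exceeds the cutoff on a single array.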
My questions:
1. Is the log2(100) intensity cutoff good for all chips?
If not, can someone recommend a good intensity cutoff for mouse 4302?
2. Is the only effect of filtering to reduce the multiplier in the
false discovery analysis, OR does it reduce false positives in other ways:
A. In the case of intensity filters, by reducing the number of
fold changes resulting from the ratios of small numbers?
B. In the case of IQR filters, by eliminating large t-statistics
arising for genes with small variation across samples but
fortuitously low standard deviations?
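The "multiplier" part of question 2 can be illustrated with simulated p-values. This is only a sketch: the p-value distributions and the filter (which drops most null genes and a few changed genes at random, independently of the p-values) are invented for illustration and say nothing about how a real intensity or IQR filter behaves.

```r
set.seed(1)
n_null <- 9500                                 # hypothetical number of unchanged genes
n_alt  <- 500                                  # hypothetical number of changed genes
p <- c(runif(n_null), rbeta(n_alt, 0.1, 5))    # nulls uniform; changed genes skewed toward 0
is_alt <- rep(c(FALSE, TRUE), c(n_null, n_alt))

# Invented filter: keeps about 20% of nulls and 80% of changed genes, ignoring p
keep <- runif(length(p)) < ifelse(is_alt, 0.8, 0.2)

# Benjamini-Hochberg calls at 5%, with and without filtering first
hits_all      <- p.adjust(p, method = "BH") < 0.05
hits_filtered <- p.adjust(p[keep], method = "BH") < 0.05

# Among the genes that survive the filter, compare the number of calls
c(unfiltered = sum(hits_all[keep]), filtered = sum(hits_filtered))
```

Any difference between the two counts here comes purely from the smaller number of tests. The argument relies on the filter not using information that is related to the test statistic for unchanged genes.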
Up until this time I have not filtered because the filtering
parameters looked arbitrary and I thought that it was cheating to
reduce the number of tests used to compute the FDR. From reading and
further reflection I now believe otherwise. But whereas I now believe
I should filter, I am not at all sure what parameters to use, and how
sensitive my final list of differentially expressed genes will be to
the choice of those parameters. In particular, I wonder if the
intensity filter cutoff should vary with chip type and preprocessing
method (e.g., GCRMA).
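One way to gauge this sensitivity would be to count how many probesets survive over a small grid of cutoffs. A rough sketch in base R follows, again on a made-up matrix `x` rather than the real data. The candidate cutoffs are arbitrary, and `rowMeans(x > a) >= 0.25` is meant to mirror pOverA(0.25, a) when there are no missing values.

```r
set.seed(1)
# Hypothetical log2 expression matrix standing in for the real data
x <- matrix(rnorm(4000, mean = 6, sd = 1.5), ncol = 4)

a_raw   <- c(50, 100, 200)     # candidate intensity cutoffs on the raw scale (arbitrary)
a_cut   <- log2(a_raw)
iqr_cut <- c(0.25, 0.5, 1)     # candidate IQR cutoffs (arbitrary)

iqr <- apply(x, 1, IQR)

# Count probesets passing both filters for every pair of cutoffs
# (rows: intensity cutoffs, columns: IQR cutoffs)
n_kept <- sapply(iqr_cut, function(q)
  sapply(a_cut, function(a) sum(rowMeans(x > a) >= 0.25 & iqr > q)))
dimnames(n_kept) <- list(A = paste0("log2(", a_raw, ")"),
                         IQR = as.character(iqr_cut))
n_kept
```

The same loop could wrap the downstream limma analysis to check how much the final gene list changes from one cell of the grid to the next.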
Any thoughts and guidance would be appreciated.
Thanks as always,
Richard A. Friedman, PhD
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Department of Biomedical Informatics (DBMI)
Educational Coordinator
Center for Computational Biology and Bioinformatics (C2B2)
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
"Sure I am willing to stop watching television
to get a better education."
-Rose Friedman, age 11