Dear Dr. Smyth,
Would you be so kind as to help me decide whether or not to filter my
microarray data set with a filtering method correcting for variance,
such as the I/NI method from Talloen et al. (2007)? Many researchers
say that filtering should increase the power of the test, and so
increase the chance of finding truly differentially expressed genes
(DEG). However, when I analyzed my data set I found the following
(a lower number of DEG when filtering):
Orthogonal contrasts   # of genes (adjusted P < 0.05 and FC > 1.4)
                       w/o filtering   I/NI filtering
FAT                    195             118
FA                     329             151
MR                     169             103
FAT by MR              854             321
FA by MR               961             283
Also, I found that Bourgon et al. (2010) do not recommend combining
the limma t-statistic with filtering. I would therefore appreciate
your suggestion on whether or not to filter my data set.
Thanks in advance.
Miriam
********************************
Miriam Garcia, MS, PhD
Department of Animal Sciences
University of Florida
Miriam:
To clarify: Bourgon et al. (2010) discourage the use of the limma
t-statistic specifically with overall-variance filtering, since this
invalidates type-I error control of the combined procedure. Either
component by itself (limma t; or overall-variance filtering and normal
t) is fine. Nobody yet seems to have worked out how to combine them
(and it may not be worthwhile.)
I leave it to others to comment on I/NI filtering and limma.
Also, as Jelle Goeman notes, by combining the threshold on adjustedP
and on FC (>1.4) you are being anti-conservative. This in combination
with your filtering likely explains the effect you see.
Bottom line, by combining three criteria:
- limma-t
- I/NI
- FC cutoff
you are putting yourself into a difficult area of statistics, and
unless you really know what you are doing, it might be best to
deconvolute your criteria.
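
One possible way to separate the criteria, sketched on simulated data: either threshold on the adjusted P-value alone, or build the fold-change requirement into the test itself with limma's treat(). Note that treat() is not mentioned anywhere in this thread; it is offered here only as an assumption about what "deconvoluting" could look like in practice. The data, group labels and effect sizes below are made up for illustration.

```r
library(limma)

set.seed(7)
n_genes <- 2000
group <- factor(rep(c("ctrl", "trt"), each = 4))   # hypothetical two-group design

# Simulated log2 expression; the first 100 genes get a true shift of 1.5
y <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.3), nrow = n_genes)
y[1:100, group == "trt"] <- y[1:100, group == "trt"] + 1.5
rownames(y) <- paste0("gene", seq_len(n_genes))

design <- model.matrix(~ group)
fit <- lmFit(y, design)

# (a) One criterion only: moderated t with BH-adjusted P-values
fit_eb <- eBayes(fit)
tab_padj <- topTable(fit_eb, coef = 2, number = Inf)
n_padj <- sum(tab_padj$adj.P.Val < 0.05)

# (b) Fold-change threshold folded into the hypothesis test,
#     instead of a separate post hoc FC cutoff
fit_tr <- treat(fit, lfc = log2(1.4))
tab_treat <- topTreat(fit_tr, coef = 2, number = Inf)
n_treat <- sum(tab_treat$adj.P.Val < 0.05)

cat("adjusted P only:", n_padj, " treat at FC 1.4:", n_treat, "\n")
```

In (b) the adjusted P-values refer to the hypothesis that the fold change exceeds 1.4, so a single error-rate statement covers the whole selection rule.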
Best wishes
Wolfgang
On 21 May 2013, at 03:06, "Garcia Orellana, Miriam" <mgarciao at ufl.edu> wrote:
> [...]
Dear Miriam,
I don't know what I/NI filtering is and it isn't really my job to make a
running commentary on every filtering method that gets published.
However the limma algorithm analyses the spread of the genewise variances.
Any filtering method based on genewise variances will change the
distribution of variances, will interfere with the limma algorithm and
hence will give poor results.
Like most people, I recommend filtering out genes that don't appear to be
expressed in any sample. See for example Case studies 15.3 or 15.4 in the
limma User's Guide.
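
A minimal sketch of an expression-based filter of this kind, in base R on simulated data. The cutoff and the minimum number of arrays are hypothetical values chosen for illustration; they are not taken from the User's Guide case studies, where the threshold is derived from the data.

```r
set.seed(42)
n_probes <- 1000
n_arrays <- 8

# Simulated log2 intensities: each probe has its own mean level
probe_mean <- runif(n_probes, min = 3, max = 12)
y <- matrix(rnorm(n_probes * n_arrays, mean = probe_mean, sd = 0.5),
            nrow = n_probes)
rownames(y) <- paste0("probe", seq_len(n_probes))

cutoff <- 6        # hypothetical "expressed" threshold on the log2 scale
min_arrays <- 3    # hypothetical: e.g. the size of the smallest group

# Keep a probe if it exceeds the cutoff on at least min_arrays arrays.
# The rule looks only at expression level, never at the genewise variance.
is_expressed <- rowSums(y > cutoff) >= min_arrays
y_filtered <- y[is_expressed, , drop = FALSE]

cat("kept", nrow(y_filtered), "of", nrow(y), "probes\n")
```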
However you will find if you use eBayes(fit,trend=TRUE) instead of the
usual eBayes(fit) that limma gives pretty good results regardless of how
much filtering you do, provided of course that the filtering is on
expression and not on variance.
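
A short sketch of the two calls side by side, on simulated log-expression values in which the variance is larger at low intensities. The design, gene names and variance model are assumptions made for the example, not part of the original advice.

```r
library(limma)

set.seed(123)
n_genes <- 2000
group <- factor(rep(c("A", "B"), each = 4))   # hypothetical two-group design

# Simulated data with a mean-variance trend: noisier at low intensity
mu <- runif(n_genes, min = 4, max = 14)
sdev <- 0.2 + 1 / mu
y <- matrix(rnorm(n_genes * 8, mean = mu, sd = sdev), nrow = n_genes)
rownames(y) <- paste0("gene", seq_len(n_genes))

design <- model.matrix(~ group)
fit <- lmFit(y, design)

fit_const <- eBayes(fit)                # usual call: one common prior variance
fit_trend <- eBayes(fit, trend = TRUE)  # prior variance depends on average expression

# With trend = TRUE the prior variance is estimated per gene
length(fit_const$s2.prior)
length(fit_trend$s2.prior)

tab <- topTable(fit_trend, coef = 2, number = Inf)
head(tab)
```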
The literature tends to say that the reason for filtering is to reduce the
amount of multiple testing, but in truth the increase in power from this
is only slight. The more important reason for filtering in most
applications is to remove highly variable genes at low intensities. The
importance of filtering is highly dependent on how you pre-processed your
data. Filtering is less important if you (i) use a good background
correction or normalising method that damps down variability at low
intensities and (ii) use eBayes(trend=TRUE) which accommodates a
mean-variance trend.
Best wishes
Gordon
Dear Dr. Smyth,
Thanks for your explanation.
I have checked the example you provide in the limma User's Guide, Case
study 15.4, which uses Agilent arrays. There I found that you use a
normalization approach that is new, at least to me:
y <- backgroundCorrect(x,method="normexp") and
y <- normalizeBetweenArrays(y,method="quantile"), followed by the
filtering step and trend=TRUE, as you suggested in your first reply.
I have been using GCRMA for normalization, so I wonder whether I could
still use that true/false filtering method with GCRMA. I also looked for
requests or published code that performs the same analysis as in 15.4
with Affymetrix arrays instead of Agilent, and I couldn't find any. Does
that mean it does not work for Affymetrix? I guess I am wrong.
Thank you very much indeed.
Miriam
********************************
Miriam Garcia, MS, PhD
Department of Animal Sciences
University of Florida
________________________________________
From: Gordon K Smyth [smyth@wehi.EDU.AU]
Sent: Wednesday, May 22, 2013 7:37 PM
To: Garcia Orellana,Miriam
Cc: Bioconductor mailing list
Subject: Filtering is not recommended with LIMMA?
[...]
Dear Gordon
> The literature tends to say that the reason for filtering is to
> reduce the amount of multiple testing, but in truth the increase in
> power from this is only slight. The more important reason for
> filtering in most applications is to remove highly variable genes at
> low intensities. The importance of filtering is highly dependent on
> how you pre-processed your data. Filtering is less important if you
> (i) use a good background correction or normalising method that damps
> down variability at low intensities and (ii) use eBayes(trend=TRUE)
> which accommodates a mean-variance trend.
With all respect, I think this paragraph mixes up two separate issues
and can benefit from clarification.
1. While literature can probably be found to support any statement,
the above-cited reason is indeed bogus when multiple testing is
performed with an FDR objective. The paper by Bourgon et al. motivates
filtering differently, namely by using a filter criterion that is
independent of the test statistic under the null (thus does not affect
type-I error; some subtlety is discussed in that paper) but dependent
under the alternative (thus improves power).
2. "Highly variable genes at low intensities" are indeed a problem of
bad preprocessing and are better dealt with at that level, not by
filtering. Nowadays, the commonly used methods for expression
microarray or RNA-Seq analysis that I am aware of avoid that problem.
3. The question when & how independent filtering (as in 1) is
beneficial is quite unrelated to preprocessing. You are right that FDR
is a property of the whole selected gene list, not of individual
genes, and that different approaches exist for spending the type-I
error budget wisely, by weighting different genes differently; of
which independent filtering is one and trended eBayes (which is not
the default option in limma) may be another.
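
The independence property in point 1 can be checked with a small base R simulation. This is only a sketch under stated assumptions: all genes are null, the data are normal, the filter statistic is the overall variance, and the test is the ordinary (not moderated) pooled two-sample t-test, i.e. the "overall-variance filtering and normal t" combination mentioned earlier in the thread. Sample sizes and the 50% filtering fraction are arbitrary.

```r
set.seed(1)
n_genes <- 5000
n_per_group <- 4
group <- rep(c(1, 2), each = n_per_group)

# All genes are null: no difference between the two groups
x <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)
g1 <- x[, group == 1]
g2 <- x[, group == 2]

# Ordinary pooled-variance two-sample t-test, computed row by row
m1 <- rowMeans(g1)
m2 <- rowMeans(g2)
df <- 2 * n_per_group - 2
pooled_var <- (rowSums((g1 - m1)^2) + rowSums((g2 - m2)^2)) / df
t_stat <- (m1 - m2) / sqrt(pooled_var * 2 / n_per_group)
p <- 2 * pt(-abs(t_stat), df)

# Filter on overall variance, computed without using the group labels
overall_var <- apply(x, 1, var)
keep <- overall_var > median(overall_var)

# Under the null, the fraction of P-values below 0.05 should stay
# close to 0.05 both before and after filtering
rate_all <- mean(p < 0.05)
rate_kept <- mean(p[keep] < 0.05)
cat("all genes:", rate_all, " after filtering:", rate_kept, "\n")
```

Replacing the ordinary t-statistic with a moderated one breaks this guarantee, which is the combination Bourgon et al. caution against.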
Best wishes
Wolfgang
Reference:
Bourgon et al. PNAS 2010: http://www.pnas.org/content/107/21/9546