On 6/16/06 5:30 AM, "Pedro López Romero" <plopez at cnic.es> wrote:
> Dear list,
>
> I have an experiment with several contrasts associated to different
> experimental conditions for each gene, so I want to select genes in a
> first step by their moderated-F statistic in order to reduce the
> number of genes for multiple contrasts across genes.
Hi, Pedro.
If I understand what you are trying to do, choosing genes using F-stats
like this for later statistical testing is not really valid. If you
want to reduce the number of genes for subsequent testing, you should
be using an unsupervised approach (like looking for high-variance
genes, as one example). Did I misunderstand the approach you are trying
to take?
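A minimal sketch of the kind of unsupervised filter mentioned above, in base R. The expression matrix, the gene names and the 50% cutoff are all made up for illustration; nothing here comes from the poster's data.

```r
# Toy log-expression matrix: 6 genes (rows) x 4 samples (columns).
# All values are made up for illustration.
expr <- rbind(
  geneA = c(8.0, 8.1, 7.9, 8.0),
  geneB = c(5.0, 9.0, 4.5, 9.5),
  geneC = c(7.0, 7.0, 7.1, 6.9),
  geneD = c(3.0, 6.0, 3.5, 6.5),
  geneE = c(10.0, 10.1, 10.0, 9.9),
  geneF = c(6.0, 8.0, 6.2, 8.1)
)

# Per-gene variance across all samples. No group labels are used,
# which is what makes the filter unsupervised.
gene.var <- apply(expr, 1, var)

# Keep the top half of genes by variance (the 50% cutoff is arbitrary).
keep <- gene.var >= quantile(gene.var, 0.5)
expr.filtered <- expr[keep, , drop = FALSE]
print(rownames(expr.filtered))
```

Because the filter never looks at which sample belongs to which condition, it does not reuse the information that the later tests rely on.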
Sean
Hi Sean,
thanks a lot for the reply.
Yes, you understood my question. I have several experimental
conditions, and I thought that filtering first by the moderated
F-statistic would be the right approach, since this way I would be
selecting the genes that are DE in any of the contrasts. Then, from
this reduced list, I thought that I could select the genes for every
independent contrast given by my experimental conditions.
Now, I do not understand why this is not a valid strategy. In fact, I
understood that what the limma user guide proposes on p. 51 (chapter
10) does this, but using decideTests() with nestedF.
Pedro.
-----Original Message-----
From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
Sent: Friday, 16 June 2006 13:04
To: Pedro López Romero; Bioconductor
Subject: Re: [BioC] decideTests with nestedF
Hi Pedro,
Pedro López Romero wrote:
> Hi Sean,
>
> thanks a lot for the reply.
>
> Yes, you understood my question. I have several experimental
> conditions, and I thought that filtering first by the moderated
> F-statistic would be the right approach, since this way I would be
> selecting the genes that are DE in any of the contrasts. Then, from
> this reduced list, I thought that I could select the genes for every
> independent contrast given by my experimental conditions.
I don't think you understand what decideTests() is doing. This function
tests all the contrasts in your contrasts matrix, using one of several
different methods. So you cannot use decideTests() to filter your genes
before doing the contrasts, because decideTests() _is_ doing the
contrasts.
As Sean mentioned, if you want to pre-filter your genes, you should be
filtering using an agnostic approach such as removing those genes that
have a low variance across all samples.
To answer your original question, I think the problem arises in this
line of your code:
decideTests(fit2, method="nestedF", adjust.method="fdr", p.value=0,05)
If this is a direct copy-paste of your code (and not a re-typing), then
you made a mistake in your p.value argument. It should read 0.05, not
0,05.
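To see why the decimal comma matters, here is a small sketch of how R matches the arguments. The function show.args() is a hypothetical stand-in written for this example; it is not limma's decideTests() and only echoes the values it receives.

```r
# Hypothetical stand-in for a function with a p.value argument. It is
# NOT limma's decideTests(); it only echoes how its arguments were matched.
show.args <- function(object, method = "separate", p.value = 0.05, lfc = 0) {
  list(method = method, p.value = p.value, lfc = lfc)
}

# Decimal comma: R sees p.value = 0 followed by an extra positional
# argument 05 (the number 5), which fills the next unmatched formal.
wrong <- show.args("fit", method = "nestedF", p.value = 0,05)

# Decimal point: the intended cutoff.
right <- show.args("fit", method = "nestedF", p.value = 0.05)

print(wrong$p.value)
print(right$p.value)
```

With a function that has no spare argument to absorb the extra 5, R would instead stop with an "unused argument" error, so what actually happens depends on the signature of the function being called.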
HTH,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.
Hi Jim, thank you so much for the reply.
I am not using decideTests() to filter the genes. I am using p.adjust()
instead:
ord = which(p.adjust(fit2$F.p.value, method="fdr") < 0.05)
Then, I apply decideTests(..., method="separate") to the list of genes
that turn out to be significant. I thought that this could be
conceptually similar to using decideTests(..., method="nestedF").
I understand that "nestedF" follows a step-wise procedure, first
selecting genes using their moderated F-statistic, and second (for the
selected genes) selecting the contrasts that contribute to the
significance of the F by the largest value of their moderated
t-statistics. I think that this is clear to me. As a matter of fact
you explained this recently:
https://stat.ethz.ch/pipermail/bioconductor/2006-March/012182.html
So now, instead of using "nestedF", why is it not possible to select
genes in a two-step procedure? First, use an F-test to select genes
that are differentially expressed in at least one contrast, and then,
from this list of selected genes, select the genes that are
significant contrast by contrast. It would be similar to nestedF, but
instead of selecting the contrast by the largest t-statistic, I
perform a t-test on the whole set of "selected-F-genes" for every
contrast.
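The mechanics of the two-step procedure described above can be sketched in base R. The vectors F.p and t.p are made-up stand-ins for the F and per-contrast p-values of a fitted object (an assumption for illustration, not output from limma), and the sketch shows only the bookkeeping, not whether the procedure is statistically sound.

```r
# Made-up inputs standing in for a fitted limma object (assumptions):
#   F.p : one moderated-F p-value per gene
#   t.p : per-contrast p-values, genes in rows, contrasts in columns
F.p <- c(g1 = 0.0001, g2 = 0.0004, g3 = 0.30, g4 = 0.0009, g5 = 0.80)
t.p <- cbind(
  c1 = c(0.0002, 0.60, 0.20, 0.0300, 0.90),
  c2 = c(0.4000, 0.001, 0.50, 0.0008, 0.70)
)
rownames(t.p) <- names(F.p)

# Step 1: keep genes whose FDR-adjusted F p-value is below 0.05.
ord <- which(p.adjust(F.p, method = "fdr") < 0.05)

# Step 2: on the selected genes only, adjust each contrast separately
# and call a contrast significant when its adjusted p-value is < 0.05.
t.adj <- apply(t.p[ord, , drop = FALSE], 2, p.adjust, method = "fdr")
calls <- t.adj < 0.05
print(calls)
```

Note that step 2 adjusts over the selected genes only, so its adjusted p-values are not comparable with those obtained by adjusting over the full gene list.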
On the other hand, when using "nestedF" I get some genes with an
adjusted p-value of 0.9, which probably should not be considered
differentially expressed.
>To answer your original question, I think the problem arises in this
>line of your code:
> decideTests(fit2, method="nestedF", adjust.method="fdr", p.value=0,05)
Yes, I re-typed the code wrongly. In my R script it is 0.05.
Again, thanks for your time.
Pedro
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On behalf of James W.
MacDonald
Sent: Friday, 16 June 2006 15:01
CC: 'Bioconductor'
Subject: Re: [BioC] decideTests with nestedF
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Pedro,
Pedro López Romero wrote:
> Hi Jim, thank you so much for the reply.
>
> I am not using decideTests() to filter the genes. I am using
> p.adjust() instead:
>
> ord = which(p.adjust(fit2$F.p.value, method="fdr") < 0.05)
>
> Then, I apply decideTests(..., method="separate") to the list of
> genes that turn out to be significant. I thought that this could be
> conceptually similar to using decideTests(..., method="nestedF").
>
> I understand that "nestedF" follows a step-wise procedure, first
> selecting genes using their moderated F-statistic, and second (for
> the selected genes) selecting the contrasts that contribute to the
> significance of the F by the largest value of their moderated
> t-statistics. I think that this is clear to me. As a matter of fact
> you explained this recently:
> https://stat.ethz.ch/pipermail/bioconductor/2006-March/012182.html
>
> So now, instead of using "nestedF", why is it not possible to select
> genes in a two-step procedure? First, use an F-test to select genes
> that are differentially expressed in at least one contrast, and then,
> from this list of selected genes, select the genes that are
> significant contrast by contrast. It would be similar to nestedF, but
> instead of selecting the contrast by the largest t-statistic, I
> perform a t-test on the whole set of "selected-F-genes" for every
> contrast.
This is the conventional method for analyzing data with ANOVA. You
first fit the ANOVA model, then if it is significant based on the
F-statistic, you look at contrasts to see which contrast(s) contributed
to the result. In other words, I don't think there is a reason you
couldn't do things this way.
>
> On the other hand, when using "nestedF" I get some genes with an
> adjusted p-value of 0.9, which probably should not be considered
> differentially expressed.
>
Not sure I understand your point. Are you saying that a particular
contrast that appears to be significant using your method ends up
having a very large p-value if you use nestedF?
Jim
Hi Jim,
> Not sure I understand your point. Are you saying that a particular
> contrast that appears to be significant using your method ends up
> having a very large p-value if you use nestedF?
Not exactly. The problem is that when using decideTests(...,
method="nestedF") with the whole set of genes (not filtered by any
method), some of the selected genes have a quite large adjusted
p-value, and these correspond to genes with a very small M value. These
genes are in the list of differentially expressed genes selected with
decideTests(..., method="nestedF").
For example, here I show you two of the genes for a particular contrast
that would be selected using decideTests(..., method="nestedF"):
M            A            t           P.Val        adj.P.Val
1.064610409  13.0019494   4.18633292  0.000240894  0.239943977
0.489386597  8.228402648  2.86816475  0.007621841  0.817992583
I am a bit confused by this result. Any clue?
Pedro.
Hi Pedro,
Your above statement is a bit confusing, but if I assume that you are
comparing decideTests() with method="nestedF" using all genes, then
with only those genes that have a significant F-statistic, then yes I
think I know what is happening.
You have to understand that the adjusted p-value is a function of n,
where n is the number of simultaneous comparisons you are doing. If you
are filtering out a large number of genes, then the p-value adjustment
will be less extreme because there are far fewer comparisons to adjust
for. This is one of the main reasons people like to pre-filter their
data - it increases their ability to detect differential expression
because n is smaller.
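The dependence on n can be checked directly with p.adjust(). The p-values below are made up for illustration, and the subset of 10 genes is an arbitrary stand-in for "the genes that survive a filter"; the sketch shows only the arithmetic of the adjustment.

```r
# Made-up raw p-values for 100 genes: two small ones plus 98 spread
# evenly between 0.1 and 1.
p.all <- c(0.0002, 0.004, seq(0.1, 1, length.out = 98))

# FDR (Benjamini-Hochberg) adjustment over all 100 genes.
adj.all <- p.adjust(p.all, method = "fdr")

# The same adjustment after keeping only the first 10 genes.
adj.sub <- p.adjust(p.all[1:10], method = "fdr")

# The second gene has the same raw p-value in both runs, yet its
# adjusted p-value depends on how many genes were tested alongside it.
print(c(n100 = adj.all[2], n10 = adj.sub[2]))
```

In this toy case the raw p-value 0.004 adjusts to 0.2 among 100 genes and to 0.02 among 10, so the same gene falls on different sides of a 0.05 cutoff depending on n.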
Does that help?
Best,
Jim
Hi Pedro,
you can check whether something went wrong using the following code:
# adjust the p-values of each contrast separately
padj <- apply(limma.fit$p.value, 2, p.adjust, method="fdr")
# contrasts classified as +1 or -1 by decideTests
selected <- resDecideTest != 0
# report genes with at least one significant contrast
i <- apply(selected, 1, any)
cat("# DE genes:", sum(i), "\n")
# report how many contrasts classified as +1 or -1 by decideTests
# get a separately adjusted p-value > 0.05
cat("# inconsistencies:", sum(selected & padj > 0.05), "\n")
Here limma.fit is your limma fit result and resDecideTest is the output
of decideTests() (with the "nestedF" option specified).
Hope this helps
Ariel