Hello,
Does anybody know an R package or function to compare expression levels
(affy data) between two groups with no replicates in each group? In fact,
I just want to compare one array to another.
The purpose is to find differentially expressed genes.
We cannot use a statistical test (not enough replicates), but we can
use a graphical approach based on scatter plots, and an outlier detection
approach.
Thanks for your help,
Regards
Nicolas.
--
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE
Email: Nicolas.Servant at curie.fr
Tel: 01 44 32 42 75
While I agree wholeheartedly with what others state on the issue of
replication and external validation (eg PCR) you might be able to do
slightly better with a test statistic based on a probe level analysis.
Admittedly there is no polished function for doing this in general right
now, but something like
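library(affy)     # ReadAffy()
library(affyPLM)  # fitPLM(), coefs(), se()
my.abatch <- ReadAffy()
my.Pset <- fitPLM(my.abatch)
## now assuming you have only two samples:
## difference of the probeset estimates divided by its standard error
PLM.teststatistic <- (coefs(my.Pset)[,1] - coefs(my.Pset)[,2]) /
    sqrt(se(my.Pset)[,1]^2 + se(my.Pset)[,2]^2)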
I have observed that you do slightly better thresholding on this than FC
(or log FC to be more exact) on spike-in datasets.
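For readers without CEL files at hand, the ranking step can be sketched in base R on simulated inputs. The matrices `est` and `std.err` below are hypothetical stand-ins for coefs(my.Pset) and se(my.Pset); all numbers are made up and the two estimates are simply treated as independent, so this only illustrates the arithmetic, not real performance.

```r
# Simulated stand-ins for coefs(my.Pset) and se(my.Pset):
# 1000 probesets (rows) x 2 arrays (columns). Values are invented.
set.seed(1)
n <- 1000
est <- matrix(rnorm(2 * n, mean = 8, sd = 1), ncol = 2,
              dimnames = list(paste0("ps", seq_len(n)), c("A", "B")))
std.err <- matrix(runif(2 * n, min = 0.05, max = 0.5), ncol = 2,
                  dimnames = dimnames(est))

# Difference of the two probeset-level estimates, scaled by the standard
# error of that difference (assuming the two estimates are independent).
plm.stat <- (est[, 1] - est[, 2]) / sqrt(std.err[, 1]^2 + std.err[, 2]^2)

# Plain log fold change, for comparison.
log.fc <- est[, 1] - est[, 2]

# Top 20 probesets under each ranking.
top.by.stat <- names(sort(abs(plm.stat), decreasing = TRUE))[1:20]
top.by.fc <- names(sort(abs(log.fc), decreasing = TRUE))[1:20]
```

The two lists differ where a large log FC comes with large standard errors. Note that these standard errors reflect probe-level (technical) variability only, not biological variability.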
Hope that helps,
Ben
On Wed, 2006-01-25 at 14:34 +0100, Nicolas Servant wrote:
> Hello,
>
> Does anybody know an R package or function to compare expression levels
> (affy data) between two groups with no replicates in each group? In fact,
> I just want to compare one array to another.
> The purpose is to find differentially expressed genes.
> We cannot use a statistical test (not enough replicates), but we can
> use a graphical approach based on scatter plots, and an outlier detection
> approach.
>
> Thanks for your help,
> Regards
>
> Nicolas.
>
> While I agree wholeheartedly with what others state on the issue of
> replication and external validation (eg PCR) you might be able to do
> slightly better with a test statistic based on a probe level analysis.
> Admittedly there is no polished function for doing this in general right
> now, but something like
Wasn't the function 'ppsetApply' in 'affy' meant to be a general function
to do whatever one likes across probe sets?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
On Thu, 2006-01-26 at 15:35 +0100, lgautier at altern.org wrote:
> > While I agree wholeheartedly with what others state on the issue of
> > replication and external validation (eg PCR) you might be able to do
> > slightly better with a test statistic based on a probe level analysis.
> > Admittedly there is no polished function for doing this in general right
> > now, but something like
>
> Wasn't the function 'ppsetApply' in 'affy' meant to be a general function
> to do whatever one likes across probe sets ?
The generality I was referring to was for carrying out tests based on
output stored in PLMset objects.
Hi Nicolas,
I tried to send this message to the mailing list yesterday but it seems
not to have got through.
Like others I'd caution against using no replicates at all, but I think
we have some methods that are useful with no or very few replicates.
Like Ben Bolstad said in a previous message, I would suggest that you
use the probe-level analysis to estimate the technical variance. We have
a method, multi-mgMOS, which estimates the log-signal and associates
this estimate with credibility intervals (either variances or
percentiles of the posterior log-signal). A paper describing this method
is available from
http://bioinformatics.oxfordjournals.org/cgi/content/short/bti583v1
and the most recent code and other publications are on the project
web-site,
http://umber.sbs.man.ac.uk/resources/puma/
An older version of mmgMOS is available in bioconductor.
More recently we have developed a Bayesian t-test, PPLR, which allows
this technical variance to be included in determining differential
expression from replicated conditions. The PPLR code also works with no
replicates, in which case it just uses the technical variance. R code is
available from the above website (the paper is submitted and we will get
PPLR into bioconductor shortly).
I suggest you try mmgMOS, followed by mean or median centering of the
log-signal estimates, followed by PPLR. Let me know how you get on -
these are new methods and feedback is very welcome.
Regards,
Magnus.
--
Dr Magnus Rattray
School of Computer Science,
University of Manchester,
Oxford Road,
Manchester M13 9PL, UK.
Tel. +44-161-275-6187
Fax. +44-161-275-6204
Simply take array A and divide it by array B. Then rank the genes by
those ratios.
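A minimal base R sketch of this ratio ranking, on simulated values. The vectors `A` and `B` and the gene names are made up for illustration; working with the log2 ratio is an added convenience so that up- and down-regulation are treated symmetrically.

```r
# Simulated expression values for two arrays (original scale, not log).
set.seed(2)
genes <- paste0("gene", 1:500)
A <- setNames(rlnorm(500, meanlog = 6, sdlog = 1), genes)
B <- setNames(A * rlnorm(500, meanlog = 0, sdlog = 0.3), genes)

# Log2 ratio: a 2-fold increase is +1, a 2-fold decrease is -1.
log2.ratio <- log2(A / B)

# Rank genes by absolute log ratio; the head of the list holds the
# candidates for follow-up.
ranked <- sort(abs(log2.ratio), decreasing = TRUE)
head(ranked, 10)
```

This gives a ranking only. It says nothing about how many of the top genes are real changes.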
Sean
Thanks for your answer,
But in this case, I have to choose a fold change threshold! And it is
supported that the FC tends to be greater at low expression levels.
For instance, a FC greater than 2 for expression values near 50 is
readily seen, but a FC greater than 2 is unlikely to be observed for
expression values near 1000.
So I would like to use a more robust approach.
Regards,
Nicolas S.
Without replication, there is nothing statistical that is really
"robust" because you do not know how variable the data are.
In the old industrial design literature, in experiments without
replication, a normal probability plot (qqnorm) or half-normal plot
were used to identify effects which were too large compared to random
normal (which presumably fit most of the effects). You could do
something similar (I would suggest using the quantiles of the t3 or
t4 distribution rather than a normal) but the method requires 2
assumptions that are very unlikely in the current situation: the
responses must be independent (but responses on the same array are
dependent) and the responses must be identically distributed as a
K*t4 distribution, where K is a constant related to the gene-wise
standard deviation - i.e. the SD for all genes must be equal.
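A minimal sketch of such a quantile plot against the t4 distribution, in base R on simulated log ratios. The data, the scale 0.2 and the ten shifted genes are all made up, and the simulation satisfies exactly the two assumptions listed above (independence, common scale), which real arrays are unlikely to do.

```r
# Simulated log ratios: mostly null genes with heavy-tailed noise (K * t4),
# plus ten genes carrying a real shift. All numbers are invented.
set.seed(3)
n <- 2000
log.ratio <- 0.2 * rt(n, df = 4)
log.ratio[1:10] <- log.ratio[1:10] + 3

# Theoretical t4 quantiles at the usual plotting positions.
theo <- qt(ppoints(n), df = 4)
obs <- sort(log.ratio)

# Reference line through the quartiles (the same idea qqline() uses for the
# normal case); its slope is a rough estimate of the common scale K.
slope <- unname(diff(quantile(obs, c(0.25, 0.75))) /
                diff(qt(c(0.25, 0.75), df = 4)))
intercept <- unname(quantile(obs, 0.25)) - slope * qt(0.25, df = 4)

# Genes far from the line at either end are the candidate effects.
plot(theo, obs, xlab = "t4 quantiles", ylab = "sorted log ratio")
abline(intercept, slope)
```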
There is also the volcano plot, which I have never used, but is based
on similar ideas.
A more robust idea is to use a binary search using PCR and the
observed fold differences. Although given the expense, it would be
simpler to run a replicate for each condition.
--Naomi
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
On 1/25/06 9:19 AM, "Nicolas Servant" <nicolas.servant at curie.fr> wrote:
> Thanks for your answer,
> But in this case, I have to choose a fold change threshold! And it is
> supported that the FC tends to be greater at low expression levels.
> For instance, a FC greater than 2 for expression values near 50 is
> readily seen, but a FC greater than 2 is unlikely to be observed for
> expression values near 1000.
> So I would like to use a more robust approach.
Unfortunately, I don't think there is a truly more robust approach with only
one measurement (the ratio) for each gene. The answer here is to do more
replicates or to take your gene list as a source of "candidates" that you
then validate using another technology (PCR, for example). If you really
need to assign some statistical significance (rather than just ranking genes
for further analysis), then I think you have no choice but to do further
arrays.
Sean
With only two samples, you are stuck with fold changes. However, you
might be able to make your results more robust by filtering out those
genes whose expression values you think are too low. I often use
kOverA() in the genefilter package to do this.
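A minimal sketch of this kind of intensity filter, written in base R so that it runs without genefilter. It assumes the rule "keep a probeset if at least k arrays exceed intensity A", which is what kOverA() is understood to implement; the matrix `expr` and the cut-offs `k` and `A.min` are made up for illustration.

```r
# Simulated expression matrix (original scale): 6 probesets x 2 arrays.
expr <- matrix(c(  30,  45,
                  800, 350,
                  120, 260,
                   20,  90,
                 1500, 700,
                   60,  40),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("ps", 1:6), c("A", "B")))

# Keep a probeset if at least k arrays exceed intensity A.min.
k <- 1
A.min <- 100
keep <- rowSums(expr > A.min) >= k

# Fold changes are then ranked among the retained probesets only.
log2.fc <- log2(expr[keep, "A"] / expr[keep, "B"])
sort(abs(log2.fc), decreasing = TRUE)
```

With genefilter loaded, roughly the same filter would be built with filterfun(kOverA(k, A.min)) and applied with genefilter(). The cut-off itself remains a judgment call that depends on the preprocessing used.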
Best,
Jim
--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
Hi Nicolas,
> And it is
> supported that the FC tends to be greater at low expression levels.
What is supported is that the variance of the _estimate_ of the FC (the
true underlying quantity) by the log-ratio of measured probe intensities
tends to be greater at low expression levels. Indeed this depends on the
preprocessing and background correction. Consider this paper:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12169536
and the accompanying "vsn" package in bioC. It removes the
intensity-dependence of the variance, and you can use the "glog-ratio",
which is an alternative estimator of FC, to select genes. This amounts
to assuming that all genes have the same variance.
Of course the assumption is not really true, there can be gene-specific
causes for different variances (besides overall intensity). But with
only two arrays you have no way of seeing them. Hence, using glog-ratio
to select genes when there are no replicates is an extreme version of
the moderated t-statistic (which is often used when there are few
replicates).
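The effect of a glog-type transformation can be sketched in base R on simulated data. This is not the vsn implementation: the function `glog`, its constant `cst` and the noise model in `noisy` are assumptions made for illustration (vsn estimates its calibration parameters from the data). Here `cst` is set to the ratio of the simulated additive to multiplicative noise, 20 / 0.1 = 200.

```r
# Generalized log in its arsinh form, with a hypothetical constant `cst`.
glog <- function(x, cst = 200) asinh(x / cst)

# Simulated intensities with multiplicative (sd 0.1 on the log scale) and
# additive (sd 20) noise around a true value mu. Numbers are invented.
set.seed(4)
noisy <- function(mu, n = 5000) mu * exp(rnorm(n, sd = 0.1)) + rnorm(n, sd = 20)
low.A  <- noisy(50);   low.B  <- noisy(50)
high.A <- noisy(5000); high.B <- noisy(5000)

# Spread of the ordinary log-ratio when nothing has truly changed.
# A few low-intensity ratios are negative and give NaN, hence na.rm.
sd.log <- c(low  = sd(suppressWarnings(log2(low.A / low.B)), na.rm = TRUE),
            high = sd(log2(high.A / high.B)))

# Spread of the glog-ratio (difference of glog values) for the same data.
sd.glog <- c(low  = sd(glog(low.A) - glog(low.B)),
             high = sd(glog(high.A) - glog(high.B)))

sd.log
sd.glog
```

In this simulation the log-ratio is much noisier at low intensity than at high intensity, while the glog-ratio has roughly the same spread at both, so a single threshold on the glog-ratio treats both intensity ranges alike.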
Best wishes
Wolfgang
--
Best regards
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
Hi Nicolas,
I recently had to analyse the same type of data. We had only 2 arrays
from rare mRNA (each array contained pooled mRNA from 5 animals), and
we wanted to compare them. All we could do was
rank the differences between the genes, and take the maximum fold change. We
found that the expression measure/preprocessing of the probeset values
made a big difference to the number of genes that had a >2 fold
difference. When we applied mas5 to compute the expression values, we
had over 2,700 genes with greater than a 2 fold change. When gcRMA was
used, 260 genes had a 2 fold difference, and with vsn only 11 genes had
a 2 fold difference. I have lots of details on this analysis if it will
help you. We found most of the genes that mas5 called different were in
the low expression range, and could not be trusted.
We validated 8 genes which were >2 fold different with both vsn and gcRMA
using RT-PCR. We had excellent correlation in all cases. vsn does very
slightly "under-estimate" the fold difference. I would definitely trust
any genes that have a >2 fold difference when using vsn. I would not
trust these if they are called using mas5. The glog transformation is
worth applying, particularly in these kinds of analyses. We found the
glog-ratio to be reliable. Of course we have no real idea of the number
of true positives we missed (false negatives).
By using vsn, and so removing the intensity-dependence of the variance,
you can argue that you have removed the denominator of the t-statistic and
thus comparing the "mean" difference is valid. Of course the mean has
an n of 1. Thus it's just the glog-ratio. Albeit a woolly assumption, at
least it gives a better basis to your analysis.
The second thing I might consider is checking for replicate probesets
on the array; if the replicate probesets agree, then you can be more
confident in the result.
Although fold change isn't a good statistical measure, a good variance
estimate can be difficult to obtain. We just completed a comparison of
feature selection methods (Jeffery et al.) in which we showed that at low
numbers of replicates (n<5), rank products or even fold change can
perform as well as or outperform t-statistic and moderated t-statistic
methods, dependent on the variance structure of the data.
Hope this helps,
Regards
Aedin
--------------------
www.hsph.harvard.edu/researchers/aculhane.html
Date: Wed, 25 Jan 2006 16:43:51 +0000
From: Wolfgang Huber <huber@ebi.ac.uk>
Subject: Re: [BioC] No replicates and differential analysis !!
To: Nicolas Servant <nicolas.servant at curie.fr>
Cc: Bioconductor <bioconductor at stat.math.ethz.ch>
Message-ID: <43D7AAC7.9080401 at ebi.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Again, we need to be careful about what is "validated" by PCR.
If the RNA used for PCR were the same samples hybridized to the
arrays, you have validated that the arrays "worked" technically.
(And this is certainly worth knowing.)
But what we usually want to validate is that the genes are
differentially expressed in the population, which can only be
validated by use of an independent sample.
--Naomi
Hi Naomi
I completely agree with you. I think I was glad to find that anything
agreed with an n=1. So we first "validated" (I agree it's a bad word)
that the results from the arrays made sense using RT-PCR. We have since
followed up the findings, and validated them using a different in vivo
experimental approach.
Aedin