This may be of interest to those preprocessing data from Affymetrix
chips. We compared RMA and VSN as preprocessing steps ahead of an
lme-ANOVA. If you are wondering whether to use RMA or VSN, or what the
potential pitfalls and benefits of either normalization or background
correction approach are, please read below and draw your own conclusions.
Thank you to the colleagues in the Bioconductor group for the
enlightening follow-up discussion.
Roger
Roger L. Vallejo, Ph.D.
Assist. Professor of Genomics & Bioinformatics
Genomics & Bioinformatics Laboratory
Department of Dairy & Animal Science
The Pennsylvania State University
305 Henning Building
University Park, PA 16802
Phone: (814) 865-1846
Email: rvallejo@psu.edu
-----Original Message-----
From: Rafael Irizarry [mailto:ririzarr@jhsph.edu]
Sent: Monday, June 21, 2004 2:12 PM
To: Roger Vallejo
Subject: Re: [BioC] RMA vs VSN
you should consider posting something on the bioc list. i think this
may help many others.
On Jun 21, 2004, at 11:35 AM, Roger Vallejo wrote:
> Dear Rafael,
> I am glad that I asked this question about RMA and VSN. Your comments
> below are correct. I quickly checked the output of the lme-ANOVA using
> data preprocessed separately with RMA and with VSN. Indeed, several
> interesting genes detected as significant with RMA are missed with VSN.
> Also, the p-values for those significant genes are generally smaller
> with RMA than with VSN. God knows what else I might be missing by using
> VSN, because I am only checking the genes that we know are related to
> immune and inflammatory events. So the rate of false undiscoveries
> increases with VSN, in exchange for a slightly lower FDR. I would rather
> maximize gene discovery at a slightly higher but acceptable FDR.
> Thanks for the excellent point!
> Roger
>
> -----Original Message-----
> From: Rafael A. Irizarry [mailto:ririzarr@jhsph.edu]
> Sent: Saturday, June 19, 2004 3:58 PM
> To: Roger Vallejo
> Cc: rafa@jhu.edu
> Subject: RE: [BioC] RMA vs VSN
>
> i believe the difference does not come from vsn but from
> bg.correct=FALSE. try
>
> eset <- expresso(Data, bg.correct=FALSE,
>     normalize.method="quantiles", pmcorrect.method="pmonly",
>     summary.method="medianpolish")
>
> i suspect you will get similar results.
>
> when you do not bg correct, the variance level for low expressed genes
> is much smaller. but the estimates of fold change also get attenuated.
> false discoveries are lower but false "undiscoveries" increase.
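
The attenuation mentioned above can be illustrated with a minimal base-R
sketch. This is not the RMA background model. It assumes, purely for
illustration, that each observed intensity is the true signal plus a
constant background of 100 units, and that a gene doubles between two
conditions. All numbers and variable names are made up.

```r
# Toy illustration (hypothetical numbers): an additive background that is
# left uncorrected shrinks log2 fold changes, most strongly at low intensity.
background  <- 100                        # assumed constant background level
signal_ctrl <- c(low = 50, high = 5000)   # true signal in the control condition
signal_trt  <- 2 * signal_ctrl            # true two-fold increase under treatment

# Observed intensities = true signal + background
obs_ctrl <- signal_ctrl + background
obs_trt  <- signal_trt + background

# log2 fold change without background correction
lfc_raw <- log2(obs_trt / obs_ctrl)

# log2 fold change after subtracting the (here, known) background
lfc_corrected <- log2((obs_trt - background) / (obs_ctrl - background))

print(round(lfc_raw, 3))
print(round(lfc_corrected, 3))
```

In this toy setting the uncorrected estimate for the low-intensity gene
falls well below the true log2 fold change of 1, while the
high-intensity gene is barely affected. The sketch does not model the
variance side of the trade-off.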
>
> On Sat, 19 Jun 2004, Roger Vallejo wrote:
>
>> Dear Rafael,
>> Thank you very much for your comments.
>> Our results are somewhat different for VSN vs. RMA. If they were
>> similar, I would likely have kept using RMA because it is part of our
>> standard array data preprocessing functions. The p-values are smaller,
>> and thereby the PER and FDR are slightly more acceptable (although not
>> by much), when using p-values from VSN normalization and lme-ANOVA. I
>> would like to make sure that, if I decide to use VSN in the way that I
>> indicated (please see the functions below), I am not over-normalizing
>> my data, as you indicated, and, most important, that I am using a data
>> normalization function that is as good as RMA. Thanks for your comments.
>> Roger
>>
>> -----Original Message-----
>> From: Rafael A. Irizarry [mailto:ririzarr@jhsph.edu]
>> Sent: Saturday, June 19, 2004 2:05 PM
>> To: Roger Vallejo
>> Cc: bioconductor@stat.math.ethz.ch
>> Subject: Re: [BioC] RMA vs VSN
>>
>> vsn and rma are not competitors. the first is a normalization
>> technique, the second is a way to obtain expression measures from affy
>> arrays which includes background adjustment, normalization, and
>> summarization. rma uses quantile normalization as a default.
>> changing this to vsn yields, in general, very similar results.
>>
>> notice, some use rma to obtain an expression measure and then
>> use vsn to normalize that, although i worry this could result in
>> over-normalization.
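
For readers unfamiliar with the quantile normalization step mentioned
above, here is a simplified base-R sketch of the idea. It is not the
code used by the affy package. The function name quantile_normalize and
the small intensity matrix are hypothetical, and ties are broken by
order of appearance, which may differ from production implementations.

```r
# Simplified quantile normalization: force every column (array) to share
# the same empirical distribution of values.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                        # sort each array
  reference <- rowMeans(sorted)                      # mean of the k-th smallest values
  # Replace each value by the reference value at its within-array rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Made-up intensities: 4 probes (rows) on 3 arrays (columns)
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
xn <- quantile_normalize(x)
print(xn)
```

After this step every array contains the same set of values, and only
the assignment of values to probes differs between arrays.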
>>
>> On Sat, 19 Jun 2004, Roger Vallejo wrote:
>>
>>> We have a small experiment with high FDR (around 0.40): 8 Affymetrix
>>> mouse GeneChips with 22k genes, 2 replications, saline- and E. coli-
>>> treated mammary tissue, evaluated at 24 hr and 48 hr post-injection.
>>>
>>> I ran both data preprocessing functions via expresso and
>>> subsequently ran an lme-ANOVA. As expected, I got a lower FDR and much
>>> smaller p-values when using VSN. The FDR was estimated using the
>>> QVALUE package. Obviously, I feel tempted to use VSN instead of RMA.
>>> However, before proceeding I would like to hear some comments from the
>>> Bioconductor group on this approach. The question is:
>>>
>>> Is VSN better than RMA?
>>>
>>> I have read the literature, and both claim to be the function to use!
>>>
>>> Personally, I lean towards using VSN. I might be wrong, so I
>>> would appreciate any suggestions or comments on this.
>>>
>>> These are the functions that I used:
>>>
>>> *************************************************************
>>>
>>> For RMA:
>>>
>>>> library(affy)
>>>> Data <- ReadAffy(widget=TRUE)
>>>> # RMA background correction, quantile normalization, PM only,
>>>> # median polish summarization
>>>> eset <- expresso(Data, bgcorrect.method="rma",
>>>     normalize.method="quantiles", pmcorrect.method="pmonly",
>>>     summary.method="medianpolish")
>>>
>>> *************************************************************
>>>
>>> For VSN:
>>>
>>>> library(affy)
>>>> Data <- ReadAffy(widget=TRUE)
>>>> library(vsn)
>>>> # register vsn as a normalization method for expresso
>>>> normalize.AffyBatch.methods <- c(normalize.AffyBatch.methods, "vsn")
>>>> # note: no background correction in this run
>>>> eset <- expresso(Data, bg.correct=FALSE, normalize.method="vsn",
>>>     pmcorrect.method="pmonly", summary.method="medianpolish")
>>>
>>> *************************************************************
>>>
>>> Thank you very much.
>>>
>>> Roger
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor@stat.math.ethz.ch
>>> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor