Hi,
I am trying to combine two Affy datasets (on rae230a chips) from
experiments done one year apart. In the first dataset, we have 2
strains, each treated and untreated. In the second dataset, we have
just 2 strains, untreated.
Because the factor levels are unequal across the 2 datasets, I am not
able to use 'getdF' in GeneMeta as it is. Any suggestions for using
'getdF' in this situation? Or is there an alternative way of combining
these 2 datasets?
Thanks in advance.
Best regards,
Sharon
Are these datasets really so different that you can't just
combine them? They may be, but have you looked at affyPLM results,
density plots, etc., just to be sure? If they aren't that
different, perhaps you can just normalize them together and move on?
Just asking....
Sean
Sorry, but that is, IMHO, a bad idea. You should never jointly
normalize separate experiments. Normalize separately and use a random
effects model for the experiments. As for how to handle different
levels of factors/covariates, the issue then becomes one of what can be
estimated from both. Once you identify that, you can set up the
appropriate model and then use tools like nlme and lmer (depending on
the model) to estimate parameters. But this will require some
statistical expertise, and for that you will have to look locally;
these things are too hard to do over the internet, IMHO.
There is a BioC technical report on Synthesis of microarray
experiments that outlines some of these details more completely.
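
To make "what can be estimated from both" concrete for the design in this thread: the treated samples exist only in the first dataset, so the contrast both datasets can estimate is the strain difference among untreated samples. The following is a minimal sketch for a single gene, assuming separately normalized log-expression values. The strain labels "A" and "B", the helper names, and all numbers are hypothetical, and the standardized difference with Hedges' small-sample correction is one common choice of per-study effect, not something prescribed above.

```python
from math import sqrt
from statistics import mean, variance


def make_samples(strain, treatment, values):
    """Build sample records for one gene (hypothetical data layout)."""
    return [{"strain": strain, "treatment": treatment, "expr": v} for v in values]


def strain_effect(samples):
    """Standardized strain difference (B minus A) among untreated samples only.

    Returns (g, var_g): Hedges' g and a common approximation of its variance.
    """
    a = [s["expr"] for s in samples
         if s["treatment"] == "untreated" and s["strain"] == "A"]
    b = [s["expr"] for s in samples
         if s["treatment"] == "untreated" and s["strain"] == "B"]
    n_a, n_b = len(a), len(b)
    pooled_sd = sqrt(((n_a - 1) * variance(a) + (n_b - 1) * variance(b))
                     / (n_a + n_b - 2))
    d = (mean(b) - mean(a)) / pooled_sd
    j = 1 - 3 / (4 * (n_a + n_b) - 9)  # small-sample bias correction
    var_d = (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))
    return j * d, j ** 2 * var_d


# Study 1: two strains, treated and untreated (made-up values).
study_1 = (make_samples("A", "untreated", [7.1, 7.3, 6.9])
           + make_samples("B", "untreated", [8.0, 8.4, 8.2])
           + make_samples("A", "treated", [9.0, 9.2, 9.1])
           + make_samples("B", "treated", [9.5, 9.9, 9.7]))
# Study 2: the same two strains, untreated only (made-up values).
study_2 = (make_samples("A", "untreated", [6.5, 6.8, 6.6])
           + make_samples("B", "untreated", [7.4, 7.9, 7.6]))

g1, var1 = strain_effect(study_1)
g2, var2 = strain_effect(study_2)
print("study 1:", g1, var1)
print("study 2:", g2, var2)
```

The treatment effect and the strain-by-treatment interaction are estimable only within the first dataset, so they do not enter the combination.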
best wishes
Robert
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
Robert
Could you elaborate a bit on why you think it a bad idea to normalize
separate experiments together? If you normalize each experiment
separately, are you requiring the same conditions in each?
Thanks
Sincerely,
Gordon
Senior Research Scientist
Developmental Psychobiology
NYS Psychiatric Institute
Columbia College of Physicians and Surgeons
1051 Riverside Drive
New York, New York 10032
212-543-5694 (voice)
212-543-5497 (fax)
On Jun 11, 2006, at 2:23 PM, Robert Gentleman wrote:
> You should never jointly normalize separate experiments. Normalize
> separately and use a random effects model for the experiments.
Hi,
A bit, but you probably want to read the paper I referenced, as it has
more complete details. I also ought to emphasize at the outset that
this argument is the wrong way around. If you want to do something
(such as joint normalization), then it is incumbent on you to state why
and under what assumptions it is sensible. I can easily state the ones
under which separate normalization followed by a random effects model
is appropriate, and they are, AFAICS, a superset of those where joint
normalization would work.
Gordon Barr wrote:
> Robert
>
> Could you elaborate a bit on why you think it a bad idea to normalize
> separate experiments together. If you normalize each experiment
> separately are you requiring the same conditions in each?
No, essentially the opposite. Normalization together presumes that the
conditions were essentially the same, and separate normalization allows
them to be different. When they are the same, separate
normalization will almost surely be a bit less efficient (in a
statistical sense), and when they are really different, joint
normalization can be very problematic.
Essentially, the problem is that normalization presumes things (for
example, that few genes are differentially expressed, or that the rank
order of the expression values is approximately correct) that tend to
hold for single experiments but can be quite incorrect across different
experiments.
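
A toy illustration of what joint normalization presumes, using quantile normalization as a stand-in (an assumption made here; the thread does not name a specific method). Quantile normalization forces every array it is given to share a single intensity distribution, so applying it across studies asserts that the studies have the same distribution. The function name and all values are hypothetical, and ties are ignored for brevity.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    Each array is a list of intensities, one per gene, in the same gene order.
    """
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average of the i-th smallest values across arrays.
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n_genes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]  # keep each gene's rank, swap its value
        normalized.append(out)
    return normalized


# Two arrays per study; study 2 has a visibly different intensity distribution.
study_1 = [[2.0, 4.0, 6.0, 8.0, 10.0], [2.2, 4.1, 6.3, 7.9, 10.4]]
study_2 = [[5.0, 5.5, 6.0, 12.0, 14.0], [5.2, 5.4, 6.1, 12.5, 13.8]]

joint = quantile_normalize(study_1 + study_2)
separate = quantile_normalize(study_1) + quantile_normalize(study_2)

# Joint: all four arrays now share one distribution, whatever the studies were.
print(sorted(joint[0]) == sorted(joint[2]))
# Separate: arrays agree within a study, but the studies remain distinct.
print(sorted(separate[0]) == sorted(separate[2]))
```

After separate normalization the between-study difference is still in the data, which is what the random effects model is then asked to account for.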
Another way of thinking of normalization is that you essentially want
to fit a model to Y (the observed spot intensities) and correct for all
experimental covariates, X (but none of the biological ones you intend
to test for),

    Y = X b + e

and then you throw away the X b and proceed to analyze the e's.
Most of the methods around try to do this without requiring explicit
statements of X, but most would undoubtedly be improved if some parts of
X could be specified (reagent batch, slide batch, technician, day of
week, sample handling, etc.).
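
As a minimal sketch of the Y = X b + e view, suppose the only experimental covariate specified is a batch label (say, reagent batch) within one experiment. With X taken to be the batch indicator matrix, the least-squares estimate of b is simply the per-batch mean, and the residuals e are what goes on to the biological analysis. The function name, batch labels, and values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_effect(y, batch):
    """Fit y = X b + e, with X the batch indicator matrix; return (b, e).

    For indicator columns, least squares gives b[label] = mean of that batch.
    """
    groups = defaultdict(list)
    for value, label in zip(y, batch):
        groups[label].append(value)
    b = {label: mean(values) for label, values in groups.items()}
    e = [value - b[label] for value, label in zip(y, batch)]
    return b, e


# One gene measured on six arrays from two reagent batches (made-up values).
y = [7.0, 7.4, 7.2, 8.1, 8.5, 8.3]
batch = ["r1", "r1", "r1", "r2", "r2", "r2"]

b, e = remove_batch_effect(y, batch)
print(b)  # estimated batch levels (the part that is thrown away)
print(e)  # residuals (the part that is analyzed)
```

This only makes sense when the biological factor of interest is not confounded with batch; otherwise the fit removes the biology along with the batch effect.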
Back to the main story: since the X's are very different in two
different experiments, there are some real problems that arise from
assuming that they are the same.
On the other hand, keeping them separate and then using a random
effects model seems to be appropriate in all cases and better reflects
our belief about the data (at least I have only encountered situations
where experiments should be treated as random effects). This stuff works
and is appropriate - one only hopes that sooner or later folks will
start to realize that just because you can do something does not mean
you should. Statistical manipulations of data are merely mathematical
transformations; they can always be carried out. The art is in
determining when it is sensible to do so, and for my money (and that of
the people whose data I analyze) joint normalization makes no sense.
best wishes
Robert
Robert Gentleman <rgentlem at ...> writes:
> You should never jointly normalize separate experiments. Normalize
> separately and use a random effects model for the experiments.
Hi, a belated followup on Robert's advice. It seems to me that the hope
with joint normalization is to remove the different 'study batch'
effects. I have posted previously on the apparent futility of this:
http://article.gmane.org/gmane.science.biology.informatics.conductor/2578/
I have also posted a preprint of the study on which this advice is
based:
http://ludwig-sun2.unil.ch/~darlene/ms/MetaChapPreprint.pdf
The bottom line is that these kinds of study differences always occur,
and that you don't remove them with joint normalization. You need to
normalize within study and then combine (and there are several
suggestions out there for how to do the combining).
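
One of the combining options can be sketched as follows, assuming each study has already been normalized on its own and reduced to a per-gene effect estimate with a variance. This uses the DerSimonian-Laird random-effects estimator, chosen here for illustration; it is one of several possibilities, not the specific method recommended in this thread. The function name and the input numbers are hypothetical.

```python
from math import sqrt


def random_effects_combine(effects, variances):
    """DerSimonian-Laird random-effects combination of per-study effects.

    Returns (mu, se, tau2): combined effect, its standard error, and the
    estimated between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight each study by its within-study plus between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1.0 / sum(w_star))
    return mu, se, tau2


# Per-study strain effects for one gene and their variances (made-up values).
effects = [1.8, 2.4]
variances = [0.30, 0.45]

mu, se, tau2 = random_effects_combine(effects, variances)
print(mu, se, tau2)
```

With only two studies the between-study variance is estimated very imprecisely, so the result should be read with that in mind.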
Best regards,
Darlene
--
Darlene Goldstein
École Polytechnique Fédérale de Lausanne (EPFL)
Institut de mathématiques
Bâtiment MA, Station 8 Tel: +41 21 693 2552
CH-1015 Lausanne Fax: +41 21 693 4303
SWITZERLAND