Combining two datasets - help to use GeneMeta.


Sharon Anbu ▴ 480

@sharon-anbu-1524

Last seen 10.6 years ago

Hi,

I am trying to combine two Affy datasets (both on rae230a chips) from experiments done one year apart. In the first dataset, we have 2 strains, each treated and untreated. In the second dataset, we have just the 2 strains, untreated.

Because the 2 datasets have unequal factor levels, I am not able to use 'getdF' in GeneMeta as it is. Any suggestions for using 'getdF' in this situation, or any alternative way of combining these 2 datasets?

Thanks in advance.

Best regards,
Sharon

rae230a affy GeneMeta • 1.8k views

ADD COMMENT • link updated 18.8 years ago by Darlene Goldstein ▴ 230 • written 18.8 years ago by Sharon Anbu ▴ 480


Sean Davis 21k

@sean-davis-490

Last seen 7 weeks ago

United States

Are these datasets really so different that you can't just combine them? They may be, but have you looked at affyPLM results, density plots, etc., just to be sure? If they aren't that different, perhaps you can just normalize them together and move on? Just asking...

Sean

ADD COMMENT • link 18.8 years ago Sean Davis 21k


Sean Davis wrote:
> Are these datasets really that much different that you can't just combine them? They may be, but have you looked at affyPLM results, density plots, etc., just to be sure? If they aren't that much different, perhaps you can just normalize them together and move on?

Sorry, but that is, IMHO, a bad idea. You should never jointly normalize separate experiments. Normalize separately and use a random effects model for the experiments.

As for how to handle different levels of factors/covariates, the issue then becomes one of what can be estimated from both. Once you identify that, you can set up the appropriate model and then use tools like nlme and lmer (depending on the model) to estimate parameters. But this will require some statistical expertise, and for that you will have to look locally; these things are too hard to do over the internet, IMHO.

There is a BioC technical report on Synthesis of microarray experiments that outlines some of these details more completely.

best wishes
Robert

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 18.8 years ago rgentleman ★ 5.5k
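To make the "normalize separately, then use a random effects model" advice concrete, here is a minimal sketch in plain Python for a single probe set. It assumes that the quantity that can be estimated from both experiments is the strain difference among untreated animals, computes a standardized effect size within each study, and combines the two with a DerSimonian-Laird random-effects estimate. The function names and expression values are made up for illustration; this is not GeneMeta's API, and it is not a substitute for a properly specified nlme/lmer model.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    g = d * (1 - 3 / (4 * (n_a + n_b) - 9))  # small-sample correction
    var_g = (n_a + n_b) / (n_a * n_b) + g ** 2 / (2 * (n_a + n_b))
    return g, var_g


def random_effects_combine(effects, variances):
    """DerSimonian-Laird estimate: returns (combined effect, std. error, tau^2)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    mu = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return mu, math.sqrt(1 / sum(w_star)), tau2


# Made-up log2 expression values for one probe set, untreated animals only,
# with each study normalized on its own beforehand.
study1 = {"strainA": [7.9, 8.1, 8.3], "strainB": [7.1, 7.4, 7.2]}
study2 = {"strainA": [9.0, 9.4, 9.1], "strainB": [8.5, 8.2, 8.6]}

per_study = [hedges_g(s["strainA"], s["strainB"]) for s in (study1, study2)]
effects = [g for g, _ in per_study]
variances = [v for _, v in per_study]
mu, se, tau2 = random_effects_combine(effects, variances)
print(f"combined strain effect = {mu:.2f} (SE {se:.2f}), tau^2 = {tau2:.2f}")
```

With only two studies the between-study variance is estimated very imprecisely, so a result like this is best read as a rough check rather than a final analysis.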


Robert,

Could you elaborate a bit on why you think it is a bad idea to normalize separate experiments together? If you normalize each experiment separately, are you requiring the same conditions in each?

Thanks

Sincerely,
Gordon

Senior Research Scientist
Developmental Psychobiology
NYS Psychiatric Institute
Columbia College of Physicians and Surgeons
1051 Riverside Drive
New York, New York 10032
212-543-5694 (voice)
212-543-5497 (fax)

ADD REPLY • link 18.8 years ago Gordon Barr ▴ 20


Hi,

A bit, but you probably want to read the paper I referenced, as it has more complete details. I also ought to emphasize at the outset that this argument is the wrong way around. If you want to do something (such as joint normalization), then it is incumbent on you to state why, and under what assumptions, it is sensible. I can easily state the assumptions under which separate normalization followed by a random effects model is appropriate, and they are, AFAICS, a superset of those where joint normalization would work.

Gordon Barr wrote:
> Could you elaborate a bit on why you think it a bad idea to normalize separate experiments together. If you normalize each experiment separately are you requiring the same conditions in each?

No, essentially the opposite. Normalization together presumes that the conditions were essentially the same, and separate normalization allows them to be different. When they are the same, separate normalization will almost surely be a bit less efficient (in a statistical sense), and when they are really different, joint normalization can be very problematic. Essentially, the problem is that normalization presumes things (few genes are differentially expressed, the rank order of the expression values is approximately correct, etc.) that tend to hold for single experiments but can be quite incorrect across different experiments.

Another way of thinking of normalization is that you essentially want to fit a model to Y (the observed spot intensities) and correct for all experimental covariates, X (but none of the biological ones you intend to test for),

    Y = Xb + e

and then you throw away the Xb and proceed to analyze the e's. Most of the methods around try to do this without requiring explicit statements of X, but most would undoubtedly be improved if some parts of X could be specified (reagent batch, slide batch, technician, day of week, sample handling, etc.).

Back to the main story: since the X's are very different in two different experiments, there are some real problems that arise from assuming that they are the same. On the other hand, keeping them separate and then using a random effects model seems to be appropriate in all cases and better reflects our belief about the data (at least I have only encountered situations where experiments should be treated as random effects).

This stuff works and is appropriate - one only hopes that sooner or later folks will start to realize that just because you can do something does not mean you should. Statistical manipulations of data are merely mathematical transformations; they can always be carried out. The art is in determining when it is sensible to do so, and for my money (and that of the people whose data I analyze) joint normalization makes no sense.

best wishes
Robert

ADD REPLY • link 18.8 years ago rgentleman ★ 5.5k
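The "fit Y = Xb + e, discard Xb, analyze e" view of normalization can be sketched for the simplest case: one experiment, one probe set, and a single known technical covariate. When X consists only of batch indicator columns, the least-squares fit Xb is just the per-batch mean, so the residuals are the values centered within batch. The batch labels and numbers below are hypothetical, and real normalization methods estimate this adjustment without an explicit X, as the post above notes.

```python
from collections import defaultdict
from statistics import mean


def residuals_after_batch(values, batches):
    """Residuals e from the least-squares fit of Y = Xb + e,
    where X holds one indicator column per batch (no biological terms)."""
    grouped = defaultdict(list)
    for y, b in zip(values, batches):
        grouped[b].append(y)
    fitted = {b: mean(ys) for b, ys in grouped.items()}  # Xb for each batch
    return [y - fitted[b] for y, b in zip(values, batches)]


# Hypothetical log2 intensities for one probe set within a single experiment,
# hybridized in two reagent batches.
values = [8.0, 8.4, 9.1, 9.5]
batches = ["lot1", "lot1", "lot2", "lot2"]
print(residuals_after_batch(values, batches))
```

X should contain only technical covariates. If one of its columns happens to track a biological factor of interest, the same subtraction removes that signal along with the technical effect.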


An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20060612/9e422f5d/attachment.pl

ADD REPLY • link 18.8 years ago Sharon Anbu ▴ 480


Darlene Goldstein ▴ 230

@darlene-goldstein-1004

Last seen 10.6 years ago

Robert Gentleman writes:
> Sorry, but that is, IMHO, a bad idea. You should never jointly normalize separate experiments. Normalize separately and use a random effects model for the experiments.

Hi,

A belated follow-up on Robert's advice: it seems to me that the hope with joint normalization is to remove the different 'study batch' effects. I have posted previously on the apparent futility of this: http://article.gmane.org/gmane.science.biology.informatics.conductor/2578/

I have also posted a preprint of the study on which this advice is based: http://ludwig-sun2.unil.ch/~darlene/ms/MetaChapPreprint.pdf

The bottom line is that these kinds of study differences always occur, and that you don't remove them with joint normalization. You need to normalize within study and then combine (and there are several suggestions out there for how to do the combining).

Best regards,
Darlene

--
Darlene Goldstein
École Polytechnique Fédérale de Lausanne (EPFL)
Institut de mathématiques
Bâtiment MA, Station 8
CH-1015 Lausanne
SWITZERLAND
Tel: +41 21 693 2552
Fax: +41 21 693 4303

ADD COMMENT • link 18.8 years ago Darlene Goldstein ▴ 230
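A minimal sketch of the "normalize within study" step, using quantile normalization as a stand-in for whatever method is actually applied (the thread does not prescribe one). Each study's arrays are normalized only against each other, so no assumption is made that the two studies share an intensity distribution; combining then happens afterwards on per-study summaries. All names and values are illustrative, and ties are handled naively.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays from ONE study.
    Each value is replaced by the mean, across arrays, of the values
    sharing its rank. Ties are broken by position (a simplification)."""
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    rank_means = [sum(col[r] for col in sorted_cols) / len(arrays)
                  for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by increasing value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy intensities: 2 arrays x 3 probe sets per study (values are made up).
study1_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
study2_arrays = [[9.0, 7.0, 8.0], [10.0, 6.0, 8.5]]

# Normalize each study separately; the arrays are never pooled.
study1_norm = quantile_normalize(study1_arrays)
study2_norm = quantile_normalize(study2_arrays)
print(study1_norm)
print(study2_norm)
```

After this step the two studies still sit on their own scales, which is intended: the between-study difference is left for the combining model to absorb rather than being forced away by the normalization.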
