Adding chips to an existing set of normalised data


Crispin Miller ★ 1.1k

@crispin-miller-264

Last seen 10.3 years ago

Hi!

Over the last few days we've been learning a lot about alternative ways of dealing with low-intensity probesets, and have seen some pretty strong arguments in favour of using them. Firstly, thanks - the discussion has been really helpful and much appreciated!

This has now sparked a different question for us. We have an ever-increasing database of Affymetrix chips, currently processed and normalised using MAS5.0. As we add arrays to the set, we can compare between them, since the normalisation simply sets them to have the same average intensity.

So the question is: if I normalise my data with, say, RMA, I get a set of normalised arrays based on statistics generated over the set of chips I normalise, i.e. each array is normalised in the context of its peers, unlike MAS5.0 (as I understand it). This is, I think, due to the a(j) parameter in the RMA model, or phi(j) for dChip, which represent the probe affinity effects and can be estimated if we have 'enough arrays' (from the Irizarry et al. 2003 NA Res paper).

Now, when we add experiments to the database, are the normalised expression levels calculated for one experimental chip-set comparable to those computed for another? If not, do I need to apply RMA over the entire database each time I add a new experiment to it? And is this possible in a reasonable amount of time and memory? If not, do people have alternative suggestions? We are particularly interested in clustering and the generation of expression profiles.

Crispin
http://bioinf.picr.man.ac.uk/mbcf/microarray_ma.shtml
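The per-array scaling described in the question can be sketched as follows. This is a minimal illustration, not the MAS5.0 implementation: the function name `scale_to_target`, the target value of 500 and the 2% trimming fraction are assumptions chosen for the example. The point it shows is that each array is rescaled using only its own intensities, so adding a chip never changes the values already in the database.

```python
from statistics import mean

def scale_to_target(intensities, target=500.0, trim=0.02):
    """Rescale one array so that its trimmed mean equals `target`.

    `target` and `trim` are illustrative assumptions, not values taken
    from the thread.
    """
    ordered = sorted(intensities)
    k = int(len(ordered) * trim)              # values dropped from each tail
    core = ordered[k:len(ordered) - k] if k else ordered
    factor = target / mean(core)              # depends on this array only
    return [x * factor for x in intensities]

# Two hypothetical chips: the same sample, the second scanned half as bright.
chip_a = [120.0, 80.0, 400.0, 1000.0]
chip_b = [60.0, 40.0, 200.0, 500.0]

scaled_a = scale_to_target(chip_a)
scaled_b = scale_to_target(chip_b)
```

After scaling, the two chips carry the same values, and neither call needed to see the other chip.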


updated 21.6 years ago by Park, Richard ▴ 220 • written 21.6 years ago by Crispin Miller ★ 1.1k


Rafael A. Irizarry ★ 2.3k

@rafael-a-irizarry-205

Last seen 10.3 years ago

If your data is decent, what you describe won't be that big an issue, but here are various strategies to solve the problem you describe:

0- Keep your CEL files and redo everything every time. (con: not efficient at all)
1- Do RMA at the probe level. Then, before any expression-level analysis, normalize the merged exprSets. (con: you may over-normalize)
2- Decide on a "typical probe-level distribution" and always map to that. (con: requires choice of a distribution and some extra coding)
3- Use a non-multi-array RMA (RA?): background correct, use a non-multichip normalization such as rescaling (can vsn be made mono-chip?), and use a robust summary, e.g. median, tukey.biweight, etc. (con: under my definition of a good expression measure, it won't be as good as RMA, but it'll be better than MAS 5.0). To see how well this does, you can put it through affycomp.biostat.jhsph.edu.

I would rank these strategies: 2, 1, 3, 0. To pick a typical probe-level distribution in strategy 2, I would use as many arrays as possible. I would not use a parametric distribution, such as the normal, just for computational convenience.

On Wed, 4 Jun 2003, Crispin Miller wrote:
> [...] do I need to apply RMA over the entire database each time I add a new experiment to it? And is this possible in a reasonable amount of time and memory? [...]
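One way to read strategy 2 is as quantile normalisation against a frozen reference distribution. The sketch below assumes that reading; the helper names `build_reference` and `map_to_reference` are hypothetical, and the toy data stands in for probe-level intensities. The reference is built once from as many arrays as are available (an empirical distribution, not a parametric one), and every later chip is mapped onto it by rank.

```python
def build_reference(arrays):
    """Average the sorted intensities of the training arrays, rank by rank."""
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "arrays must share a probe layout"
    sorted_arrays = [sorted(a) for a in arrays]
    return [sum(col) / len(col) for col in zip(*sorted_arrays)]

def map_to_reference(array, reference):
    """Give each probe the reference value of its rank (ties broken by position)."""
    order = sorted(range(len(array)), key=lambda i: array[i])
    out = [0.0] * len(array)
    for rank, idx in enumerate(order):
        out[idx] = reference[rank]
    return out

# Hypothetical training arrays used once to fix the reference distribution.
training = [
    [5.0, 2.0, 3.0, 9.0],
    [4.0, 1.0, 6.0, 8.0],
    [3.0, 2.0, 7.0, 10.0],
]
reference = build_reference(training)   # stored alongside the database

# A chip that arrives later is mapped without touching the existing arrays.
new_chip = [30.0, 10.0, 20.0, 90.0]
normalised_new = map_to_reference(new_chip, reference)
```

Because the reference never changes, previously normalised arrays keep their values when new chips are added, which is the property the question asks for.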

written 21.6 years ago by Rafael A. Irizarry ★ 2.3k


Hi all,

Rafa wrote:
> 3- use a non-multi array rma (ra?). you bg correct, use a non
> multichip normalization such as rescaling (can vsn be made mono-chip?)

vsn is a multichip method; it cannot be used on a single chip. With some modification to the code, it could be used to normalize one or several additional chips against an existing batch of chips.

Best regards
Wolfgang
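Strategy 3 also calls for a robust per-array summary such as the median or a Tukey biweight. As a rough sketch of what such a summary does for one probeset on one array, the function below computes a one-step Tukey biweight average. The tuning constants `c=5` and `eps=1e-4` are assumptions for the example, the probe values are invented, and this is not the code of the `tukey.biweight` function mentioned in the thread.

```python
from statistics import mean, median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average of the probes in one probeset.

    Probes far from the median (relative to the median absolute
    deviation) receive weight zero, so single outliers are ignored.
    """
    m = median(values)
    s = median([abs(x - m) for x in values])      # median absolute deviation
    weights = []
    for x in values:
        u = (x - m) / (c * s + eps)               # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical log2 probe intensities with one outlying probe.
probes = [7.1, 7.3, 6.9, 7.0, 12.5]
robust = tukey_biweight(probes)
plain = mean(probes)
```

The robust summary stays within the range of the four well-behaved probes, while the plain mean is pulled upwards by the outlier. Nothing in the calculation refers to other arrays, which is what makes the approach single-chip.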

written 21.6 years ago by Wolfgang Huber ★ 13k


Park, Richard ▴ 220

@park-richard-227

Last seen 10.3 years ago

Hi Rafael,

I was just wondering if you could give me your opinion on my method of normalization. I was always under the impression that it is best to renormalize the entire data set whenever you add or remove a chip. This would correspond to your method 0. I do understand that this is the most time-consuming method, but I have created a Visual Basic interface that keeps track of all the .cel files we have for our lab.

So, whenever you want a different group of files to analyze, it is a matter of clicking on the data sets you wish to include, and from there we normalize everything together from the .cel files using RMA. It usually takes a matter of minutes to have everything renormalized together, and we currently have a collection of about 250 Affy chips that can be combined in any combination.

I thought this was the most precise way of creating normalized data sets, but are the other methods you talked about better and more accurate?

Thanks,
Richard Park
Computational Data Analyzer
Joslin Diabetes Center
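The reason for renormalising everything together can be seen in a toy version of joint quantile normalisation, the kind of multi-array step RMA normally uses. This is a sketch on invented numbers, with a hypothetical helper `quantile_normalise`; it only illustrates that the normalised values of an existing array shift once another chip joins the set.

```python
def quantile_normalise(arrays):
    """Quantile-normalise arrays jointly.

    The target distribution is the rank-by-rank mean of the sorted arrays,
    so it depends on every array in the set.
    """
    sorted_arrays = [sorted(a) for a in arrays]
    target = [sum(col) / len(col) for col in zip(*sorted_arrays)]
    result = []
    for a in arrays:
        order = sorted(range(len(a)), key=lambda i: a[i])
        out = [0.0] * len(a)
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        result.append(out)
    return result

# Hypothetical existing batch of two chips.
batch = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
before = quantile_normalise(batch)

# The same batch renormalised together with one new chip.
after = quantile_normalise(batch + [[9.0, 8.0, 1.0]])

# The first chip's normalised values differ between the two runs.
changed = before[0] != after[0]
```

Values from the two runs are therefore not directly comparable, which is why method 0 reprocesses the whole selection from the .cel files each time.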

written 21.6 years ago by Park, Richard ▴ 220


I think approach 0 is theoretically the best. The only reason I ranked it as worst is because of how time-consuming it is. Of course, if someone has the time and expertise to code a Visual Basic interface that handles 250 chips in "a matter of minutes", then I would re-rank this approach as my favorite.

On Wed, 4 Jun 2003, Park, Richard wrote:
> [...] I was always under the impression that it is best to always renormalize the entire data set whenever you add or remove an additional chip. This would correspond to your 0 method. [...]

written 21.6 years ago by Rafael A. Irizarry ★ 2.3k
