split arrays


scholz@Ag.arizona.edu ▴ 130

@scholzagarizonaedu-1369

Last seen 10.2 years ago

Gordon,

Recently you advised someone with a split set of maize arrays that they could do their analysis by reading all the A slides into an RGList and normalizing, then doing the same with the B slides, and then combining the two datasets via rbind() of the two MAList objects.

I have a similar (the same?) set of arrays, and some of the users of these arrays have noted that the A and B slides perform differently, i.e. more background on the B slide, for whatever reason. Though I'm not actually convinced this is true, it makes me wonder whether the two datasets should be combined at all, since there may be a "between array set" source of variation. Am I right to segregate these sets, or is there some overwhelming benefit to combining them? I'm no statistician and would appreciate your take.

Thanks,

Matt

---------------------------------------------
College of Agriculture and Life Sciences Web Mail.
http://ag.arizona.edu


ADD COMMENT • link 19.2 years ago scholz@Ag.arizona.edu ▴ 130


Jamain, Adrien J ▴ 110

@jamain-adrien-j-1300

Last seen 10.2 years ago

Matt,

I am not familiar with the maize arrays, but I am using the following procedure for Affymetrix moe430 split arrays, which have ~160 probesets in common between A and B:

1) background-correct each chip separately at the probe level
2) get a measure of expression at the probeset level
3) plot the common probesets against each other for each pair of chips. If you observe the same thing as I do, you will see that the trend is linear but with intercept != 0 and slope != 1.
4) scale the B chip with those estimated intercept and slope

Steps 1 and 2 are easily done with rma( , normalize=F). Wolfgang Huber and I are currently writing a little package which does steps 3 and 4 automatically.

I'm not sure whether this procedure could make sense or be adapted somehow to your maize arrays (do they have enough probes in common?), but anyway, some food for thought...

Adrien
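For anyone who wants to see steps 3 and 4 concretely, here is a minimal sketch in plain Python. It is not the package mentioned above and not R/Bioconductor code. It assumes the B values of the shared probesets are regressed on the A values by ordinary least squares and that the fitted line is then inverted to put B on the A scale; the post does not say which direction the fit goes, so that choice is an assumption. The function names and all numbers are made up for illustration.

```python
# Sketch of steps 3 and 4: fit b = intercept + slope * a on the shared
# probesets, then map B values onto the A scale by inverting that line.
# Function names and data are hypothetical.

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rescale_b_to_a(b_values, intercept, slope):
    """Invert b = intercept + slope * a to express B values on the A scale."""
    return [(b - intercept) / slope for b in b_values]


# Made-up log2 expression values for four probesets present on both chips.
common_a = [6.0, 8.0, 10.0, 12.0]
common_b = [6.5, 8.1, 9.7, 11.3]  # constructed as 1.7 + 0.8 * a

intercept, slope = fit_line(common_a, common_b)

# Made-up B-only probesets, rescaled with the fitted line.
b_only = [7.3, 10.5]
b_only_on_a_scale = rescale_b_to_a(b_only, intercept, slope)
```

With real chips the shared probesets will scatter around the line, so it is worth plotting them (step 3) before trusting the fitted intercept and slope.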

ADD COMMENT • link 19.2 years ago Jamain, Adrien J ▴ 110


scholz@Ag.arizona.edu ▴ 130

@scholzagarizonaedu-1369

Last seen 10.2 years ago

Adrien,

Thanks for this response. Unfortunately, there are no oligos in common between the two arrays. If anyone else has a response to my original question, I'd like to hear it.

Matt

ADD COMMENT • link 19.2 years ago scholz@Ag.arizona.edu ▴ 130


Hi,

I am not sure what you are really asking, but here goes. References and corresponding R/Bioconductor packages are listed below.

In my opinion, separate normalization and expression estimation is essential for different experiments (and by experiment I mean a collection of identical arrays processed at about the same time by about the same people using about the same protocol; and by identical arrays I mean from the same batch). While one can often do fancy things to align different arrays prior to processing them, it does not seem like a good idea at all. When it works, so would separate normalization, and when it does not work you won't know.

After you have normalized and estimated expression values, you have the gene matching problem. This is not trivial; there are papers around that discuss this (Parmigiani et al). There are some issues regarding whether you want to make inference at the gene level or the sequence level (Unigene is not the same as Entrez Gene). While many have ignored the issues that arise (even on a single chip) where the same gene has been probed via several different methods, that does not seem to be best practice.

If you have no common genes, then life is somewhat easier: you just have a bunch more features, and the suggestion to simply use rbind seems pretty sensible to me, although there are some potential pitfalls and you might want to do some checking to ensure that one set of features is not dominating the other for reasons that are not biological.

If you do have genes in common, then life is harder, the models are more complicated, and IMHO you want to spend a few hours with a local statistician sorting out what questions you want to ask. Essentially, considering what the right model is on a per-gene basis is a pretty good starting point. As I said, there are some papers (Choi et al, Gentleman et al); sometimes they come under the heading of meta-analysis, and other times simply random effects models. For the more statistically inclined I recommend the book by Cox and Solomon, which directly addresses issues regarding combining microarray experiments.

Best wishes,
Robert

References:

G. Parmigiani, E. Garrett-Mayer, R. Anbazhagan, et al. A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clinical Cancer Research, 10:2922-2927, 2004. R package: MergeMaid

J. K. Choi, U. Yu, S. Kim, et al. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics, 19, Suppl. 1:i84-i90, 2003. R package: GeneMeta

D. R. Cox and P. J. Solomon. Components of Variance. Chapman and Hall, New York, 2003.

R. Gentleman, M. Ruschhaupt and W. Huber. On the Synthesis of Microarray Experiments. R package: GeneMetaEx

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
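The "no common genes" case comes down to stacking two feature-by-sample tables whose columns refer to the same samples. Below is a minimal sketch of the bookkeeping worth checking before combining, written in plain Python rather than with limma's rbind on MAList objects. The function name `stack_features`, the feature IDs and the values are all hypothetical.

```python
# Sketch: row-bind two feature-by-sample tables after confirming that they
# cover the same samples and share no feature IDs. All names are hypothetical.

def stack_features(samples_a, rows_a, samples_b, rows_b):
    """Return one table holding the features of both array sets.

    samples_*: list of sample names, one per column.
    rows_*: dict mapping feature ID -> list of values, one per column.
    """
    if set(samples_a) != set(samples_b):
        raise ValueError("A and B must cover the same samples")
    shared = set(rows_a) & set(rows_b)
    if shared:
        raise ValueError(
            f"features present on both arrays need a model, not a stack: {sorted(shared)}"
        )
    # Reorder B's columns so that column j is the same sample in both tables.
    order = [samples_b.index(s) for s in samples_a]
    combined = {feat: list(vals) for feat, vals in rows_a.items()}
    for feat, vals in rows_b.items():
        combined[feat] = [vals[j] for j in order]
    return combined


# Made-up data: three samples, hybridized to A and B in a different order.
samples_a = ["s1", "s2", "s3"]
samples_b = ["s3", "s1", "s2"]
rows_a = {"A_001": [1.0, 2.0, 3.0], "A_002": [4.0, 5.0, 6.0]}
rows_b = {"B_001": [0.3, 0.1, 0.2], "B_002": [9.0, 7.0, 8.0]}

combined = stack_features(samples_a, rows_a, samples_b, rows_b)
```

Refusing to stack when feature IDs overlap mirrors the point above: shared genes call for a per-gene model rather than simple concatenation.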

ADD REPLY • link 19.2 years ago rgentleman ★ 5.5k


scholz@Ag.arizona.edu ▴ 130

@scholzagarizonaedu-1369

Last seen 10.2 years ago

Thanks, Robert.

If I am understanding you correctly, you would advocate both separate normalization AND separate linear modeling in the case where the two arrays come from different batches and have no common probeset, correct? If I was reading Gordon's reply to the other gentleman's email correctly, he was suggesting separate normalization but not separate linear modeling for the datasets. My question, which in retrospect was unclear, was about the advantages and disadvantages of combining versus separating the datasets for linear modeling.

Matt

ADD COMMENT • link 19.2 years ago scholz@Ag.arizona.edu ▴ 130


Hi,

If they have no probes in common, and were applied to the same RNA (essentially technical and not biological replicates), then the two arrays can be combined into essentially one big matrix. I would do some careful study to make sure that there are no major differences between the two (for example, look at the distribution of expression, the variance within gene across samples, etc). My approach is generally to ask what things should be the same, and then to compare them. If there are big differences then you need to figure out how to address them, but if not then you can just treat it as if you measured all the features on the mRNA samples; which type of array was used is irrelevant.

I'm not sure I am following the separate linear models part. Most of what anyone does is gene-at-a-time (you could look at the Category package for an alternative), and so you would fit separate linear models to genes within arrays and the same between arrays.

When you have duplicate probes from what are essentially different experiments, then I believe you need to think about a random effects model.

Best wishes,
Robert
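The "ask what should be the same, then compare" check can be sketched with a couple of summary statistics per array set. This is a rough illustration in plain Python using only the standard `statistics` module. The function names and the data are made up, and which summaries to compare (here, median expression and median per-gene variance) is an assumption rather than a prescribed diagnostic.

```python
import statistics

# Sketch: compare the overall expression level and the typical within-gene
# variance of two array sets. Names and data are hypothetical.


def per_gene_variances(rows):
    """Sample variance across samples for each feature (one row per feature)."""
    return [statistics.variance(vals) for vals in rows]


def summarize(rows):
    """Median expression and median per-gene variance for one array set."""
    all_values = [v for vals in rows for v in vals]
    return {
        "median_expression": statistics.median(all_values),
        "median_gene_variance": statistics.median(per_gene_variances(rows)),
    }


# Made-up normalized log2 values: three features by three samples per set.
set_a = [[8.0, 8.5, 9.0], [6.0, 6.2, 6.4], [10.0, 11.0, 12.0]]
set_b = [[7.0, 9.0, 11.0], [5.0, 6.0, 7.0], [9.0, 9.5, 10.0]]

summary_a = summarize(set_a)
summary_b = summarize(set_b)

# A ratio far from 1 suggests one set is noisier than the other.
variance_ratio = summary_b["median_gene_variance"] / summary_a["median_gene_variance"]
```

In practice one would look at full distributions (boxplots or density plots per slide) rather than two numbers, but a large gap in summaries like these is the kind of non-biological difference to investigate before treating the sets as one matrix.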

ADD REPLY • link 19.2 years ago rgentleman ★ 5.5k
