reproducing dChip expression measure
@adaikalavan-ramasamy-675
I am trying to reproduce the dChip expression measure from the dChip software with BioConductor packages. I am aware that dChip is not open source, but I would like to get as close as I can. I therefore compared the dChip expression measure from both programs on a small dataset of 12 arrays with approximately 16000 probesets.

Going through the mailing list archive, I found that I can feed the following combinations of parameter values to expresso:

    model   pmcorrect.method   bgcorrect.method
    1       "pmonly"           "none"
    2       "subtractmm"       "none"
    3       "pmonly"           "mas"
    4       "subtractmm"       "mas"

with the following generic incantation to expresso:

    expresso( ReadAffy(), normalize.method="invariantset",
              bgcorrect.method=???, pmcorrect.method=???,
              summary.method="liwong" )

The correlations between the two sets of values are high and similar (around 0.90) across the four models. I have attached both the scatterplot and the hexbin plot of the expression measures from the two programs under the different models, with the line of identity in red. They suggest that:

a) the majority of the values are concentrated in the lower regions
b) the values at the higher end appear to be highly correlated, but they are not perfectly identical
c) the MM-subtracted data give more disagreement at the lower range but lie much closer to the line of identity at the higher range
d) mas5 background correction does not appear to make much difference

Can other members of the list comment on

a) whether they have seen similar findings
b) whether these results are expected and sensible
c) what else I can try to increase the reproducibility

Eventually I plan to apply BioConductor's version of the dChip expression measure to a few other datasets, so it would be useful to use the most reproducible version from BioConductor.

Thank you very much.

Regards, Adai
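The four parameter combinations above could be run in one loop rather than by editing the call by hand. The following is only a sketch: `models` and `run_liwong_models` are hypothetical names introduced here, and the function assumes the Bioconductor package affy is installed and that `abatch` is an AffyBatch such as the one returned by ReadAffy().

```r
# The four pmcorrect/bgcorrect combinations listed in the post.
models <- data.frame(
  model            = 1:4,
  pmcorrect.method = c("pmonly", "subtractmm", "pmonly", "subtractmm"),
  bgcorrect.method = c("none", "none", "mas", "mas"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of affy): call expresso() once per row
# of `models`, keeping normalization and summarization fixed as in the post.
run_liwong_models <- function(abatch, models) {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("the Bioconductor package 'affy' is required")
  }
  fits <- lapply(seq_len(nrow(models)), function(i) {
    affy::expresso(
      abatch,
      bgcorrect.method = models$bgcorrect.method[i],
      normalize.method = "invariantset",
      pmcorrect.method = models$pmcorrect.method[i],
      summary.method   = "liwong"
    )
  })
  names(fits) <- paste0("model", models$model)
  fits
}
```

With CEL files in the working directory, a call such as `fits <- run_liwong_models(affy::ReadAffy(), models)` would return a named list with one expression object per model.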
@adaikalavan-ramasamy-675
It appears that the attachments did not come through, probably because of their size. Those interested can find the plots at the following URL:

http://neelix.molbiol.ox.ac.uk:8080/ramasamy/dChip/

Thank you.

Regards, Adai

On Thu, 2005-04-07 at 16:01 +0100, Adaikalavan Ramasamy wrote:
> I am trying to reproduce the dChip expression measure from the dChip
> software with BioConductor packages. [...]
Naomi Altman ★ 6.0k
@naomi-altman-380
United States
I think you will find that any 2 reasonable Affy normalization methods have very high correlation. In the Irizarry et al paper on cross-lab and cross-platform comparisons this is called the "probe effect"; it arises because the range of expression values is huge and the normalization methods do a reasonable job of preserving the ordering. However, this correlation does not translate into much overlap in the sets of genes that are declared DE (differentially expressed).

A better measure of closeness of the 2 normalizations is the MA plot of the normalized values on the same array, using the 2 normalizations.

Incidentally, I have never used the Li-Wong method, but I understand that it requires a fairly large data set (i.e. many arrays per condition), so the differences between dChip and BioC may just be due to failure to converge.

--Naomi

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                            814-863-7114 (fax)
Penn State University                          814-865-1348 (Statistics)
University Park, PA 16802-2111
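The MA comparison suggested above can be sketched in a few lines of base R. Everything here is illustrative: `ma_values` is a hypothetical helper, the two vectors are simulated rather than real dChip or BioConductor output, and the inputs are assumed to be strictly positive values on the natural scale.

```r
# Hypothetical helper: M and A values for one array measured two ways.
# x, y: positive expression values for the same probesets, same order.
ma_values <- function(x, y) {
  stopifnot(length(x) == length(y), all(x > 0), all(y > 0))
  lx <- log2(x)
  ly <- log2(y)
  data.frame(A = (lx + ly) / 2,  # average log2 intensity
             M = ly - lx)        # log2 ratio between the two versions
}

# Simulated stand-ins for the two versions of one array (not real data):
# a wide range of "true" signal plus independent multiplicative noise.
set.seed(1)
signal <- 2^runif(2000, min = 5, max = 14)
sim_a  <- signal * 2^rnorm(2000, sd = 0.1)
sim_b  <- signal * 2^rnorm(2000, sd = 0.1)

# The wide signal range alone makes the correlation very high ...
sim_cor <- cor(log2(sim_a), log2(sim_b))

# ... while the MA values show the disagreement directly around M = 0.
ma <- ma_values(sim_a, sim_b)
if (interactive()) {
  plot(ma$A, ma$M, pch = ".", xlab = "A", ylab = "M")
  abline(h = 0, col = "red")
}
```

For real data, `x` and `y` would be the same column taken from the two expression matrices after matching probeset order.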
Dear Naomi, thank you for the response. Please see my comments below.

On Mon, 2005-04-11 at 19:14 -0400, Naomi Altman wrote:
> I think you will find that any 2 reasonable Affy normalization methods have

I am comparing the same expression measure (Li-Wong) as computed by two different programs (dChip and BioConductor).

> very high correlation. In the Irizarry et al paper on cross-lab and
> cross-platform comparisons this is called the "probe effect" and is due to
> the fact that the range of expression values is huge and the normalization
> methods do a reasonable job of preserving the ordering.
> However, this correlation does not translate into much overlap in the set
> of genes that are declared DE.

Very interesting paper indeed. Thank you for pointing it out. I will need to read it more closely, though.

> A better measure of closeness of the 2 normalizations is the MA plot of the
> normalized values on the same array, using the 2 normalizations.

The MA plot is simply a 45 degree rotation of the scatterplot, so I prefer to look at the scatterplots directly. True, I should have done the scatterplots on an array-by-array basis, but I am not too keen on looking at 48 (= 12 arrays x 4 models) plots.

> Incidentally, I have never used the Li-Wong method, but I understand that
> it requires a fairly large data set (i.e. arrays/condition), so the
> differences between dChip and BioC may just be failure to converge.

Very good point. I did not even consider this. I wonder how stable the expression measures are across different runs within R itself.
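One way to avoid eyeballing 48 plots, and to check stability between two runs within R, is to reduce each array to a few agreement statistics. This is only a sketch: `agreement_by_array` is a hypothetical helper, it assumes two probesets-by-arrays matrices of strictly positive values with matching rows and columns, and the example matrices are simulated.

```r
# Hypothetical helper: one row of agreement statistics per array.
# e1, e2: probesets x arrays matrices with identical dimensions and
# ordering. Non-positive values would need handling before taking logs.
agreement_by_array <- function(e1, e2) {
  stopifnot(identical(dim(e1), dim(e2)), all(e1 > 0), all(e2 > 0))
  m <- log2(e2) - log2(e1)  # per-probeset log2 ratio (the "M" of an MA plot)
  data.frame(
    array    = if (is.null(colnames(e1))) seq_len(ncol(e1)) else colnames(e1),
    cor_log2 = sapply(seq_len(ncol(e1)),
                      function(j) cor(log2(e1[, j]), log2(e2[, j]))),
    median_M = unname(apply(m, 2, median)),  # systematic shift
    iqr_M    = unname(apply(m, 2, IQR))      # spread of the disagreement
  )
}

# Simulated stand-ins for two runs on 3 arrays (not real data).
set.seed(7)
run1 <- matrix(2^runif(300, min = 5, max = 12), ncol = 3)
run2 <- run1 * 2^rnorm(300, sd = 0.2)
stats <- agreement_by_array(run1, run2)
```

Arrays whose `median_M` is far from zero or whose `iqr_M` is unusually large would be the ones worth plotting individually.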
On Wed, Apr 13, 2005 at 02:42:37PM +0100, Adaikalavan Ramasamy wrote:
> On Mon, 2005-04-11 at 19:14 -0400, Naomi Altman wrote:
> > A better measure of closeness of the 2 normalizations is the MA plot of the
> > normalized values on the same array, using the 2 normalizations.
>
> The MA plot is simply 45 degree rotation of the scatter plots, so I
> prefer to look at the scatterplots directly. True, I should have done

That is simply a wrong preference. While I agree that the two plots contain the same mathematical object, the same could be said if I produced a plot with extremely skewed axes. Far too many scientists (statisticians included) tend to think that if two plots contain the same numbers, they are equivalent.

We (humans) generally find it much easier to gauge horizontal and vertical lines. One of the principal tasks with an MvA plot is to see whether the points follow a line or whether there is any systematic deviation from it. That judgement is much easier to make (correctly) on the basis of an MvA plot.

Try, e.g., fitting a simple linear regression. Think of two plots:

1) you plot the points and the fitted line
2) you plot the residuals

While the residuals are easy to see on plot 1, plot 2 is much better for assessing them.

Kasper

> the scatterplot on an array-by-array basis but I am not too keen on
> looking at 48 (= 12 arrays x 4 ways ) plots.

--
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen
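The regression analogy can be tried directly in base R. This is a sketch on simulated data (the slope, intercept and noise level are arbitrary assumptions), and `to_ma` is a hypothetical helper showing the sense in which the MvA coordinates are a 45 degree rotation of the scatterplot, up to scaling of the axes.

```r
# Simulated data for illustration only (not expression values).
set.seed(42)
x <- runif(200, min = 0, max = 10)
y <- 1 + 0.9 * x + rnorm(200, sd = 0.3)

fit <- lm(y ~ x)
res <- residuals(fit)

# Hypothetical helper: MvA-style coordinates of the point (x, y).
# Rotating (x, y) clockwise by 45 degrees gives ((x + y), (y - x)) / sqrt(2),
# so A and M are those rotated coordinates with rescaled axes.
to_ma <- function(x, y) cbind(A = (x + y) / 2, M = y - x)

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  # Plot 1: points with the fitted line; deviations are judged against a slope.
  plot(x, y, main = "points + fitted line")
  abline(fit, col = "red")
  # Plot 2: residuals; deviations are judged against a horizontal line.
  plot(fitted(fit), res, main = "residuals")
  abline(h = 0, col = "red")
  par(op)
}
```

Both panels contain the same information, but systematic structure in the residuals is usually easier to spot in the second one, which is the argument made for the MvA plot.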