Normalized data in expresso and Expression Console differ
@oliver-stolpe-3528
Last seen 10.4 years ago
Hello list,

I currently use the expresso function from the Bioconductor affy package to analyze Affymetrix data:

    normalized <- expresso(data, bgcorrect.method = "mas",
                           normalize.method = "quantiles",
                           pmcorrect.method = "mas",
                           summary.method = "mas")
    matrix <- log2(exprs(normalized))

As a reference I use the Expression Console by Affymetrix. My goal is to reproduce the normalized data (and therefore the resulting boxplot) from the Expression Console in R. I took the log2 after normalization and correction because the Expression Console delivered relatively small values (they seemed to be log-transformed), whereas the expresso data had a very large range.

Unfortunately the results differ. Does anyone know why they differ so noticeably (different mean, many outliers)? You can have a look at the boxplots I attached. Even when I leave out the normalization in expresso, the result looks nearly the same.

I would be glad about any suggestions.

Thanks in advance,
best regards,
Oliver

Some helpful data:

    > head(matrix_expresso)
      data1.cel.gz data2.cel.gz data3.cel.gz data4.cel.gz
          67.16587     72.66765     73.49201     74.00240
          72.03782     95.80303     97.60087     64.60356
         117.65746    142.88926    138.01063    159.64211
         185.33413    292.81031    232.82629    259.88629
         164.88572    260.95710    243.47892    247.80303
        1238.80516   1674.33256   1525.44652   1490.71100
      data5.cel.gz data6.cel.gz
           73.5097     67.97570
           93.9136     84.26307
          145.7278    124.94947
          250.9573    235.76545
          235.0867    251.55364
         1486.8813   1523.14721

    > head(matrix_expresso_log2)
      data1.cel.gz data2.cel.gz data3.cel.gz data4.cel.gz
          6.069657     6.183241     6.199515     6.209500
          6.170683     6.581999     6.608822     6.013542
          6.878449     7.158754     7.108636     7.318697
          7.533985     8.193823     7.863110     8.021737
          7.365323     8.027669     7.927653     7.953050
         10.274734    10.709370    10.575016    10.541785
      data5.cel.gz data6.cel.gz
          6.199863     6.086947
          6.553262     6.396829
          7.187132     6.965201
          7.971298     7.881209
          7.877049     7.974722
         10.538074    10.572840

    > sessionInfo()
    R version 2.9.0 (2009-04-17)
    i686-redhat-linux-gnu

    locale:
    LC_CTYPE=de_DE@euro;LC_NUMERIC=C;LC_TIME=de_DE@euro;LC_COLLATE=de_DE@euro;LC_MONETARY=C;LC_MESSAGES=de_DE@euro;LC_PAPER=de_DE@euro;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE@euro;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] zebrafishcdf_2.4.0 marray_1.22.0 limma_2.18.0 RdbiPgSQL_1.18.1
    [5] Rdbi_1.18.0 multtest_2.0.0 class_7.2-47 MASS_7.2-47
    [9] affy_1.22.0 Biobase_2.4.1

    loaded via a namespace (and not attached):
    [1] affyio_1.12.0 preprocessCore_1.6.0 splines_2.9.0
    [4] survival_2.35-4 tools_2.9.0

Attachments:

    boxplot_expresso_log2.png (image/png, 5615 bytes)
    https://stat.ethz.ch/pipermail/bioconductor/attachments/20090623/55d897e4/attachment.png

    boxplot_expression_console_anonym.png (image/png, 15068 bytes)
    https://stat.ethz.ch/pipermail/bioconductor/attachments/20090623/55d897e4/attachment-0001.png
cstrato ★ 3.9k
@cstrato-908
Last seen 6.3 years ago
Austria
Dear Oliver,

Please note that Expression Console scales the mean expression levels to a predefined target intensity, so you need to scale your data accordingly or use the function mas5(..., sc=500) from the affy package. Furthermore, the MAS5 algorithm from Affymetrix does not use quantile normalization.

Regarding the apparent outliers: to my knowledge there are four different implementations of the MAS5 algorithm (GCOS, APT, affy and xps), which all result in slightly different expression levels, as you can see, for example, in Figure 4 of the vignette APTvsXPS.pdf from the xps package:
http://www.bioconductor.org/packages/release/bioc/vignettes/xps/inst/doc/APTvsXPS.pdf

I must admit that I do not know why the different implementations differ slightly.

Best regards,
Christian

Christian Stratowa, Vienna, Austria
e-mail: cstrato at aon.at
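To illustrate the scaling step mentioned in the answer, here is a minimal base-R sketch, not the affy or Expression Console implementation. The helper name scale_to_target, the trimming fraction of 0.02, and the use of only the six rows quoted in the question are assumptions made for illustration; the target of 500 mirrors the sc=500 argument in the answer. The resulting numbers are therefore only a toy example and will not match a full-array analysis.

```r
# MAS5-style scaling sketch: rescale every array (column) so that its
# trimmed mean equals a target intensity.
# scale_to_target is a hypothetical helper, not a function from affy.
# trim = 0.02 is an assumed trimming fraction; target = 500 mirrors sc = 500.
scale_to_target <- function(x, target = 500, trim = 0.02) {
  sf <- target / apply(x, 2, mean, trim = trim)  # one scale factor per array
  sweep(x, 2, sf, `*`)                           # multiply column j by sf[j]
}

# The six rows of unlogged expresso output quoted in the question
# (a tiny subset of the array, used here only as toy input).
expr <- matrix(
  c(  67.16587,   72.66765,   73.49201,   74.00240,   73.5097,   67.97570,
      72.03782,   95.80303,   97.60087,   64.60356,   93.9136,   84.26307,
     117.65746,  142.88926,  138.01063,  159.64211,  145.7278,  124.94947,
     185.33413,  292.81031,  232.82629,  259.88629,  250.9573,  235.76545,
     164.88572,  260.95710,  243.47892,  247.80303,  235.0867,  251.55364,
    1238.80516, 1674.33256, 1525.44652, 1490.71100, 1486.8813, 1523.14721),
  nrow = 6, byrow = TRUE,
  dimnames = list(NULL, paste0("data", 1:6, ".cel.gz"))
)

scaled      <- scale_to_target(expr)  # every column now has trimmed mean 500
log2_scaled <- log2(scaled)           # log2 scale, as used for the boxplots

# Check: the trimmed mean of each scaled column equals the target.
print(apply(scaled, 2, mean, trim = 0.02))
```

On real data, the route suggested in the answer is mas5(data, sc = 500) from affy, followed by log2(exprs(...)) before plotting. If expresso is kept instead, its normalize argument can be set to FALSE to skip quantile normalization, and the summarized values can then be scaled along the lines of the sketch above.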