Question about quantile normalization in normalize.AffyBatch.quantiles
Owen Solberg ▴ 10
Hi Bioconductor community,

My understanding of the rma method is that it is composed of the following three steps: background correction, quantile normalization, and probe summarization. My question concerns the quantile-normalized probe intensities that are returned by the normalize.AffyBatch.quantiles function in step 2. According to Bolstad et al. (Bioinformatics. 2003 19(2):185-93), in which the quantile normalization algorithm is described, the vectors of sorted probe intensities should be equal across all arrays after quantile normalization. However, comparing the sorted probe intensities (in this example, for the first and second arrays) shows that they are not equal, and furthermore, plotting the differences reveals an odd pattern. The differences are quite small overall, and barely affect the higher-intensity probes, but I am still curious. Can anyone explain what is going on here? (Working example provided below.)

Thanks,
Owen

library(CLL)
data("CLLbatch")
step1 <- bg.correct(CLLbatch, "rma")
step2 <- normalize.AffyBatch.quantiles(step1, type="pmonly")
## step3 <- computeExprSet(step2, "pmonly", "medianpolish")
## I have verified that the above 3 steps are essentially equivalent to rma(CLLbatch)
## no need to run the 3rd step for the following examples

a <- sort(pm(step2)[,1])
b <- sort(pm(step2)[,2])
z <- a-b

## most of the values are not identical...
sum(!z==0)/length(z)
[1] 0.9299108

## ...but the differences fluctuate around zero in an oddly symmetrical manner...
plot(a, z)

## ...and zooming in shows that the differences come in groups of probes.
plot(z[1:10000])

## also, plotted as a percentage of the intensity, the differences are never over 3%, and diminish at higher probe intensities
plot(a, z/a)
Ben Bolstad ★ 1.2k
Almost certainly what you are seeing is a reflection of the fact that the quantile normalization code underlying normalize.AffyBatch.quantiles is designed to handle ties. Basically, the algorithm attempts to ensure that data values that are exactly equal on input within a specific array (column) are also exactly equal on output. The algorithmic description in the paper skates over the issue of how to appropriately deal with ties.

For your data there are plenty of ties:

> length(unique(a))
[1] 7270
> length(unique(sort(pm(step1)[,1])))
[1] 7270
> length(a)
[1] 201800

On Thu, 2010-04-22 at 18:32 -0700, Owen Solberg wrote:
> [...] comparing the sorted probe intensities (in this example, for the
> first and second arrays) shows that they are not equal [...]
> Can anyone explain what is going on here?
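The effect of tie handling described in the answer can be illustrated with a toy example in base R. This is only a sketch under stated assumptions: `qnorm_toy` is a hypothetical helper written for this illustration, and its tie rule (tied inputs share the mean of the target values they span) is an assumption chosen for simplicity, not the actual rule used by the code behind normalize.AffyBatch.quantiles.

```r
## Toy quantile normalization (hypothetical helper, NOT the Bioconductor code).
## ties = "first":   ties are broken by position, as in the textbook description
## ties = "average": tied inputs in a column get one shared output value
##                   (assumed rule: mean of the target values they span)
qnorm_toy <- function(x, ties = c("first", "average")) {
  ties <- match.arg(ties)
  ## target distribution: mean of each order statistic across columns
  target <- rowMeans(apply(x, 2, sort))
  apply(x, 2, function(col) {
    out <- target[rank(col, ties.method = "first")]
    if (ties == "average") {
      ## replace the outputs of tied inputs by their common mean
      out <- ave(out, col, FUN = mean)
    }
    out
  })
}

## column 1 contains a three-way tie, column 2 has no ties
x <- cbind(c(1, 1, 1, 4), c(1, 2, 3, 4))

naive <- qnorm_toy(x, ties = "first")
aware <- qnorm_toy(x, ties = "average")

## without tie handling, the sorted columns are identical
identical(sort(naive[, 1]), sort(naive[, 2]))

## with tie handling, equal inputs stay equal, so the sorted columns differ
sort(aware[, 1]) - sort(aware[, 2])
```

In this toy case the tie-aware version keeps the three equal inputs equal on output, at the cost of the sorted columns no longer matching exactly. The differences are positive at one end of the tied group and negative at the other, which is at least consistent with the grouped, symmetric pattern seen in plot(a, z).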