Question about quantile normalization in normalize.AffyBatch.quantiles


Owen Solberg ▴ 10

@owen-solberg-4039

Last seen 10.6 years ago

Hi Bioconductor community,

My understanding of the rma method is that it is composed of the following three steps: background correction, quantile normalization, and probe summarization. My question concerns the quantile-normalized probe intensities that are returned by the normalize.AffyBatch.quantiles function, in step 2. According to Bolstad et al (Bioinformatics. 2003 19(2):185-93), in which the quantile normalization algorithm is described, the vectors of sorted probe intensities across all arrays should be equal after quantile normalization. However, comparing the sorted probe intensities (in this example, for the first and second arrays) shows that they are not equal, and furthermore, plotting the differences reveals an odd pattern. The differences are quite small overall, and barely affect the higher intensity probes, but I am still curious. Can anyone explain what is going on here? (A working example is provided below.)

Thanks,
Owen

    library(CLL)
    data("CLLbatch")
    step1 <- bg.correct(CLLbatch, "rma")
    step2 <- normalize.AffyBatch.quantiles(step1, type="pmonly")
    ## step3 <- computeExprSet(step2, "pmonly", "medianpolish")
    ## I have verified that the above 3 steps are essentially equivalent to rma(CLLbatch)
    ## no need to run the 3rd step for the following examples

    a <- sort(pm(step2)[,1])
    b <- sort(pm(step2)[,2])
    z <- a-b

    ## most of the values are not identical...
    sum(!z==0)/length(z)
    [1] 0.9299108

    ## ...but the differences fluctuate around zero in an oddly symmetrical manner...
    plot(a, z)

    ## ...and zooming in shows that the differences come in groups of probes.
    plot(z[1:10000])

    ## also, plotted as a percentage of the intensity, the differences are
    ## never over 3%, and diminish at higher probe intensities
    plot(a, z/a)
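For reference, the expectation stated in the question (identical sorted columns after normalization) can be checked with a minimal base-R sketch of the algorithm as it is usually described. The helper name `quantile_normalize_naive` and the simulated data are assumptions for illustration only; this is not the code used by normalize.AffyBatch.quantiles, and it breaks ties arbitrarily by position.

```r
# Textbook quantile normalization with no special tie handling.
# `quantile_normalize_naive` is a hypothetical helper, not part of affy.
quantile_normalize_naive <- function(x) {
  # Reference distribution: mean of the k-th smallest value across columns
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank;
  # ties.method = "first" breaks ties arbitrarily by position
  apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
}

set.seed(1)
x <- matrix(rexp(5000 * 3), ncol = 3)  # simulated intensities (assumption)
qn <- quantile_normalize_naive(x)

# With this scheme every column ends up with the same sorted values
identical(sort(qn[, 1]), sort(qn[, 2]))
```

Under this scheme the sorted columns agree exactly, which is the behaviour the question expects from step 2.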

Normalization probe • 1.1k views

updated 15.0 years ago by Ben Bolstad ★ 1.2k • written 15.0 years ago by Owen Solberg ▴ 10


Ben Bolstad ★ 1.2k

@ben-bolstad-1494

Last seen 7.6 years ago

Almost certainly what you are seeing is a reflection of the fact that the quantile normalization code underlying normalize.AffyBatch.quantiles is designed to handle ties. Basically, the algorithm attempts to ensure that data values that are exactly equal on input within a specific array (column) are also exactly equal on output. The algorithmic description in the paper skates over the issue of how to appropriately deal with ties.

For the data below there are plenty of ties:

    > length(unique(a))
    [1] 7270
    > length(unique(sort(pm(step1)[,1])))
    [1] 7270
    > length(a)
    [1] 201800

On Thu, 2010-04-22 at 18:32 -0700, Owen Solberg wrote:
> Hi Bioconductor community,
> [...]
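To illustrate how tie handling alone can make the sorted columns differ, here is a small base-R sketch. It assumes one possible rule, averaging the mapped reference values within each group of tied inputs; the helper `quantile_normalize_ties` is hypothetical, and the compiled code behind normalize.AffyBatch.quantiles may resolve ties differently.

```r
# Illustration only: one possible tie-aware quantile normalization.
# `quantile_normalize_ties` is a hypothetical helper, not the affy implementation.
quantile_normalize_ties <- function(x) {
  # Reference distribution: mean of the k-th smallest value across columns
  ref <- rowMeans(apply(x, 2, sort))
  apply(x, 2, function(col) {
    mapped <- ref[rank(col, ties.method = "first")]
    # Average the mapped values within each group of tied inputs, so that
    # equal inputs within a column receive equal outputs
    ave(mapped, col)
  })
}

# Column 1 contains a tie (two 1s); column 2 has none
x <- cbind(c(1, 1, 2, 3), c(1, 2, 3, 4))
qn <- quantile_normalize_ties(x)
qn
# column 1: 1.25 1.25 2.50 3.50
# column 2: 1.00 1.50 2.50 3.50

# The sorted columns differ only at the ranks covered by the tie
sort(qn[, 1]) - sort(qn[, 2])
# 0.25 -0.25 0.00 0.00
```

In this toy case the differences sum to zero within the tied block, which is at least consistent with the symmetric, grouped pattern reported in the question.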

15.0 years ago Ben Bolstad ★ 1.2k
