quantile normalization

Martino Barenco ▴ 110

@martino-barenco-278

Hi,

My understanding of quantile normalization is that the values in each of several data sets are ranked, the average across data sets is taken at each rank, and that average is then reassigned to each data set according to the original rank (I hope this makes sense). My question is: how should ties be handled?

- One possibility would be to force one tied value to be greater than the other (using the "order" command instead of "rank"). Even though it would not make a huge difference, it is a bit arbitrary.
- Another possibility is to "force" the ties across all data sets, which is quite arbitrary as well.
- Or maybe a combination of both, but it looks like a nightmare to program!

So, what should be done? Also, is there a method such as "normalize" that acts on exprSet rather than AffyBatch?

Thanks

Martino
---------------------------------------
Martino Barenco
CoMPLEX
4, Stephenson Way
London NW1 2HE
Tel.: +44 20 7679 5088
Fax.: +44 20 7383 5519
Email: m.barenco@ucl.ac.uk
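The procedure described in the question can be sketched in base R roughly as follows. This is only an illustration: `quantile_normalize_basic` is a hypothetical helper, not a function from limma or affy, and it assumes a complete numeric matrix with one data set per column. It takes the first option from the list above and breaks ties by index order.

```r
# Hypothetical helper for illustration; not part of limma or affy.
# Assumes x is a numeric matrix without missing values, one data set per column.
quantile_normalize_basic <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x))
  # Sort every column, then average across columns at each rank position.
  sorted <- apply(x, 2, sort)
  dim(sorted) <- dim(x)            # keep matrix shape even for a single row
  target <- rowMeans(sorted)
  # ties.method = "first" breaks ties by index order, so every value in a
  # column receives a distinct integer rank.
  out <- apply(x, 2, function(col) target[rank(col, ties.method = "first")])
  dim(out) <- dim(x)
  dimnames(out) <- dimnames(x)
  out
}

A <- cbind(c(1, 2, 3, 4), c(2, 4, 4, 6))
quantile_normalize_basic(A)
```

With this tie-breaking rule the two tied values in the second column end up with different normalized values, which is the arbitrariness the question is about.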

ADD COMMENT • link updated 21.9 years ago by Laurent Gautier ★ 2.3k • written 21.9 years ago by Martino Barenco ▴ 110

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

A good point! Although probably not of great practical importance for most microarray experiments.

I think the correct in-principle behavior of quantile normalization is clear. In the case of ties, one should in principle set each tied log-intensity to the average of the corresponding pooled quantiles. There are two quantile normalization routines in Bioconductor at the moment (one intended for cDNA arrays and one for affy) and neither does exactly the right thing. limma breaks ties by preserving index order, while affy treats all tied values as having the minimum of the tied ranks:

> A
     [,1] [,2]
[1,]    1    2
[2,]    2    4
[3,]    3    4
[4,]    4    6
> library(limma)
> normalizeQuantiles(A)
     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.0  3.0
[3,]  3.5  3.5
[4,]  5.0  5.0
> library(affy)
> normalize.quantiles(A)
     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.0  3.0
[3,]  3.5  3.0
[4,]  5.0  5.0

In principle, I think the correct quantile normalized matrix should be

     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.0  3.25
[3,]  3.5  3.25
[4,]  5.0  5.0

Cheers
Gordon
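The in-principle rule stated above (each tied value gets the average of the pooled quantiles that its ties occupy) can be sketched in base R. This is an illustrative sketch only: `quantile_normalize_ties` is a hypothetical name, the code assumes a complete numeric matrix, and it is not how either limma or affy is implemented.

```r
# Hypothetical helper for illustration; not the limma or affy implementation.
# Assumes x is a numeric matrix without missing values.
quantile_normalize_ties <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x))
  sorted <- apply(x, 2, sort)
  dim(sorted) <- dim(x)
  target <- rowMeans(sorted)       # pooled quantiles: mean at each rank
  out <- apply(x, 2, function(col) {
    # First give each value a distinct rank (ties broken by index order) ...
    assigned <- target[rank(col, ties.method = "first")]
    # ... then replace the values within each group of ties by their mean.
    ave(assigned, col)
  })
  dim(out) <- dim(x)
  dimnames(out) <- dimnames(x)
  out
}

A <- cbind(c(1, 2, 3, 4), c(2, 4, 4, 6))
quantile_normalize_ties(A)
```

For the 4 x 2 matrix `A` used in the post, the two tied entries in the second column both become the mean of 3.0 and 3.5, that is 3.25, as in the matrix given above.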

ADD COMMENT • link 21.9 years ago Gordon Smyth 52k

Ties are not generally much of a problem. But the examples below demonstrate how they are handled in the affy normalize.quantiles() routine.

> while affy treats all tied values as having the minimum of the tied ranks:

When you have an even number of ties this is true (ranks are rounded down, so with 2 ties the above would happen). With an odd number of ties a whole-number rank is used, e.g.

> # odd number of ties
> X <- matrix(c(1:8,1,2,3,4,5,5,5,8),8,2)
> X
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    4
[5,]    5    5
[6,]    6    5
[7,]    7    5
[8,]    8    8
> normalize.quantiles(X)
     [,1] [,2]
[1,]  1.0  1.0
[2,]  2.0  2.0
[3,]  3.0  3.0
[4,]  4.0  4.0
[5,]  5.0  5.5
[6,]  5.5  5.5
[7,]  6.0  5.5
[8,]  8.0  8.0
> rank(c(1,2,3,4,5,5,5,8))
[1] 1 2 3 4 6 6 6 8
> # so rank of 6 used for ties
> # even number of ties
> X <- matrix(c(1:8,1,2,3,4,5,5,7,8),8,2)
> X
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    4
[5,]    5    5
[6,]    6    5
[7,]    7    7
[8,]    8    8
> normalize.quantiles(X)
     [,1] [,2]
[1,]  1.0    1
[2,]  2.0    2
[3,]  3.0    3
[4,]  4.0    4
[5,]  5.0    5
[6,]  5.5    5
[7,]  7.0    7
[8,]  8.0    8
> rank(c(1,2,3,4,5,5,7,8))
[1] 1.0 2.0 3.0 4.0 5.5 5.5 7.0 8.0
> # so rank rounded down to 5

Thanks,
Ben
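The rounding behaviour described in this reply can be imitated in base R by taking the average rank of each tied group and rounding it down before looking up the pooled quantile. This is a sketch of the described behaviour only; `quantile_normalize_floor` is a hypothetical name and the code is not taken from the affy sources.

```r
# Hypothetical helper that imitates the tie handling described above;
# it is not the affy implementation. Assumes a complete numeric matrix.
quantile_normalize_floor <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x))
  sorted <- apply(x, 2, sort)
  dim(sorted) <- dim(x)
  target <- rowMeans(sorted)
  # rank() uses average ranks for ties by default; floor() rounds a
  # fractional rank such as 5.5 down to 5 and leaves whole ranks unchanged.
  out <- apply(x, 2, function(col) target[floor(rank(col))])
  dim(out) <- dim(x)
  dimnames(out) <- dimnames(x)
  out
}

X_odd  <- matrix(c(1:8, 1, 2, 3, 4, 5, 5, 5, 8), 8, 2)
X_even <- matrix(c(1:8, 1, 2, 3, 4, 5, 5, 7, 8), 8, 2)
quantile_normalize_floor(X_odd)
quantile_normalize_floor(X_even)
```

On the two matrices from the reply this gives 5.5 for the three tied values in the odd case and 5 for the two tied values in the even case, in line with the normalize.quantiles() output shown above.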

ADD REPLY • link 21.9 years ago Ben Bolstad ★ 1.1k

I had a play around with 'normalizeQuantiles' to see how easy it is to give the "correct" treatment of ties. It's not too difficult, but there is a performance penalty, so I've made it a non-default option.

> A <- cbind(c(1,2,3,4,5),c(2,4,4,6,7))
> normalizeQuantiles(A,ties=TRUE)
     [,1] [,2]
[1,]  1.5 1.50
[2,]  3.0 3.25
[3,]  3.5 3.25
[4,]  5.0 5.00
[5,]  6.0 6.00
> A <- cbind(c(1,2,3,4,5),c(2,4,4,4,7))
> normalizeQuantiles(A,ties=TRUE)
     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.0  3.5
[3,]  3.5  3.5
[4,]  4.0  3.5
[5,]  6.0  6.0

The 'normalizeQuantiles' function also handles missing values, which are assumed to be 'missing at random':

> A
     [,1] [,2]
[1,]    1    2
[2,]   NA    4
[3,]    3    4
[4,]    4    6
[5,]    5    7
> normalizeQuantiles(A,ties=TRUE)
         [,1]  [,2]
[1,] 1.500000 1.500
[2,]       NA 3.500
[3,] 3.416667 3.500
[4,] 4.666667 5.125
[5,] 6.000000 6.000

Here are some timings. Handling the ties carefully increases the time taken several fold:

> A <- matrix(rnorm(30000*100),30000,100) # 100 arrays with 30,000 spots each
> system.time(B <- normalizeQuantiles(A))
[1] 3.62 0.00 3.98 NA NA
> system.time(B <- normalizeQuantiles(A,ties=TRUE))
[1] 12.25 0.01 12.39 NA NA
> system.time(B <- normalize.quantiles(A))
[1] 6.42 0.07 6.50 NA NA

As pointed out by Ben, normalize.quantiles already gives the "correct" treatment of ties when the number of ties is odd.

Cheers
Gordon

ADD REPLY • link 21.9 years ago Gordon Smyth 52k

Laurent Gautier ★ 2.3k

@laurent-gautier-29

On Tue, Jun 03, 2003 at 05:54:47PM +0100, Martino Barenco wrote:
> Hi,
>
> My understanding of quantile normalization is that values for several
> data sets are ranked, then the average per rank is taken and is
> reattributed to each data set according to the original rank (hope this
> makes sense). My question is: how does one deal when ties occur?
> - One possibility would be to force a tied value to be greater than the
> other (using the "order" command instead of "rank"). Even though it
> would not make a huge difference it is a bit arbitrary.
> - Another possibility is to "force" the ties across all data sets,
> quite arbitrary as well.
> - or maybe a combination of both, but it looks like a nightmare to
> program!

I am not too sure either about the practical importance for real-life data (although ties should be around for Affymetrix chips: chips like U95 have roughly 400 kiloprobes and the intensity range is often [0.0, 30000.0] with a large number of low intensities).

(Note: the package affy has more than one 'quantiles-based' normalization method; in alphabetical order: "qspline", "quantiles", "quantiles.robust".)

> So, what should be done? Also, is there a method such as "normalize"
> that acts on exprSet rather than AffyBatch?

Not yet (but long-running thoughts about unifying normalization methods had a ray of hope, very recently)... In the meantime, you can hack your own in a wink (look at the vignette 'affy: custom processing methods').

--
--------------------------------------------------------------
currently at the National Yang-Ming University in Taipei, Taiwan
--------------------------------------------------------------
Laurent Gautier        CBS, Building 208, DTU
PhD. Student           DK-2800 Lyngby, Denmark
tel: +45 45 25 24 89   http://www.cbs.dtu.dk/laurent

ADD COMMENT • link 21.9 years ago Laurent Gautier ★ 2.3k