Question

Bugs in HTqPCR implementation of data normalization

0

Entering edit mode

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 6 months ago

United States

Hello, I was hoping to use HTqPCR to analyze my data, but found some bugs that make the package hard to use (or giving wrong results...). Looking at the source code of normalizeCtData, it appears that there is a bug in the geometric.mean and scale.rankinvariant methods - that data seem to be multiplied by scaling factors instead of being divided by them. This magnifies inter-sample differences instead of reducing them! Here's an example script that illustrates the problem. I generate 4 samples with a small number of measurements, with all measurements the same in each sample. library(HTqPCR) n = 10; m = 4; htData = new( "qPCRset", exprs = t(matrix(20 + 2*(1:m), m, n))); featureNames(htData) = paste0("mir.", 1:n); sampleNames(htData) = paste0("sample", 1:m); featureType(htData) = rep("Target", n); exprs(htData) # sample1 sample2 sample3 sample4 #mir.1 22 24 26 28 #mir.2 22 24 26 28 #mir.3 22 24 26 28 #mir.4 22 24 26 28 #mir.5 22 24 26 28 #mir.6 22 24 26 28 #mir.7 22 24 26 28 #mir.8 22 24 26 28 #mir.9 22 24 26 28 #mir.10 22 24 26 28 htNorm.gm = normalizeCtData(htData, norm = "geometric.mean") exprshtNorm.gm); #Scaling Ct values # Using geometric mean within each sample # Scaling factors: 1.00 1.09 1.18 1.27 > exprshtNorm.gm); # sample1 sample2 sample3 sample4 #mir.1 22 26.18182 30.72727 35.63636 #mir.2 22 26.18182 30.72727 35.63636 #mir.3 22 26.18182 30.72727 35.63636 #mir.4 22 26.18182 30.72727 35.63636 #mir.5 22 26.18182 30.72727 35.63636 #mir.6 22 26.18182 30.72727 35.63636 #mir.7 22 26.18182 30.72727 35.63636 #mir.8 22 26.18182 30.72727 35.63636 #mir.9 22 26.18182 30.72727 35.63636 #mir.10 22 26.18182 30.72727 35.63636 Note that the sample differences were maginfied, not reduced. Same thing happens for the scale.rankinvariant method: n = 100; m = 4; htData.2 = new( "qPCRset", exprs = t(matrix(20 + 2*(1:m), m, n))); set.seed(1); exprs(htData.2) = exprs(htData.2) + runif(m) htNorm.sri = normalizeCtData(htData.2, norm = "scale.rankinvariant") head(exprs(htNorm.sri)) Furthermore, the geometric.mean normalization has been implemented differently from the article referenced in connection with it (Mestdagh et al., 2009). While the article is written so it lacks exact details and formulas, I have confirmed with the authors that "geometric mean" applies to the linearized expression data, that is, to exponentiated Ct-values; this is equivalent to averaging Ct using arithmetic mean, not geometric mean as implemented in normalizeCtData. Further, the article seems to suggest dividing the linearized values by the geometric mean, which means subtracting, not dividing, the Ct values by the arithmetic mean. This distinction should be pointed out in the vignette, or the method should be re-implemented according to the article. I also keep getting errors when trying to print objects. In the above example, typing htData produces the following output: An object of class "qPCRset" Size: 10 features, 4 samples Feature types: Feature names: mir.1 mir.2 mir.3 ... Feature classes: Error in `rownames<-`(`*tmp*`, value = c("mir.1", "mir.2", "mir.3", "mir.4", : attempt to set 'rownames' on an object with no dimensions Best, Peter

Normalization HTqPCR Normalization HTqPCR • 1.2k views

ADD COMMENT • link 10.8 years ago Peter Langfelder ★ 3.0k