Limma: NA handling in contrasts.fit
0
0
Entering edit mode
Simon Anders ▴ 150
@simon-anders-2626
Last seen 10.4 years ago
Dear Gordon et al. I've just noticed an oddity in the way how limma's contrasts.fit function handles missing observations, and I wonder if it might be a bug. Please have a look at the following test case: I use the this design matrix: > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > dm intercept a b [1,] 1 0 0 [2,] 1 0 1 [3,] 1 1 0 [4,] 1 1 1 Let's construct sample data for 5 genes and put in two NAs as follows: > y <- t( replicate( 5, dm %*% runif(3) ) ) > y[ 1, 3:4 ] <- NA > y [,1] [,2] [,3] [,4] [1,] 0.099221925 0.5628846 NA NA [2,] 0.009771325 0.7748060 0.3977409 1.162776 [3,] 0.223688182 0.6330630 0.6791238 1.088499 [4,] 0.957762805 1.4338553 1.4259875 1.902080 [5,] 0.766597103 1.3905022 0.9947635 1.618669 If I fit this data with lmFit, I unsurprisingly get a warning that some coefficients cannot be estimated: > library( limma ) > fit <- lmFit( y, dm ) Warning message: Partial NA coefficients for 1 probe(s) The missing coefficient is the coefficient 'a' for the first gene. Note that 'b' can be estimated: > coefficients( fit ) intercept a b [1,] 0.099221925 NA 0.4636627 [2,] 0.009771325 0.3879696 0.7650347 [3,] 0.223688182 0.4554356 0.4093748 [4,] 0.957762805 0.4682247 0.4760925 [5,] 0.766597103 0.2281664 0.6239051 I now ask 'contrast.fit' to boil the fit object down to only contain the "b" coefficients. I should get the same coefficients, but only the "b" column. > fit2 <- contrasts.fit( fit, coefficients="b" ) > coefficients(fit2) b [1,] NA [2,] 0.7650347 [3,] 0.4093748 [4,] 0.4760925 [5,] 0.6239051 Indeed, I get the same values, but the first value is now masked as 'NA'. Is there a reason for this, or is this a bug? Granted, in this example, the use of 'make.contrasts' is superfluous, but in the following example, it is not. I place the NA, such that 'a' cannot be estimated, and I get an NA in the contrast 'b-c': > dm <- cbind( intercept=1, a=rep(c(0,1),each=2), b=rep(c(0,1), 2) ) > dm <- rbind( cbind( dm, c=0 ), cbind( dm, c=1 ) ) > dm intercept a b c [1,] 1 0 0 0 [2,] 1 0 1 0 [3,] 1 1 0 0 [4,] 1 1 1 0 [5,] 1 0 0 1 [6,] 1 0 1 1 [7,] 1 1 0 1 [8,] 1 1 1 1 > y <- t( replicate( 5, dm %*% runif(4) ) ) > y[ 1, c(3,4,7,8) ] <- NA > fit <- lmFit( y, dm ) Warning message: Partial NA coefficients for 1 probe(s) > coefficients( fit ) intercept a b c [1,] 0.17989906 NA 0.66812435 0.54889675 [2,] 0.26932489 0.9461271 0.77956666 0.05345084 [3,] 0.38124957 0.0309537 0.58205980 0.26414381 [4,] 0.50592287 0.9680045 0.86566150 0.96073095 [5,] 0.01829934 0.1103156 0.09401078 0.02140402 > fit2 <- contrasts.fit( fit, c( 0, 0, 1, -1 ) ) > coefficients(fit2) [,1] [1,] NA [2,] 0.72611582 [3,] 0.31791599 [4,] -0.09506945 [5,] 0.07260676 Thanks in advance for any help in getting around this, as I have a lot of data with quite some missing values. Best regards Simon Anders > sessionInfo() R version 2.9.0 alpha (2009-03-24 r48212) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_2.17.12 +--- | Dr. Simon Anders, Dipl. Phys. | European Bioinformatics Institute, Hinxton, Cambridgeshire, UK | office phone +44-1223-492680, mobile phone +44-7505-841692 | preferred (permanent) e-mail: sanders at fs.tum.de
limma limma • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 501 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6