Problem with coefficients that are not estimable (limma)
1
0
Entering edit mode
rrm38 • 0
@rrm38-16519
Last seen 5.4 years ago

Dear list, 

I apologise in advance for being a limma amateur, but I hope you would help me with the following issue.

I am currently trying to analyse a reverse-phase protein array using limma. I am working with cell samples corresponding to 3 genotypes, and within each genotype there are several independent clones as well as technical replicates of some clones. The samples were collected over 3 days. I want to determine significant differences between the genotypes, but I realise that it is more statistically correct to tell the model that some clones are technical replicates + take into account the collection dates. I tried to do this with an appropriate design matrix. However, once I run lmFit, I get the highlighted error in the code below. I am confused as to why the two samples indicated would be problematic. I checked and there are no missing values for these particular samples in the original dataset.

Thank you in advance, 

Ralitsa Madsen

> group.clone.date

      [,1]              [,2]   [,3]   
 [1,] "WTC11_WT"        "G7"   "12dec"
 [2,] "WTC11_WT"        "B"    "12dec"
 [3,] "WTC11_WT"        "G7"   "14dec"
 [4,] "WTC11_WT"        "B"    "15dec"
 [5,] "WTC11_WT"        "B"    "15dec"
 [6,] "WTC11_WT"        "C"    "15dec"
 [7,] "WTC11_WT"        "C"    "15dec"
 [8,] "WTC11_WT"        "H11"  "15dec"
 [9,] "WTC11_WT"        "H11"  "15dec"
[10,] "WTC11_WT"        "G7"   "15dec"
[11,] "WTC11_H1047Rhet" "Het2" "12dec"
[12,] "WTC11_H1047Rhet" "Het3" "14dec"
[13,] "WTC11_H1047Rhet" "Het1" "14dec"
[14,] "WTC11_H1047Rhet" "Het3" "15dec"
[15,] "WTC11_H1047Rhet" "Het3" "15dec"
[16,] "WTC11_H1047Rhom" "G"    "12dec"
[17,] "WTC11_H1047Rhom" "G"    "14dec"
[18,] "WTC11_H1047Rhom" "F"    "14dec"
[19,] "WTC11_H1047Rhom" "F"    "15dec"
[20,] "WTC11_H1047Rhom" "F"    "15dec"
[21,] "WTC11_H1047Rhom" "F"    "15dec"
[22,] "WTC11_H1047Rhom" "G"    "15dec"

> design.WTC11 <-
+   model.matrix( ~ 0 + group.WTC11 + clone + col.date) 

> colnames(design.WTC11) <-

+   gsub("group.WTC11", "", colnames(design.WTC11))
 

> contr.matrix.WTC11 <- makeContrasts(
+   HOMvsWT = WTC11_H1047Rhom - WTC11_WT,
+   HETvsWT = WTC11_H1047Rhet - WTC11_WT,
+   HOMvsHET = WTC11_H1047Rhom - WTC11_H1047Rhet,
+   levels = colnames(design.WTC11)
+ )

> RPPA.fit <-
+   lmFit(mat_RPPA.limma.log, design.WTC11) # warning about partial NA coefficients
Coefficients not estimable: cloneG cloneHet3 
Warning message:
Partial NA coefficients for 57 probe(s) 

> sessionInfo()

R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2     stringr_1.3.1      ggthemes_3.5.0     cowplot_0.9.2      extrafont_0.17     gridExtra_2.3      lattice_0.20-35    ggplot2_2.2.1     
 [9] magrittr_1.5       dplyr_0.7.6        tidyr_0.8.1        RColorBrewer_1.1-2 gplots_3.0.1       edgeR_3.20.9       limma_3.34.9      

limma batch effect • 9.2k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 5 hours ago
The city by the bay

Firstly, let me point out some flaws in your post.

  1. group.WTC11, clone and col.date are not defined anywhere.
  2. The message is a warning, not an error.
  3. The warning is referring to coefficients in the design matrix, not to individual samples.
  4. You're mixing technical and biological replicates, which is always a bad idea.
  5. You should be using the latest version of R (3.5.*) and limma (3.36.*).

Now, onto your question. If you look at the design matrix, you will notice that

identical(design.WTC11[,"WTC11_H1047Rhet"],
    rowSums(design.WTC11[,c("cloneHet1", "cloneHet2", "cloneHet3")])) # TRUE

This means that the WTC11_H1047Rhet coefficient is linearly dependent with the three cloneHet* coefficients. Thus, there is no unique least-squares solution for the linear model. I could set WTC11_H1047Rhet to 1 and the cloneHet* coefficients to zero, and get the same sum of squares as if I set WTC11_H1047Rhet to zero and the cloneHet* coefficients to 1.

The same argument applies to the other coefficient that lmFit complains about:

identical(design.WTC11[,"WTC11_H1047Rhom"],
    rowSums(design.WTC11[,c("cloneF", "cloneG")])) # TRUE

The only solution is to restore full rank by removing cloneG (or F) and cloneHet3 (or 1, or 2) from the design matrix. Let's assume you do this and let's go through the remaining columns of the design matrix.

  • WTC11_H1047Rhet is the average log-expression of all Het3 samples.
  • WTC11_H1047Rhom is the average log-expression of all G samples.
  • WTC11_WT is the average log-expression of all B samples.
  • cloneC is the average log-fold change of C over B.
  • cloneF is the average log-fold change of F over G.
  • cloneG7 is the average log-fold change of G7 over B.
  • cloneH11 is the average log-fold change of H11 over B.
  • cloneHet1 is the average log-fold change of Het1 over Het3.
  • cloneHet1 is the average log-fold change of Het2 over Het3.
  • col.date14dec is the average log-fold change of samples collected on 14 Dec over those collected on 12 Dec.
  • col.date15dec is the average log-fold change of samples collected on 15 Dec over those collected on 12 Dec.

Hopefully, from this it should be clear that your contr.matrix.WTC11 is not doing what you expected. Despite their names, the first three coefficients do not represent the genotype average; they only represent the average of a specific clone in that genotype. Thus, your contrast matrix is not actually comparing groups, it is only comparing specific clones. This should be fairly obvious when you think about your design matrix, as the clone identity is nested within the genotype. Blocking on the clones will thus eliminate your ability to perform any comparisons between genotypes, because any differences between genotypes can be fully explained by the clones.

The more suitable strategy would be to use a simpler design matrix:

design2 <- model.matrix( ~ 0 + group.WTC11 + col.date) 

... and block on clone in duplicateCorrelation, which preserves some information for genotype comparisons.

ADD COMMENT
0
Entering edit mode

(~3 years after...) Hi Aaron, I arrived to your answer by looking for help with a related question of mine and I'm wondering if you have any suggestions? I'm aware that I can just re-run this by considering treatment and time as covariates without having each factor independently specified, but perhaps there is a more direct way from the design? Perhaps you have any input to my question? Thanks a lot in advance

ADD REPLY

Login before adding your answer.

Traffic: 540 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6