Question

Can I ignore the warnings in limma show " Coefficients not estimable" after batch correct ?

1

Entering edit mode

xingxd16 ▴ 20

@xingxd16-20156

Last seen 6.1 years ago

Hi all :

When I use limma to identify differentrial expression genes , I want to correct batch effect also.
My command and data information is :

> table(cluster)
cluster
control   treat 
    368     957 
> table(Exp)
Exp
  p3   p4 LC05 LC09 LC10 LC11 LC12 
 370  587   21  210   19   69   49 
> table(cluster,Exp)
         Exp
cluster    p3  p4 LC05 LC09 LC10 LC11 LC12
  control   0   0   21  210   19   69   49
  treat   370 587    0    0    0    0    0
> design <- model.matrix(~ cluster+Exp)
> fit <- lmFit(selectExpressionSetObject, design)
Coefficients not estimable: ExpLC12 
Warning message:
Partial NA coefficients for 6399 probe(s) 
> fit <- eBayes(fit, trend=TRUE)
Warning messages:
1: In out$var.prior[is.na(out$var.prior)] <- 1/out$s2.prior :
  被替换的项目不是替换值长度的倍数(*Chinese here should translate to The replaced item is not a multiple of the length of the replacement value*)
2: In .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim,  :
  Estimation of var.prior failed - set to default value
> res <- decideTests(fit, p.value=adjPvalueCutoff, lfc = logFCcutoff)
> summary(res)
       (Intercept) clustertreat Expp4 ExpLC05 ExpLC09 ExpLC10 ExpLC11
Down             0          106    63       0      21       4       9
NotSig        5239         6259  6151    6387    6344    6374    6356
Up            1160           34   185      12      34      21      34
       ExpLC12
Down         0
NotSig       0
Up           0
> 
> DEgeneSets <- topTable(fit, coef="clustertreat", number=Inf,
+ p.value=adjPvalueCutoff, adjust="BH", lfc = logFCcutoff)
> head(DEgeneSets)
        gene_short_name Database     logFC   AveExpr         t      P.Value
SFTPC             SFTPC  emsembl -1.740316 0.4582606 -19.04131 4.347876e-72
SCGB3A1         SCGB3A1  emsembl -1.336138 0.5532937 -17.45647 6.572995e-62
H3F3A             H3F3A  emsembl -1.706766 0.6291855 -16.61615 9.704449e-57
HSPA1A           HSPA1A  emsembl  2.857433 1.6038324  14.99185 3.061836e-47
ZFP36L2         ZFP36L2  emsembl -2.080686 0.7430561 -14.98351 3.411612e-47
HSPA1B           HSPA1B  emsembl  1.978068 0.9748275  12.42446 1.060178e-33
           adj.P.Val         B
SFTPC   2.782206e-68 151.97484
SCGB3A1 2.103030e-58 129.01848
H3F3A   2.069959e-53 117.36082
HSPA1A  4.366181e-44  95.94128
ZFP36L2 4.366181e-44  95.83537
HSPA1B  1.130679e-30  65.43102

I know the reason may that the control samples are in 2 batch , the treat samples are in another 5 batch. So my matrix's columns are dependent and this causes the technical problem. I can not know that the different between control and treat are the biology difference or batch drived. Am I right ?
Even there are warnings, I found that I can still get the results by " topTable(fit, coef="clustertreat") " see above . So my question is can I ignore the warnings and still use the results ? Or in this situation correct batch is wrong ? And I indeed found that some genes are logFC >0 without batch correct, but it turn to logFC <0 after batch correct .

Best

limma batchcorrect warnings • 2.8k views

ADD COMMENT • link updated 6.1 years ago by Aaron Lun ★ 28k • written 6.1 years ago by xingxd16 ▴ 20

0

Entering edit mode

Fix your code formatting.
"I found that I can still get the results by ." Well, by what?

A sloppy question deserves a sloppy answer. So don't be sloppy.

ADD REPLY • link 6.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Sure, sorry for that. I have edit my question. Hope you reply !

ADD REPLY • link 6.1 years ago xingxd16 ▴ 20

0

Entering edit mode

I'm not sure that's much better. You know there is specific formatting for code, right? Looks like this:

this_is_some_code
# This is a comment that you should use for code output

Read the tutorial here.

ADD REPLY • link 6.1 years ago Aaron Lun ★ 28k

0

Entering edit mode

Sorry, how about now?

ADD REPLY • link 6.1 years ago xingxd16 ▴ 20

score 3 · Accepted Answer · 2019-03-13

I can not know that the different between control and treat are the biology difference or batch drived. Am I right ?

Correct.

So my question is can I ignore the warnings and still use the results ?

No.

Or in this situation correct batch is wrong ?

Yes. The experimental design was broken to begin with, and the warnings are a manifestation of that. Whenever you have coefficients that cannot be estimated, it usually means that something is confounded with something else. On occasion, this can be worked around, but not in this case. You don't have enough information to estimate all of the coefficients so limma just drops one of them to make the model mathematically solvable. However, removing one coefficient changes the meaning of all other coefficients such that your treatment term no longer represents the average treatment effect. It probably instead represents the difference between LC05 and p3 alone.

If the batches are replicates of each other, I would suggest:

Summing the counts across cells within each batch.
Pretending the summed counts are pseudo-bulk RNA-seq samples.
Analyzing the result in edgeR or voom as a two-group comparison.

There's no need to account for batch, because each batch is a single replicate in a pseudo-bulk analysis; trying to block on each replicate would make no sense. See here for an example of how this could be done.