Hi all :
- When I use limma to identify differentrial expression genes , I want to correct batch effect also.
- My command and data information is :
> table(cluster)
cluster
control treat
368 957
> table(Exp)
Exp
p3 p4 LC05 LC09 LC10 LC11 LC12
370 587 21 210 19 69 49
> table(cluster,Exp)
Exp
cluster p3 p4 LC05 LC09 LC10 LC11 LC12
control 0 0 21 210 19 69 49
treat 370 587 0 0 0 0 0
> design <- model.matrix(~ cluster+Exp)
> fit <- lmFit(selectExpressionSetObject, design)
Coefficients not estimable: ExpLC12
Warning message:
Partial NA coefficients for 6399 probe(s)
> fit <- eBayes(fit, trend=TRUE)
Warning messages:
1: In out$var.prior[is.na(out$var.prior)] <- 1/out$s2.prior :
被替换的项目不是替换值长度的倍数(*Chinese here should translate to The replaced item is not a multiple of the length of the replacement value*)
2: In .ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :
Estimation of var.prior failed - set to default value
> res <- decideTests(fit, p.value=adjPvalueCutoff, lfc = logFCcutoff)
> summary(res)
(Intercept) clustertreat Expp4 ExpLC05 ExpLC09 ExpLC10 ExpLC11
Down 0 106 63 0 21 4 9
NotSig 5239 6259 6151 6387 6344 6374 6356
Up 1160 34 185 12 34 21 34
ExpLC12
Down 0
NotSig 0
Up 0
>
> DEgeneSets <- topTable(fit, coef="clustertreat", number=Inf,
+ p.value=adjPvalueCutoff, adjust="BH", lfc = logFCcutoff)
> head(DEgeneSets)
gene_short_name Database logFC AveExpr t P.Value
SFTPC SFTPC emsembl -1.740316 0.4582606 -19.04131 4.347876e-72
SCGB3A1 SCGB3A1 emsembl -1.336138 0.5532937 -17.45647 6.572995e-62
H3F3A H3F3A emsembl -1.706766 0.6291855 -16.61615 9.704449e-57
HSPA1A HSPA1A emsembl 2.857433 1.6038324 14.99185 3.061836e-47
ZFP36L2 ZFP36L2 emsembl -2.080686 0.7430561 -14.98351 3.411612e-47
HSPA1B HSPA1B emsembl 1.978068 0.9748275 12.42446 1.060178e-33
adj.P.Val B
SFTPC 2.782206e-68 151.97484
SCGB3A1 2.103030e-58 129.01848
H3F3A 2.069959e-53 117.36082
HSPA1A 4.366181e-44 95.94128
ZFP36L2 4.366181e-44 95.83537
HSPA1B 1.130679e-30 65.43102
- I know the reason may that the control samples are in 2 batch , the treat samples are in another 5 batch. So my matrix's columns are dependent and this causes the technical problem. I can not know that the different between control and treat are the biology difference or batch drived. Am I right ?
- Even there are warnings, I found that I can still get the results by " topTable(fit, coef="clustertreat") " see above . So my question is can I ignore the warnings and still use the results ? Or in this situation correct batch is wrong ? And I indeed found that some genes are logFC >0 without batch correct, but it turn to logFC <0 after batch correct .
Best
A sloppy question deserves a sloppy answer. So don't be sloppy.
Sure, sorry for that. I have edit my question. Hope you reply !
I'm not sure that's much better. You know there is specific formatting for code, right? Looks like this:
Read the tutorial here.
Sorry, how about now?