I've the output of a differential expression analysis for RNA done with cuffdiff and I'm trying to check it with limma but I'm not getting the results that I would expect.
w164 <- read.table("COUNTfile.txt", header = T, row.names = 1) head(w164) WM164_NEG_1 WM164_NEG_2 WM164_NEG_3 WM164_MDK_1 WM164_MDK_2 WM164_MDK_3 A1BG 125 152 145 127 121 114 A1BG-AS1 54 55 58 74 69 94 A1CF 0 0 0 0 0 0 A2LD1 91 85 81 56 58 69 A2M 6368 5305 10631 6560 6017 5833 A2ML1 2 1 8 1 4 4
Then I do try to do de differential expression analysis, I've tried to change the coef (1, 2) and the intercept (0, 1)
dge <- DGEList(counts=w164) dge <- calcNormFactors(dge) logCPM <- cpm(dge, log=TRUE, prior.count=1) classes <- factor(c("NEG","NEG","NEG", "MDK","MDK","MDK")) design <- model.matrix(~ 0 + classes) #colnames(design) <- c("NEG", "MDK") fit <- lmFit(logCPM, design) fit <- eBayes(fit, trend=TRUE) toptable <- topTable(fit, coef=2, number=nrow(fit$p.value))
This is the output of our pipeline which used cuffdiff (just for the two more significant genes).
gene |
sample_1 |
sample_2 |
FPKM_1 |
FPKM_2 |
log2(fold_change) |
test_stat |
q_value |
AASS |
WM164_NEG |
WM164_MDK |
4.10347 |
2.48356 |
-0.724435 |
-2.3669 |
0.000702473 |
ABCB9 |
WM164_NEG |
WM164_MDK |
5.60671 |
10.4515 |
0.898485 |
2.95674 |
0.000702473 |
This is the top table for both genes when coef = 1
logFC AveExpr t P.Value adj.P.Val B AASS 4.366231 4.061036 31.65013 1.372641e-06 2.595976e-06 5.552753 ABCB9 4.292285 4.720068 18.24541 1.725563e-05 2.794191e-05 2.497849
When coef = 2
logFC AveExpr t P.Value adj.P.Val B AASS 3.755841 4.061036 27.22551 2.747115e-06 4.929324e-06 4.717309
ABCB9 5.147851 4.720068 21.88221 7.501512e-06 1.272469e-05 3.50482 |
|
|
What I'm doing wrong?
Thanks
Update, I've modify the classes vector such
Now logFC are quite close to the ones obtained in cuffdiff but none of the top 7 significant genes from cuffdiff are significant in limma.
3157 significant genes in cuffdiff
117 in limma
intersection of 29 genes