I have RNA-seq data that has three groups - Control (C), Disease1 (D1), and Disease2 (D2). The patients of D1 and D2 were treated with a drug (T) and data was generated on pre- and post-treatment disease groups i.e. I have 5 grouping factors - C, D1-pre, D1-post, D2-pre, D2-post. I am a bit confused while modeling this nested data to identify DE genes in limma. I have two questions in mind:
- What are DE genes in D1 and D2 compared to control, and
- Whether the drug has a differential effect on the expression of DE genes in D1 compared to D2.
I tried to make a model matrix as:
library(limma)
patient <- data.frame(
patient$patient <- factor(paste(rep(1:5, times = 5),"_",patient$diagnosis))
diagnosis = factor(rep(c("D1", "D2","Control"), each=10, length.out=25)),
treat = factor(rep(c("Pre", "Post"), each=5, times=3, length.out=25)))
patient$group <- factor(paste(patient$diagnosis,patient$treat))
design <- model.matrix(~0+group,patient)
colnames(design) <- c("C", "D1post", "D1pre", "D2post", "D2pre")
I am lost somewhere here, I am unable to decide if should I go for
fit <- lmFit(eset, design) #eset is the data matrix with expression levels
or make a contrast model, something like:
contrast.matrix <- makeContrasts(D1pre-C, D2pre-C, D1post-D1pre, D2post-D2pre, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
I will be thankful if someone could please help me in modeling.
Do you have pre and post samples from the same patients? In other words, do the D1-pre and D1-post groups contain the same patients or different patients?
Yes, 'pre' and 'post' patients in each disease group are the same. I am sorry for my laziness. I have updated the same in my question. Thank you for replying on Sunday. Now that you have pointed out, can I use these patients' samples per group as random errors?