I would like to compare the DE genes between Treatment (T) vs Control (C). The Treatment and controls have subgroups ( which are minor variants of treatments/controls ), so I would expect some genes to show differential expression between different subgroups of treatment/control, but I am only interested in genes that show DE between T and C. I want to eliminate any genes that show DE only between subgroups. i.e If a gene is expressed in T1 and T2 but not in T3, I would like to skip those genes from DE analysis.
|
|
T |
T1 |
T |
T1 |
T |
T1 |
T |
T2 |
T |
T2 |
T |
T2 |
T |
T3 |
T |
T3 |
T |
T3 |
C |
C1 |
C |
C1 |
C |
C1 |
C |
C2 |
C |
C2 |
C |
C2 |
What I am planning to do is the following and I would like to know if its the correct approach.
treat <- c(rep(c("T"),each=9), rep(c("C"),each=6))
sub <- c(rep(c("T1"),each=3),rep(c("T2"),each=3),rep(c("T3"),each=3),rep(c("C1"),each=3),rep(c("C2"),each=3))
design <- model.matrix(~sub+treat)
y <- DGEList(counts=counts_data, group=treat)
y <- calcNormFactors(y)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
note: I edited away the DESeq tags because it looks like this is a question about how to do an edgeR analysis.