Hello, I could not find an existing answer to this question.
Consider a setup with 6 conditions A-F, where E is the untreated condition and hence the reference level.
I am interested in (A/B)/(C/D) and (C/D)/(E/F).
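Written out on the log2 scale, these ratios of ratios become linear combinations of per-condition means. The notation is only my own assumption for illustration: \beta_X stands for the log2 mean expression of condition X for a given gene.

```latex
\begin{align*}
\log_2 \frac{A/B}{C/D} &= \beta_A - \beta_B - \beta_C + \beta_D \\
\log_2 \frac{C/D}{E/F} &= \beta_C - \beta_D - \beta_E + \beta_F
\end{align*}
```

In a results() contrast list, the conditions with a plus sign go into the first character vector and those with a minus sign into the second.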
coldata
condition
A_1 "A"
A_2 "A"
A_3 "A"
B_1 "B"
B_2 "B"
B_3 "B"
C_1 "C"
C_2 "C"
C_3 "C"
D_1 "D"
D_2 "D"
D_3 "D"
E_1 "E"
E_2 "E"
E_3 "E"
F_1 "F"
F_2 "F"
F_3 "F"
dds_f <- DESeqDataSetFromMatrix(countData = data_f,
colData = coldata,
design = ~ condition)
dds_f <- DESeq(dds_f)
# Names in the contrast list must match entries of resultsNames(dds_f)
res_1 <- results(dds_f, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))
res_2 <- results(dds_f, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))
coldata_g <- coldata[grepl(coldata$condition, pattern = "A|B|C|D"), , drop = FALSE]  # drop = FALSE keeps the one-column data frame
data_g <- data_f[ ,grepl(colnames(data_f), pattern = "A|B|C|D")]
dds_g <- DESeqDataSetFromMatrix(countData = data_g,
colData = coldata_g,
design = ~ condition)
dds_g <- DESeq(dds_g)
res_3 <- results(dds_g, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))
coldata_g2 <- coldata[grepl(coldata$condition, pattern = "C|D|E|F"), , drop = FALSE]  # drop = FALSE keeps the one-column data frame
data_g2 <- data_f[ ,grepl(colnames(data_f), pattern = "C|D|E|F")]
# Use a separate name for the DESeqDataSet so the count matrix data_g2 is not overwritten
dds_g2 <- DESeqDataSetFromMatrix(countData = data_g2,
colData = coldata_g2,
design = ~ condition)
dds_g2 <- DESeq(dds_g2)
res_4 <- results(dds_g2, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))
My questions are:
1.) Is it better to fit the model on all samples (setup 1: res_1 and res_2) or only on the samples involved in each contrast (setup 2: res_3 and res_4)?
My guess is that the answer is not clear-cut, since some parameters (e.g. dispersion estimates, normalization factors) are estimated gene-wise or at least depend on all values within a row.
Setup 1 provides more information, which can also be misleading depending on the kind of conditions. Setup 2 provides more specific, but less, information.
So could a rule of thumb be that setup 1 is more conservative than setup 2?
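One way to see how much the estimates depend on the sample set would be to compare them directly. This is only a rough sketch on simulated counts from makeExampleDESeqDataSet, standing in for data_f and coldata; the object names dds_full and dds_sub are made up for this illustration and the 6 x 3 layout is assumed from the coldata above.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 1000 genes, 18 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 18)
dds$condition <- factor(rep(LETTERS[1:6], each = 3))
design(dds) <- ~ condition

# Subset the raw (unfitted) object so that size factors are re-estimated
keep <- dds$condition %in% c("A", "B", "C", "D")
dds_sub <- dds[, keep]
dds_sub$condition <- droplevels(dds_sub$condition)

dds_full <- DESeq(dds)      # all 18 samples
dds_sub  <- DESeq(dds_sub)  # only the 12 samples from A-D

# Size factors of the same samples under both fits
sf_full <- sizeFactors(dds_full)[colnames(dds_sub)]
sf_sub  <- sizeFactors(dds_sub)
print(cor(sf_full, sf_sub))

# Final per-gene dispersion estimates under both fits
disp_full <- dispersions(dds_full)
disp_sub  <- dispersions(dds_sub)
print(cor(log10(disp_full), log10(disp_sub), use = "complete.obs"))
```

On real data, the same comparison could be made between dds_f and dds_g, since the simulated counts say nothing about how different my actual conditions are.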
Thank you in advance.