DESeq2 input data.
@fischer-philipp-18490
Last seen 2.1 years ago
Austria

Hello, I did not find anything related.

Consider a setup with 6 conditions A-F, where E is the untreated condition and hence the reference level.

I am interested in the contrasts (A/B)/(C/D) and (C/D)/(E/F).
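On the log2 scale a ratio of ratios such as (A/B)/(C/D) becomes A - B - C + D, which is why A and D end up in the numerator and B and C in the denominator of the contrast. A minimal base-R sketch of that arithmetic, using made-up per-condition means (the values in `mu` are purely illustrative and not taken from any data set):

```r
# Hypothetical per-condition mean expression for one gene (illustrative values only)
mu <- c(A = 80, B = 20, C = 30, D = 15, E = 10, F = 40)

# Ratio of ratios computed directly
ratio_1 <- (mu[["A"]] / mu[["B"]]) / (mu[["C"]] / mu[["D"]])

# Same quantity as a linear combination of log2 means:
# +1 for numerator conditions (A, D), -1 for denominator conditions (B, C)
w_1 <- c(A = 1, B = -1, C = -1, D = 1, E = 0, F = 0)
log2_ratio_1 <- sum(w_1 * log2(mu))

# (C/D)/(E/F): numerator C and F, denominator D and E
w_2 <- c(A = 0, B = 0, C = 1, D = -1, E = -1, F = 1)
log2_ratio_2 <- sum(w_2 * log2(mu))

# The two routes agree
stopifnot(isTRUE(all.equal(2^log2_ratio_1, ratio_1)))
```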

coldata
        condition
    A_1 "A"      
    A_2 "A"      
    A_3 "A"      
    B_1 "B"      
    B_2 "B"      
    B_3 "B"      
    C_1 "C"      
    C_2 "C"      
    C_3 "C"      
    D_1 "D"      
    D_2 "D"      
    D_3 "D"      
    E_1 "E"      
    E_2 "E"      
    E_3 "E"
    F_1 "F"
    F_2 "F"
    F_3 "F"


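    library(DESeq2)

    # Setup 1: all samples from all conditions
    dds_f <- DESeqDataSetFromMatrix(countData = data_f,
                                    colData = coldata,
                                    design = ~ condition)
    dds_f <- DESeq(dds_f)
    # Contrast names must be elements of resultsNames(dds_f)
    # (A/B)/(C/D): numerator A and D, denominator B and C
    res_1 <- results(dds_f, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))
    # (C/D)/(E/F): numerator C and F, denominator D and E
    res_2 <- results(dds_f, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))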

    # Setup 2, first subset: only the samples from conditions A-D
    # drop = FALSE keeps the one-column sample table from collapsing to a vector
    coldata_g <- coldata[grepl("A|B|C|D", coldata$condition), , drop = FALSE]
    data_g <- data_f[, grepl("A|B|C|D", colnames(data_f))]
    dds_g <- DESeqDataSetFromMatrix(countData = data_g,
                                    colData = coldata_g,
                                    design = ~ condition)
    dds_g <- DESeq(dds_g)
    res_3 <- results(dds_g, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))

    # Setup 2, second subset: only the samples from conditions C-F
    # drop = FALSE keeps the one-column sample table from collapsing to a vector
    coldata_g2 <- coldata[grepl("C|D|E|F", coldata$condition), , drop = FALSE]
    data_g2 <- data_f[, grepl("C|D|E|F", colnames(data_f))]
    dds_g2 <- DESeqDataSetFromMatrix(countData = data_g2,
                                     colData = coldata_g2,
                                     design = ~ condition)
    dds_g2 <- DESeq(dds_g2)
    res_4 <- results(dds_g2, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))
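When subsetting, it helps to keep the sample table and the count matrix aligned and to select samples by exact condition label rather than by a regular expression on names. A hedged base-R sketch, assuming `coldata` is a data frame with sample names as row names and `data_f` has matching column names; the toy counts and the `keep` variable are made up for illustration:

```r
# Toy stand-ins for the objects in the post (random counts, illustration only)
set.seed(1)
samples <- paste0(rep(LETTERS[1:6], each = 3), "_", 1:3)
coldata <- data.frame(condition = factor(rep(LETTERS[1:6], each = 3)),
                      row.names = samples)
data_f <- matrix(rpois(18 * 100, lambda = 50), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), samples))

# Select samples by exact condition label
keep <- coldata$condition %in% c("A", "B", "C", "D")

# drop = FALSE keeps a one-column data frame from collapsing to a vector
coldata_g <- coldata[keep, , drop = FALSE]
# Drop the now-empty factor levels E and F
coldata_g$condition <- droplevels(coldata_g$condition)
# Index the counts by sample name so both objects share the same order
data_g <- data_f[, rownames(coldata_g), drop = FALSE]

# Sample order must agree between the two objects
stopifnot(identical(colnames(data_g), rownames(coldata_g)))
```

The resulting `coldata_g` and `data_g` could then be passed to `DESeqDataSetFromMatrix()` as in the code above.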

My questions are:

1.) Is it better to use all samples (setup 1: res_1 and res_2) or only the samples involved in each contrast (setup 2: res_3 and res_4)?

My guess is that the answer is not clear-cut, since some parameters (e.g. dispersion estimates, normalization factors) are estimated gene-wise, or at least depend on all values within a row.
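To illustrate this dependence on the included samples, here is a simplified base-R re-implementation of median-of-ratios normalization, the method DESeq2 uses by default for size factors. This is a sketch and not the package's code: `size_factors` is a hypothetical helper and the counts are simulated. The per-gene reference is a geometric mean over whichever samples are present, so removing samples changes the reference and can change the resulting factors:

```r
# Simplified median-of-ratios size factors (sketch; hypothetical helper)
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean over samples
  usable <- is.finite(log_geo_means)       # genes with a zero in any sample are skipped
  apply(counts, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

set.seed(42)
# Simulated counts: 200 genes x 18 samples with per-sample depth differences
depth <- runif(18, min = 0.5, max = 2)
base <- rexp(200, rate = 1 / 100) + 5
counts <- sapply(depth, function(d) rpois(200, lambda = base * d))
colnames(counts) <- paste0(rep(LETTERS[1:6], each = 3), "_", 1:3)

sf_all <- size_factors(counts)             # all 18 samples
sf_subset <- size_factors(counts[, 1:12])  # only conditions A-D

# Compare the factors for the shared samples under the two setups
print(round(cbind(all = sf_all[1:12], subset = sf_subset), 3))
```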

Setup 1 provides more information, which can also be misleading depending on the kind of conditions. Setup 2 provides more specific, but less, information.

So could a rule of thumb be that setup 1 is more conservative than setup 2?

Thank you in advance.

DESeq2 Normalization Input data • 395 views
@mikelove
Last seen 2 days ago
United States

This is one of the FAQs: essentially, what is the advantage of including additional data that is not used in a contrast? Check there first and come back if you have further questions.
