Question

DESeq2 input data.

Fischer-philipp ▴ 30

@fischer-philipp-18490

Last seen 2.9 years ago

Austria

Hello, I did not find anything related.

Consider a setup with 6 conditions A-F, where E is the untreated condition and hence the reference level.

I am interested in (A/B)/(C/D) and (C/D)/(E/F).

coldata
        condition
    A_1 "A"      
    A_2 "A"      
    A_3 "A"      
    B_1 "B"      
    B_2 "B"      
    B_3 "B"      
    C_1 "C"      
    C_2 "C"      
    C_3 "C"      
    D_1 "D"      
    D_2 "D"      
    D_3 "D"      
    E_1 "E"      
    E_2 "E"      
    E_3 "E"
    F_1 "F"
    F_2 "F"
    F_3 "F"


    # Setup 1: fit all samples together.
    # Note: the names used in `contrast` must match entries of resultsNames(dds_f).
    dds_f <- DESeqDataSetFromMatrix(countData = data_f,
                                    colData = coldata,
                                    design = ~ condition)
    dds_f <- DESeq(dds_f)
    res_1 <- results(dds_f, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))
    res_2 <- results(dds_f, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))

    # Setup 2, first subset: only samples from conditions A-D.
    # drop = FALSE keeps the one-column coldata from collapsing to a vector.
    coldata_g <- coldata[grepl(coldata$condition, pattern = "A|B|C|D"), , drop = FALSE]
    data_g <- data_f[, grepl(colnames(data_f), pattern = "A|B|C|D")]
    dds_g <- DESeqDataSetFromMatrix(countData = data_g,
                                    colData = coldata_g,
                                    design = ~ condition)
    dds_g <- DESeq(dds_g)
    res_3 <- results(dds_g, contrast = list(c("conditionA", "conditionD"), c("conditionB", "conditionC")))

    # Setup 2, second subset: only samples from conditions C-F.
    coldata_g2 <- coldata[grepl(coldata$condition, pattern = "C|D|E|F"), , drop = FALSE]
    data_g2 <- data_f[, grepl(colnames(data_f), pattern = "C|D|E|F")]
    dds_g2 <- DESeqDataSetFromMatrix(countData = data_g2,
                                     colData = coldata_g2,
                                     design = ~ condition)
    dds_g2 <- DESeq(dds_g2)
    res_4 <- results(dds_g2, contrast = list(c("conditionC", "conditionF"), c("conditionD", "conditionE")))
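
On the log scale, the ratio of ratios (A/B)/(C/D) becomes A - B - C + D, and (C/D)/(E/F) becomes C - D - E + F. The following is a minimal base-R sketch of how those two contrasts look as numeric vectors over the design columns, assuming treatment coding with E as the reference level. It uses only `model.matrix`, not DESeq2, and the names `mm`, `contrast_1` and `contrast_2` are illustrative.

```r
# Six conditions, three replicates each; E is the reference level.
condition <- factor(rep(c("A", "B", "C", "D", "E", "F"), each = 3))
condition <- relevel(condition, ref = "E")

# Design matrix for ~ condition: an intercept plus one column
# for every non-reference level.
mm <- model.matrix(~ condition)

# (A/B)/(C/D) = A - B - C + D on the log scale.
contrast_1 <- setNames(numeric(ncol(mm)), colnames(mm))
contrast_1[c("conditionA", "conditionD")] <- 1
contrast_1[c("conditionB", "conditionC")] <- -1

# (C/D)/(E/F) = C - D - E + F on the log scale.
# E is the reference, so it has no column of its own; the intercept
# cancels out and E contributes a weight of 0.
contrast_2 <- setNames(numeric(ncol(mm)), colnames(mm))
contrast_2[c("conditionC", "conditionF")] <- 1
contrast_2["conditionD"] <- -1

print(contrast_1)
print(contrast_2)
```

DESeq2's `results()` also accepts a numeric `contrast` with one element per entry of `resultsNames(dds)`. The coefficient names reported there can differ from the base-R column names shown here, so it is worth checking `resultsNames(dds)` before building the vector.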

My questions are:

1.) Is it better to use all samples (res_1 and res_2) or only the relevant subsets (res_3 and res_4)?

My guess is that the answer is not clear, since some parameters (e.g. dispersion estimates, normalization factors) are estimated gene-wise or at least depend on all values within a row.

Setup 1 provides more information, which can also be misleading depending on the kind of conditions. Setup 2 provides more specific, but less, information.

So could a rule of thumb be that setup 1 is more conservative than setup 2?
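
One concrete difference between the two setups is how many residual degrees of freedom are left for estimating within-group variability. The sketch below is a rough base-R illustration, assuming three replicates per condition as in the coldata above. It counts degrees of freedom for a plain one-factor linear model and is not DESeq2's dispersion estimation itself; `residual_df` is a hypothetical helper.

```r
# Hypothetical helper: residual degrees of freedom for the design ~ condition,
# i.e. number of samples minus the rank of the design matrix.
residual_df <- function(levels_used, n_rep = 3) {
  condition <- factor(rep(levels_used, each = n_rep))
  mm <- model.matrix(~ condition)
  nrow(mm) - qr(mm)$rank
}

df_all    <- residual_df(c("A", "B", "C", "D", "E", "F"))  # setup 1: 18 samples
df_subset <- residual_df(c("A", "B", "C", "D"))            # setup 2: 12 samples

print(c(all = df_all, subset = df_subset))
```

With these assumptions, setup 1 leaves 12 residual degrees of freedom and setup 2 leaves 8. More degrees of freedom generally give a more stable variability estimate, but only to the extent that within-group variability is comparable across the conditions.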

Thank you in advance.

DESeq2 Normalization Input data • 517 views

score 1 · Answer 1 · 2020-01-09

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

This is one of the FAQs: basically, what is the advantage of including additional data that is not used in a contrast? Check there first and come back if you have further questions.

score 0 · Answer 2 · 2020-01-09

Fischer-philipp ▴ 30

@fischer-philipp-18490

Last seen 2.9 years ago

Austria

The relevant FAQ section:

If I have multiple groups, should I run all together or split into pairs of groups?

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups
