DESeq2 step-by-step Normalization then DGE separately & Normalization methods and switchability
1
0
Entering edit mode
RL • 0
@d976f52f
Last seen 6 months ago
France

Hi everyone,

Sorry for the inconvenience, I was hoping that Michael Love in particular could help me. On the same dds DESeq2 object, I performed a DGEAnalysis with DESeq2, through 2 approachs :

  • One all-steps-in-one
    dds <- DESeq(dds)
    
  • and one step-by-step dissected
    dds <- estimateSizeFactors(dds)
    # normcnt <- counts(dds, normalized=TRUE)
    # write.csv(normcnt, file = "normcnt.csv", row.names = TRUE)
    dds <- estimateDispersions(dds)
    dds <- nbinomWaldTest(dds)
    
    (even if longer, I prefer a step-by-step approach in my pipeline context mainly as you can see because I can easily extract the normalized table without redoing that computing).

But I do not obtain exactly the same result analysis. With step-by-step, Log2FoldChanges seem to be the same, but padj and others are not, for each gene, exactly the same, and I have very slightly less DE genes.

Is my step-by-step not strictly equivalent to DESeq() ? If so, do you see what to modify ?

And (if you have the time) another question, what is officially the default DESeq2 normalization method name, Means of Ratio Method, Size Factors Method ? And if others available in DESeq2 (I understand that), is there a list (I didn't find), with formulas, and how to set it in parameter ? For example :

dds <- estimateSizeFactors(dds, normalizationMethod="TMM")# ?

Thank you for your time and your help.

DESeq2 Normalization • 885 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

I can't reproduce.

> z <- makeExampleDESeqDataSet()
> dds <- DESeq(z)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> dds2 <- estimateSizeFactors(z)
> dds2 <- estimateDispersions(dds2)
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> dds2 <- nbinomWaldTest(dds2)
> all.equal(results(dds), results(dds2))
[1] TRUE
>
0
Entering edit mode

Thank you very much for your quick cross-check help.

I have redone, and sadly unchanged.

baseMean log2FoldChange lfcSE stat pvalue padj

257.3359592 -0.70613952 0.082681159 -8.540513101 1.34E-17 1.91E-13

1187.186642 -0.717965588 0.098872298 -7.261544441 3.83E-13 2.73E-09

82.09032957 3.941975841 0.551440677 7.148503914 8.77E-13 4.18E-09

688.0855115 0.60537706 0.090809498 6.666450939 2.62E-11 9.36E-08

636.4854582 -0.735240103 0.11340724 -6.483184884 8.98E-11 2.14E-07

...

614.5689018 -0.165047271 0.061945113 -2.664411485 0.007712316 0.485337531

1849.221224 0.122149957 0.045839434 2.664735273 0.007704895 0.485337531

25.17306252 0.598317756 0.224820501 2.661313146 0.007783653 0.487604891

4736.526392 -0.37558073 0.141207776 -2.659773703 0.007819317 0.487700014   






baseMean log2FoldChange lfcSE stat pvalue padj

257.3359592 -0.70613952 0.082681159 -8.540513101 1.34E-17 1.90E-13

1187.186642 -0.717965588 0.098872298 -7.261544441 3.83E-13 2.73E-09

82.09032957 3.941975841 0.551440677 7.148503914 8.77E-13 4.17E-09

688.0855115 0.60537706 0.090809498 6.666450939 2.62E-11 9.33E-08

636.4854582 -0.735240103 0.11340724 -6.483184884 8.98E-11 2.13E-07

...

614.5689018 -0.165047271 0.061945113 -2.664411485 0.007712316 0.499377638

1849.221224 0.122149957 0.045839434 2.664735273 0.007704895 0.499377638  

25.17306252 0.598317756 0.224820501 2.661313146 0.007783653 0.501640584

4736.526392 -0.37558073 0.141207776 -2.659773703 0.007819317 0.501669056

So it is

  • either, anomaly in my dataset in comparison to internal example dataset makeExampleDESeqDataSet(),
  • or, an influence of the DESeq matrix object definition before the following DGE analysis (even if I strictly used the same counts table),
     dds <- DESeqDataSetFromMatrix(countData = counts,
                                   colData = coldata,
                                   design = ~ condition)
    
  • (or, mystery ...).
ADD REPLY
0
Entering edit mode

The only difference I can see is in the adjusted p-values, and those will only change (given the same input p-values) if the number of tests vary.

ADD REPLY
1
Entering edit mode

DESeq() also does refitting of outliers. Try minRep=Inf to turn that off.

ADD REPLY
0
Entering edit mode

Wonderful ! It works. So :

dds <- DESeq(dds, minRep=Inf)

I suppose that if you configured by default DESeq() with that minRep as not Inf, it is a good thing and must be kept, so, to keep this default setting, as you said elsewhere, "[DESeq()] uses the pre-existing sizeFactors(dds) however they were created (or simply assigned).", I can simply do

dds <- estimateSizeFactors(dds)
# Here eventually extract by-DESeq2 normalized table
dds <- DESeq(dds)# (automatically not redoing estimateSizeFactors)
# Here extract DGE results

?

Is it possible to set that minRep= not Inf manually as a step (function or parameter), in this context :

dds <- estimateSizeFactors(dds)
# Export normalized table
# ...
# Import normalized table and perform DGE analysis with minRep= NOT Inf by :
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

(not found in documented parameters of these functions) ?

Thank you very much for your help, I m a student so sorry if too evident question, I will avoid to bother you more.

Have a good day.

ADD REPLY
0
Entering edit mode

The thing it is doing is not part of the sub functions.

I don't love the outlier replacement but we leave it there for consistency. I often turn it off myself for large datasets and use manual inspection.

ADD REPLY

Login before adding your answer.

Traffic: 375 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6