DESeq2 - v1.18 gives different p-values from v1.20 (is it minmu?)
@andrebolerbarros-16788

Hello everyone,

I previously ran an RNA-seq pipeline analysis with DESeq2 version 1.18; now, with version 1.20, I am getting different p-values, which ultimately changes the number of DE genes (~200 genes differ).

I was reading about the changes between versions, and the only one I think could be involved is the "minmu" argument of DESeq, which was not present in 1.18. Could this be the answer?
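(For what it's worth, the default of the new argument can be read straight off the function signature; a quick check, assuming v1.20 is installed:)

formals(DESeq2::DESeq)$minmu
# returns the default for the new argument, 0.5 in v1.20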

Thanks!


Have a look here; there is a reference to that parameter:

https://github.com/mikelove/DESeq2/blob/a24c0bd71fb4a621a7c0772ca00825db5af5c69b/NEWS#L26
Thanks for your answer! I saw that; that's how I arrived at the minmu hypothesis. But I would like to understand the change, or at least to know the default value used in v1.18.

I just started doing some experiments, and it does seem to be "minmu" that is changing these values. Is there any information about the previous value and/or the reason for the change?
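For reference, a minimal sketch of such an experiment in v1.20 (dds here is a hypothetical DESeqDataSet; the second minmu value is arbitrary):

# re-run with minmu pinned to the old internal value and to another value
dds_a <- DESeq(dds, minmu = 0.5, quiet = TRUE)  # 0.5 = pre-1.20 internal default
dds_b <- DESeq(dds, minmu = 2, quiet = TRUE)    # arbitrary alternative
summary(abs(results(dds_a)$pvalue - results(dds_b)$pvalue))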

@mikelove

I didn't change the value of minmu (it was 0.5 before and after); I only elevated it to an argument rather than an internal parameter. It was exposed for the single-cell integration.

So that's not the cause of any changes in p-values. I can't think of any difference between these versions. Can you give summary(abs(res$stat - res2$stat))?


I happen to have both versions on my laptop, and I see differences of at most 2 x 10^-5 in adjusted p-values. I don't think there was any relevant change in the statistical routine between these versions (there was a bug in the single-cell integration that I fixed, but you didn't mention using ZINB-WaVE estimated weights).

Is it possible you were using a version earlier than 1.18?

R 3.4:

> packageVersion("DESeq2")
[1] ‘1.18.1’
> set.seed(1)
> dds <- makeExampleDESeqDataSet()
> dds <- DESeq(dds, quiet=TRUE)
> res <- results(dds)
> save(dds, res, file="deseq2_v1.18.rda")

R 3.5:

> packageVersion("DESeq2")
[1] ‘1.20.0’
> load("deseq2_v1.18.rda")
> dds <- DESeq(dds, quiet=TRUE)
> res2 <- results(dds)
> summary(abs(res$stat - res2$stat))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
0.00e+00 6.00e-07 2.00e-06 3.00e-06 4.30e-06 2.23e-05        2 
> summary(abs(res$pvalue - res2$pvalue))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.0e+00 3.0e-07 1.0e-06 1.2e-06 1.8e-06 5.3e-06       3 
> summary(abs(res$padj - res2$padj))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.0e+00 2.0e-07 2.0e-07 5.0e-07 2.0e-07 2.3e-05       3 

 


So, I did what you suggested (with some additions). I still had the environment from the v1.18 run, so I performed the v1.20 analysis and then compared.

First, comparing the values of the columns:

check1 <- vector()
for (i in 1:ncol(res_20)) {
  check1[i] <- all(res_18[,i] == res_20[,i], na.rm = TRUE)
}
check1
[1]  TRUE FALSE FALSE FALSE FALSE FALSE
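(The same check can also be written without the loop; an equivalent sketch:)

check1 <- sapply(seq_len(ncol(res_20)),
                 function(i) all(res_18[,i] == res_20[,i], na.rm = TRUE))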

Now, as suggested, I computed the difference between the results.

(First of all, check that the rownames match exactly.)

check2 <- all(rownames(res_18) == rownames(res_20))
check2
[1] TRUE
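(Had this returned FALSE, the two tables could be aligned by rowname first; a sketch:)

res_20 <- res_20[rownames(res_18),]  # reorder res_20 to match res_18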

Now, the difference:

# res_18 and res_20 are assumed here to be plain data frames
# (e.g. via as.data.frame(results(dds))); dif must be pre-allocated
dif <- res_18
for (i in 1:nrow(res_18)) {
  dif[i,] <- res_20[i,] - res_18[i,]
}

dif <- dif[order(dif$padj, decreasing = TRUE),]
head(dif)
                   baseMean log2FoldChange        lfcSE          stat       pvalue       padj
ENSMUSG00000096992        0   5.169783e-04 1.687938e-03  0.0040201034 2.019885e-03 0.04924934
ENSMUSG00000029190        0   5.193889e-08 2.530548e-05 -0.0001695293 9.630470e-05 0.04891005
ENSMUSG00000112846        0  -5.263351e-06 1.690702e-04  0.0001355298 7.715936e-05 0.04887932
ENSMUSG00000083431        0   5.882076e-05 3.126852e-04  0.0002190303 1.229127e-04 0.04887640
ENSMUSG00000081093        0  -9.029549e-06 1.883606e-04 -0.0001652798 9.436838e-05 0.04887206
ENSMUSG00000093405        0  -1.054840e-05 1.083485e-04 -0.0001479438 8.400461e-05 0.04886462

As you can see, there are sizeable differences in the adjusted p-values, which can lead to clear differences in the DE gene calls between the versions.

The next step is to redo everything as you did (R 3.4, DESeq2 v1.18 - the version I used the first time) and then compare it to the current version I have (R 3.5, DESeq2 v1.20).


I just re-did the analysis with the different versions (R 3.4, DESeq2 v1.18.1 versus R 3.5, DESeq2 v1.20.0) and the difference remains.


As you requested:

summary(dif$stat)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-4.576e-02 -8.796e-05  0.000e+00 -1.383e-05  6.629e-05  5.963e-02 

and also:

summary(dif$padj)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.000   0.009   0.017   0.034   0.049    9788 

Note the p-values are only different by at most 0.002, right? I'm surprised this aggregates to such a big difference in adjusted p-values, but it's possible because of the nature of the method.

Moving forward, the results are not supposed to be identical across versions. Unless there was a regression, which I don’t think there was, why don’t you stick with one version for the analysis?
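To illustrate the aggregation point: the adjusted p-values come from Benjamini-Hochberg, which scales the raw p-value at rank r by roughly n/r (with a monotonicity correction), so small perturbations to low-ranked p-values can be magnified, and independent filtering in results() can add further differences. A minimal standalone sketch (not from this thread), using stats::p.adjust:

set.seed(1)
p <- runif(10000)                                     # hypothetical raw p-values
p2 <- pmin(pmax(p + rnorm(10000, sd = 0.002), 0), 1)  # perturb each by ~0.002
summary(abs(p.adjust(p, "BH") - p.adjust(p2, "BH")))  # differences can exceed 0.002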
