DESeq fitted values using coefficients vs mu
1
3
Entering edit mode
@matthew-mccall-4459
Last seen 5.6 years ago
United States

I've tried calculating fitted values from the output of DESeq in two ways that I thought should produce identical results; however, this doesn't seem to be the case. 

obj <- DESeq(obj)
q_ij <- t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) )
mod <- attr(obj, "modelMatrix")
coefs <- coef(obj)
q_tst <- coefs %*% t(mod)
summary(as.vector(q_tst-log2(q_ij)))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-6.8360  0.0000  0.0000 -0.1278  0.0000  0.0000 

plot(x=(log2(q_ij)+q_tst)/2, y=log2(q_ij)-q_tst)

I've also tried this with betaPrior=FALSE and minReplicatesForReplace=Inf and the two fitted value calculations were still not equal. Am I missing something obvious here? In case it's helpful I've uploaded the obj variable from my example code here: https://dl.dropboxusercontent.com/u/384142/deseq2-test-object.rda

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2         gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0 Biobase_2.34.0            
 [6] GenomicRanges_1.26.1       GenomeInfoDb_1.10.1        IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0       
[11] sva_3.22.0                 genefilter_1.56.0          mgcv_1.8-16                nlme_3.1-128              

loaded via a namespace (and not attached):
 [1] gtools_3.5.0         locfit_1.5-9.1       splines_3.3.2        lattice_0.20-34      colorspace_1.3-1     htmltools_0.3.5      survival_2.40-1     
 [8] XML_3.98-1.5         foreign_0.8-67       DBI_0.5-1            BiocParallel_1.8.1   plyr_1.8.4           stringr_1.1.0        zlibbioc_1.20.0     
[15] munsell_0.4.3        gtable_0.2.0         caTools_1.17.1       memoise_1.0.0        latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.52.0  
[22] AnnotationDbi_1.36.0 htmlTable_1.7        Rcpp_0.12.8          KernSmooth_2.23-15   acepack_1.4.1        xtable_1.8-2         openssl_0.9.5       
[29] scales_0.4.1         gdata_2.17.0         base64_2.0           Hmisc_4.0-1          annotate_1.52.0      XVector_0.14.0       gridExtra_2.2.1     
[36] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2        grid_3.3.2           tools_3.3.2          bitops_1.0-6         magrittr_1.5        
[43] RCurl_1.95-4.8       lazyeval_0.2.0       RSQLite_1.1-1        tibble_1.2           Formula_1.2-1        cluster_2.0.5        Matrix_1.2-7.1      
[50] data.table_1.10.0    assertthat_0.1       rpart_4.1-10         nnet_7.3-12     

 

 

 

deseq2 • 2.3k views
ADD COMMENT
4
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Matthew,

If you use betaPrior=FALSE, they should be very close. I get 2e-15 with some simulated data:

library(DESeq2)
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- dds[rowSums(counts(dds)) > 0,]
dds <- DESeq(dds, betaPrior=FALSE)
log2q1 <- log2(t(t(assays(dds)[["mu"]])/sizeFactors(dds)))
log2q2 <- coef(dds) %*% t(attr(dds, "modelMatrix"))
dev <- log2q1[,1] - log2q2[,1]
summary(abs(dev))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 0.000e+00 0.000e+00 8.184e-17 0.000e+00 1.776e-15 

> packageVersion("DESeq2")
[1] ‘1.14.0’

I see equal deviations from zero per gene (for a gene, all samples have the same deviation if it's nonzero) which I think corresponds to the intercept changing by small amounts. I think what is going on in terms of the difference is the conversion of coefficients from log scale to log2 scale which happens internally.

ADD COMMENT
1
Entering edit mode

Thank you for the fast reply. So with betaPrior=FALSE, I still get some sizable differences.

obj <- DESeq(obj, betaPrior=FALSE)
log2q1 <- log2(t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) ))
log2q2 <- coef(obj) %*% t(attr(obj, "modelMatrix"))
summary(as.vector(log2q1-log2q2))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.0000  0.0000  0.0000  0.5133  0.0000 34.5000 

Now with betaPrior=FALSE, some rows don't converge. If I remove those, there are only a few differences between the fitted value calculations, but they aren't decimal dust.

ind <- which(mcols(obj)$betaConv)
summary(as.vector(log2q1[ind,]-log2q2[ind,]))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.000000 0.000000 0.000000 0.000895 0.000000 1.320000 

One thing I noticed is that the differences are only in rows that converged in beta but took the maximum iterations to do so:

which(log2q1[ind,]-log2q2[ind,] > 0.01, arr.ind=TRUE)

              row col
hsa-let-7b-3p   3 424
hsa-let-7b-3p   3 426
hsa-let-7b-3p   3 435
hsa-let-7a-3p   1 436
hsa-let-7b-3p   3 436
hsa-let-7b-3p   3 437
hsa-let-7b-3p   3 438

mcols(obj)[ind,326:327]

DataFrame with 7 rows and 2 columns
   betaConv  betaIter
  <logical> <numeric>
1      TRUE       100
2      TRUE         6
3      TRUE       100
4      TRUE        17
5      TRUE        21
6      TRUE         5
7      TRUE        15
ADD REPLY
1
Entering edit mode

Ah, I think that's because I'm not doing a final update for mu for those rows that undergo numeric optimization because they don't converge with the IRLS. So the better mu for those rows is the one you are creating by multiplying coefficients. I can fix this in devel.

ADD REPLY
1
Entering edit mode

Ok, I've fixed this in v1.15.25. Now the differences should be within some typical numeric tolerance.

ADD REPLY
0
Entering edit mode

Got it. Thanks!

ADD REPLY
1
Entering edit mode

One related question: is there a way to calculate the fitted values with betaPrior=TRUE using the coefficients and model matrix rather than assays(obj)[["mu"]] and the size factors?

ADD REPLY
1
Entering edit mode

If you want the "shrunken" mu's (so they will be closer to each other than the MLE mu's), you can just multiply the coef matrix by the model matrix and scale up by size factors. I took a look at the mu code just now and made it a bit more consistent (it was complicated and somewhat inconsistent by my having allowed users to run functions without necessarily having gone through expected previous steps, e.g. users providing external dispersion values instead of using estimateDispersions...).

Going forward, the matrix "mu" in assays(dds) will always be the MLE mu's.

ADD REPLY
0
Entering edit mode

Great. Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 399 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6