Question

Error in DESeq2 Unmix() function: L-BFGS-B needs finite values of 'fn'

0

Entering edit mode

JennyS • 0

@jennylsmith12-16139

Last seen 5 months ago

United States

Hello,

I wanted to use the unmix() function from DESeq2 to investigate the mixing proportions of normal and tumor from bulk tumor RNAseq. I was hoping to compare the results from using this function with A) normalized counts from the median ratio method and B) Transcripts per million (TPM).

First when I run the function using the median ratio normalized counts, I found that I receive the error, "Error in optim(par = rep(1, ncol(pure)), fn = distVST, gr = NULL, i, vst, : L-BFGS-B needs finite values of 'fn'".

Code that produced the error:

dds <- DESeqDataSetFromMatrix(countData = round(cts, digits = 0),
                              colData = blasts.anno,
                              design = ~1)

dds <- dds[rowSums(counts(dds)) > 10, ]
dim(dds)

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds, fitType="parametric") #asymptDisp: 0.5999724

dispersionFunction(dds)

norm.cts.dds <- counts(dds, normalized=TRUE)

aml.dds <- norm.cts.dds[,grep("^PA", colnames(norm.cts.dds))]
nbm.dds <- norm.cts.dds[,grep("^RO|^BM", colnames(norm.cts.dds))]

unmix.dds <- unmix(x=aml.dds, pure = nbm.dds, alpha=0.5999724)



traceback():

Error in optim(par = rep(1, ncol(pure)), fn = distVST, gr = NULL, i, vst, : L-BFGS-B needs finite values of 'fn'

5.	optim(par = rep(1, ncol(pure)), fn = distVST, gr = NULL, i, vst, alpha, loss, method = "L-BFGS-B", lower = 0, upper = 100)

4.	FUN(X[[i]], ...)

3.	lapply(X[Split[[i]]], FUN, ...)

2.	lapply(seq_len(ncol(x)), function(i) { optim(par = rep(1, ncol(pure)), fn = distVST, gr = NULL, i, vst, alpha, loss, method = "L-BFGS-B", lower = 0, upper = 100)$par ...

1.	unmix(x = aml.dds, pure = nbm.dds, alpha = 0.5999724)

Does anyone have any suggestions about where this error might be originating from? I updated all my bioconductor packages and tried to run it again, but it failed with the same error.

Also, should I be treating the tumors and the normals separately? Here you can see that I normalized the entire matrix of the tumors and the pure normals together, and then seperated them only for the unmix() function to run. Is that appropriate? My thoughts on doing it this way is that this ensure the same number of rows for both tumor and normals since rowSums(counts(dds)) > 10 would be different between the tumors and normals if I treated them as separate count matrices.

To use the function with a TPM matrix, my question is how to interpret the "shift" parameter that is required. I used the vsn::meanSdPlot() function, but I am unsure of what values are meant to be used in the "shift". Shift is defined as, "for TPMs, the shift which approximately stabilizes the variance of log shifted TPMs. Can be assessed with vsn::meanSdPlot" in the documentation.

Any advice would be appreciated and thank you very much,

Jenny Smith

sessionInfo()

R version 3.4.0 (2017-04-21)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 14.04.5 LTS

Matrix products: default

BLAS/LAPACK: /app/easybuild/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib/libopenblas_prescottp-r0.2.18.so

locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8   

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                

[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:

[1] DESeq2_1.20.0              SummarizedExperiment_1.6.5 DelayedArray_0.2.7         matrixStats_0.53.1       

[5] Biobase_2.36.2             GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        IRanges_2.10.5           

[9] S4Vectors_0.14.7           BiocGenerics_0.22.1        bindrcpp_0.2.2             pryr_0.1.4               

[13] edgeR_3.18.1               limma_3.32.10              tidyr_0.6.1                tibble_1.4.2             

[17] dplyr_0.7.5                ggplot2_2.2.1              magrittr_1.5               stringr_1.3.1            

[21] knitr_1.20                

loaded via a namespace (and not attached):

[1] vsn_3.44.0              bit64_0.9-7             splines_3.4.0           Formula_1.2-3         

[5] assertthat_0.2.0        affy_1.54.0             latticeExtra_0.6-28     blob_1.1.1            

[9] GenomeInfoDbData_0.99.0 yaml_2.1.19             RSQLite_2.1.1           pillar_1.2.3          

[13] backports_1.1.2         lattice_0.20-35         glue_1.2.0              digest_0.6.15         

[17] RColorBrewer_1.1-2      XVector_0.16.0          checkmate_1.8.5         colorspace_1.3-2      

[21] htmltools_0.3.5         preprocessCore_1.38.0   Matrix_1.2-14           plyr_1.8.4            

[25] XML_3.98-1.11           pkgconfig_2.0.1         pheatmap_1.0.8          genefilter_1.58.1     

[29] zlibbioc_1.22.0         xtable_1.8-2            purrr_0.2.5             scales_0.4.1          

[33] affyio_1.46.0           BiocParallel_1.10.1     annotate_1.54.0         htmlTable_1.9         

[37] pbapply_1.3-4           nnet_7.3-12             lazyeval_0.2.1          survival_2.41-3       

[41] memoise_1.1.0           foreign_0.8-70          BiocInstaller_1.26.1    data.table_1.11.4     

[45] tools_3.4.0             munsell_0.5.0           locfit_1.5-9.1          cluster_2.0.7-1       

[49] AnnotationDbi_1.38.2    compiler_3.4.0          rlang_0.2.1             grid_3.4.0            

[53] RCurl_1.95-4.10         htmlwidgets_1.2         bitops_1.0-6            base64enc_0.1-3       

[57] gtable_0.2.0            codetools_0.2-15        DBI_1.0.0               R6_2.2.2              

[61] gridExtra_2.3           bit_1.1-14              bindr_0.1.1             Hmisc_4.0-2           

[65] stringi_1.2.3           Rcpp_0.12.17            geneplotter_1.54.0      rpart_4.1-13          

[69] acepack_1.4.1           tidyselect_0.2.4

deseq2 biocViews • 1.4k views

ADD COMMENT • link 6.9 years ago • updated 5 months ago JennyS • 0

score 0 · Answer 1 · 2018-06-14

What is the correlation on log scale of the "pure" samples. E.g. you can do: cor(assay(vsd)) if vsd is an object with just the pure samples subsetted after running vst() on the dds. Having very correlated pure samples, even two of them, makes the optimization not converge, which makes sense: you can use X of one of the pure samples, or X of the other, of 1/2 X of both, all of these are then equally good at explaining the data.

If I understand you correctly, it should be fine to do the normalization and filtering on all the samples. unmix() then looks one sample at a time, totally independently. However, if there are large differences in the samples like tumor vs normal, this should be in the design so that the dispersions are properly estimated.

Re: shift, you can see how log2(x + shift) looks in the meanSdPlot. You want to stabilize the trend of most of the genes (not all the genes will be flat, this is ok, the important thing is the trend.)