Question

DESeq fitNbinomGLMs Warning Issue: In fitNbinomGLMs(objectNZ[fitidx, , drop = FALSE], alpha_hat = alpha_hat[fitidx], : 4189rows had non-positive estimates of variance for coefficients

0

Entering edit mode

mila ▴ 10

@b0e845cb

Last seen 9 months ago

Germany

Hi dear community,

I am new to bioinformatics and RNAseq analysis and trying to solve an issue with an RNA dataset. This dataset consists of 3 groups, each representing a disease (D1, D2 and D3) and a control group CTRL. For 2 disease groups (D1 and D2) data for two treatments are available (MTX and Ada) while the D3 group only contains data for the MTX treatment due to lack of samples. The samples were taken before therapy and after therapy. Samples for the control group were only taken once at the start of the study though.

What should be tested:

a) The healthy patients against each D-group (disease) before treatment for each treatment (Healthy vs DX-pre-MTX & Healthy vs DX-pre-Ada)

b) The healthy patients against each D-group after treatment for each treatment (Healthy vs DX-post-MTX & Healthy vs DX-post-Ada)

c) Each D-group before treatment vs the same D-group after treatment (DX-pre-MTX vs DX-post-MTX & DX-pre-Ada vs DX-post-Ada)


     Patient TimepointProcessed Sex Disease Treatment TreatmentGroup
1  Healthy 1                  l   w    CTRL      None        Healthy
2  Healthy 2                  l   w    CTRL      None        Healthy
3  Healthy 3                  s   m    CTRL      None        Healthy
4  Healthy 4                  l   w    CTRL      None        Healthy
5   D1-MTX 1                  s   w      D1    preMTX       D1preMTX
6   D1-MTX 2                  l   m      D1    preMTX       D1preMTX
7   D1-MTX 3                  s   w      D1    preMTX       D1preMTX
8   D1-MTX 1                  s   w      D1   postMTX      D1postMTX
9   D1-MTX 2                  l   m      D1   postMTX      D1postMTX
10  D1-MTX 3                  s   w      D1   postMTX      D1postMTX
11  D1-Ada 1                  l   w      D1    preAda       D1preAda
12  D1-Ada 2                  l   m      D1    preAda       D1preAda
13  D1-Ada 3                  s   w      D1    preAda       D1preAda
14  D1-Ada 4                  s   w      D1    preAda       D1preAda
15  D1-Ada 5                  s   w      D1    preAda       D1preAda
16  D1-Ada 6                  s   w      D1    preAda       D1preAda
17  D1-Ada 1                  l   w      D1   postAda      D1postAda
18  D1-Ada 2                  s   m      D1   postAda      D1postAda
19  D1-Ada 5                  l   w      D1   postAda      D1postAda
20  D1-Ada 6                  s   w      D1   postAda      D1postAda
21  D2-MTX 1                  s   w      D2    preMTX       D2preMTX
22  D2-MTX 2                  s   w      D2    preMTX       D2preMTX
23  D2-MTX 3                  s   w      D2    preMTX       D2preMTX
24  D2-MTX 1                  l   w      D2   postMTX      D2postMTX
25  D2-MTX 2                  s   w      D2   postMTX      D2postMTX
26  D2-MTX 3                  l   w      D2   postMTX      D2postMTX
27  D2-Ada 1                  s   m      D2    preAda       D2preAda
28  D2-Ada 3                  s   w      D2    preAda       D2preAda
29  D2-Ada 4                  s   m      D2    preAda       D2preAda
30  D2-Ada 1                  s   m      D2   postAda      D2postAda
31  D2-Ada 2                  s   w      D2   postAda      D2postAda
32  D2-Ada 3                  s   w      D2   postAda      D2postAda
33  D2-Ada 4                  s   m      D2   postAda      D2postAda
34  D3-MTX 1                  s   w      D3    preMTX       D3preMTX
35  D3-MTX 2                  s   m      D3    preMTX       D3preMTX
36  D3-MTX 4                  s   w      D3    preMTX       D3preMTX
37  D3-MTX 1                  l   w      D3   postMTX      D3postMTX
38  D3-MTX 2                  l   m      D3   postMTX      D3postMTX
39  D3-MTX 5                  s   m      D3   postMTX      D3postMTX

It was shown that the sex and timepoint at which the samples were processed (s=shortly after collection, l=the next day), show batch effects on the data and need to be taken into account. I wanted to start with the comparison of a) and b). After reading through the vingrette and some examples posted before, I concatenated the Disease and Treatment columns to one to use TreatmentGroup as one factor in the design. Apparently I am missing something because I get the warnings for 'In fitNbinomGLMs ...' and the ' 3 rows did not converge in beta, labelled in mcols(object)$betaConv' notice.


# Create DESeq object
CD4_dds <- DESeqDataSetFromMatrix(countData = CD4_counts, colData = CD4_coldata, design = ~ TimepointProcessed + Sex + TreatmentGroup)

# Assign control group as reference level
CD4_dds$TreatmentGroup <- relevel(CD4_dds$TreatmentGroup, "Healthy")

nrow(CD4_dds)
smallestGroupSize <- 3 
keep <- rowSums(counts(CD4_dds) >= 10) >= smallestGroupSize
CD4_dds <- CD4_dds[keep,]
nrow(CD4_dds)

CD4_DESeq <- DESeq(CD4_dds, test='Wald', fitType = 'local')

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
3 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
Warning messages:
1: In fitNbinomGLMs(objectNZ[fitidx, , drop = FALSE], alpha_hat = alpha_hat[fitidx],  :
  4337rows had non-positive estimates of variance for coefficients
2: In fitNbinomGLMs(objectNZ, betaTol = betaTol, maxit = maxit, useOptim = useOptim,  :
  4528rows had non-positive estimates of variance for coefficients

If you are wondering why I set the fitType argument to 'local': while running DESeq() I first got the note below along with the warnings, so I adjusted it.

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

I then tried to augment the maxit argument but am still stuck with the same issue.

> CD4_dds <- estimateSizeFactors(CD4_dds)
> CD4_estDisp <- estimateDispersions(CD4_dds, fitType='local')
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
Warning message:
In fitNbinomGLMs(objectNZ[fitidx, , drop = FALSE], alpha_hat = alpha_hat[fitidx],  :
  4337rows had non-positive estimates of variance for coefficients
> CD4_DESeq <- nbinomWaldTest(CD4_estDisp, maxit=1000)
3 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
Warning message:
In fitNbinomGLMs(objectNZ, betaTol = betaTol, maxit = maxit, useOptim = useOptim,  :
  4515rows had non-positive estimates of variance for coefficients

So my question is what did go wrong and what would be the best approach to analyze the data? Thank you in advance for any reply, I really appreciate it.

sessionInfo( )

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] limma_3.60.3                EnhancedVolcano_1.22.0      pcaExplorer_2.30.0          sva_3.52.0                  BiocParallel_1.38.0        
 [6] genefilter_1.86.0           mgcv_1.9-1                  nlme_3.1-165                PCAtools_2.16.0             pheatmap_1.0.12            
[11] vsn_3.72.0                  ggrepel_0.9.5               lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1              
[16] dplyr_1.1.4                 purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                 tibble_3.2.1               
[21] ggplot2_3.5.1               tidyverse_2.0.0             DESeq2_1.44.0               SummarizedExperiment_1.34.0 Biobase_2.64.0             
[26] MatrixGenerics_1.16.0       matrixStats_1.3.0           GenomicRanges_1.56.1        GenomeInfoDb_1.40.1         IRanges_2.38.1             
[31] S4Vectors_0.42.1            BiocGenerics_0.50.0        

loaded via a namespace (and not attached):
  [1] splines_4.4.1             later_1.3.2               bitops_1.0-7              filelock_1.0.3            preprocessCore_1.66.0     graph_1.82.0             
  [7] XML_3.99-0.17             lifecycle_1.0.4           httr2_1.0.1               edgeR_4.2.0               topGO_2.56.0              doParallel_1.0.17        
 [13] lattice_0.22-5            crosstalk_1.2.1           dendextend_1.17.1         magrittr_2.0.3            plotly_4.10.4             rmarkdown_2.27           
 [19] shinyBS_0.61.1            httpuv_1.6.15             NMF_0.27                  cowplot_1.1.3             DBI_1.2.3                 RColorBrewer_1.1-3       
 [25] pkgload_1.4.0             abind_1.4-5               zlibbioc_1.50.0           RCurl_1.98-1.14           rappdirs_0.3.3            seriation_1.5.5          
 [31] GenomeInfoDbData_1.2.12   irlba_2.3.5.1             AnnotationForge_1.46.0    annotate_1.82.0           dqrng_0.4.1               DelayedMatrixStats_1.26.0
 [37] codetools_0.2-19          DelayedArray_0.30.1       DT_0.33                   xml2_1.3.6                tidyselect_1.2.1          GOstats_2.70.0           
 [43] UCSC.utils_1.0.0          ScaledMatrix_1.12.0       viridis_0.6.5             TSP_1.2-4                 BiocFileCache_2.12.0      base64enc_0.1-3          
 [49] webshot_0.5.5             jsonlite_1.8.8            survival_3.7-0            iterators_1.0.14          foreach_1.5.2             tools_4.4.1              
 [55] progress_1.2.3            Rcpp_1.0.12               glue_1.7.0                gridExtra_2.3             SparseArray_1.4.8         xfun_0.45                
 [61] ca_0.71.1                 shinydashboard_0.7.2      withr_3.0.0               Category_2.70.0           BiocManager_1.30.23       fastmap_1.2.0            
 [67] fansi_1.0.6               SparseM_1.84              digest_0.6.36             rsvd_1.0.5                timechange_0.3.0          R6_2.5.1                 
 [73] mime_0.12                 colorspace_2.1-0          GO.db_3.19.1              biomaRt_2.60.1            RSQLite_2.3.7             threejs_0.3.3            
 [79] utf8_1.2.4                generics_0.1.3            data.table_1.15.4         prettyunits_1.2.0         httr_1.4.7                htmlwidgets_1.6.4        
 [85] S4Arrays_1.4.1            pkgconfig_2.0.3           gtable_0.3.5              blob_1.2.4                registry_0.5-1            XVector_0.44.0           
 [91] htmltools_0.5.8.1         RBGL_1.80.0               GSEABase_1.66.0           scales_1.3.0              png_0.1-8                 knitr_1.48               
 [97] rstudioapi_0.16.0         tzdb_0.4.0                reshape2_1.4.4            curl_5.2.1                shinyAce_0.4.2            cachem_1.1.0             
[103] parallel_4.4.1            AnnotationDbi_1.66.0      pillar_1.9.0              grid_4.4.1                vctrs_0.6.5               promises_1.3.0           
[109] BiocSingular_1.20.0       dbplyr_2.5.0              beachmat_2.20.0           xtable_1.8-4              cluster_2.1.6             Rgraphviz_2.48.0         
[115] evaluate_0.24.0           cli_3.6.3                 locfit_1.5-9.10           compiler_4.4.1            rlang_1.1.4               crayon_1.5.3             
[121] rngtools_1.5.2            heatmaply_1.5.0           affy_1.82.0               plyr_1.8.9                stringi_1.8.4             viridisLite_0.4.2        
[127] gridBase_0.4-7            assertthat_0.2.1          munsell_0.5.1             Biostrings_2.72.1         lazyeval_0.2.2            Matrix_1.6-5             
[133] hms_1.1.3                 sparseMatrixStats_1.16.0  bit64_4.0.5               KEGGREST_1.44.1           statmod_1.5.0             shiny_1.8.1.1            
[139] igraph_2.0.3              memoise_2.0.1             affyio_1.74.0             bit_4.0.5

DESeq2 RNAseq • 1.4k views

ADD COMMENT • link written 9 months ago by mila ▴ 10

score 3 · Accepted Answer · 2024-07-18

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

Two options, you could increase the number of samples to have a non-zero value. Often these warnings are due to large coefficient vectors and lots of zeros in the count vector.

Alternatively you can use limma-voom which handles this through pseudocounts and precision weighting.

ADD COMMENT • link 9 months ago Michael Love 43k

1

Entering edit mode

Thank you for your reply Michael,

increasing the sample number just lowered the row number that give a warning but limma-voom worked fine with this dataset.

ADD REPLY • link 9 months ago mila ▴ 10