Question

Convergence of betas and the maxit argument of nbinomWaldTest

Mozart ▴ 30

@mozart-20625

Last seen 4.4 years ago

Note: this post was first posted on Biostars; the moderators suggested posting it here, as this site is more appropriate for my purpose.

Hi, I am not asking how to fix an issue with the maxit argument of nbinomWaldTest in DESeq2. As someone far from this scientific field, I find it really hard to go through the vignette, other posts and lessons and come away with a clear explanation of what convergence of the betas is, and why using a larger maxit argument with nbinomWaldTest may solve the issue.

I assume that, first of all, this may come up when the design of the experiment is not well balanced.
Referring to this link, Dr Love explains that when there is a single count in a row of 0s (so, I guess, when a gene has a count ≥1 in one sample and 0 counts in every other sample) the GLM code may have problems converging the betas. Now, I am a bit rusty with my stats knowledge, but if I got this right: convergence of the betas is required to fit a line in our experiment, so that it is possible to estimate the dispersions of our parameters of interest, right? That is, the effect of X on Y in Y=βX+ε.

Is that correct?

Why might a larger maxit argument solve this convergence problem? I presume that this step removes lowly expressed genes with low power, which makes the fit more reliable? Thanks

deseq2 • 933 views

updated 4.4 years ago by Michael Love 43k • written 4.4 years ago by Mozart ▴ 30

score 2 · Accepted Answer · 2020-10-06

Rows of the count matrix with very low counts (e.g. a single sample with a positive count per group) are usually the culprit when the method does not converge in beta. I therefore often recommend simply filtering these rows by requiring that at least X samples have a count of Y or more, where you can choose X and Y conservatively so as to preserve any rows with possible signal.
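As a rough illustration of that filtering rule, here is a minimal sketch in plain Python rather than DESeq2 code. The function name `filter_low_count_rows`, the parameters `min_count` (the "Y") and `min_samples` (the "X"), and the toy matrix are all assumptions made up for this example. The thresholds 10 and 3 are placeholders, not recommended values.

```python
def filter_low_count_rows(count_matrix, min_count=10, min_samples=3):
    """Keep rows where at least `min_samples` samples have a count >= `min_count`.

    `count_matrix` is a list of rows (genes), each a list of per-sample counts.
    Both thresholds are illustrative placeholders chosen for this sketch.
    """
    kept = []
    for row in count_matrix:
        # Number of samples in this row that reach the count threshold.
        n_passing = sum(1 for count in row if count >= min_count)
        if n_passing >= min_samples:
            kept.append(row)
    return kept


# Toy data: 4 genes (rows) x 6 samples (columns), invented for illustration.
toy_counts = [
    [0, 0, 0, 1, 0, 0],       # a single positive count in a row of zeros
    [12, 15, 30, 9, 0, 22],   # 4 samples reach the threshold
    [10, 10, 10, 0, 0, 0],    # exactly 3 samples reach the threshold
    [100, 3, 2, 50, 1, 0],    # only 2 samples reach the threshold
]

filtered = filter_low_count_rows(toy_counts, min_count=10, min_samples=3)
print(len(filtered), "of", len(toy_counts), "rows kept")
```

With these placeholder thresholds the first and last rows are dropped, which removes exactly the kind of near-empty row described above. In R the same idea is usually written as a one-line row filter on the count matrix before running the analysis.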

Convergence of betas is required for dispersion estimation, and then again for reporting the LFC values and their SE, and for forming the test statistic.

maxit sometimes helps -- it literally allows the method more iterations (see ?nbinomWaldTest) than the default of 100. But removing these very low count features is also a faster solution.
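To make "more iterations" concrete, here is a toy sketch of an iterative fit with an iteration cap. It is not DESeq2's fitting code: it assumes a single-coefficient Poisson model fitted by Newton's method, and the names `fit_log_mean`, `tol` and `beta_start` are hypothetical. It only shows the general pattern of a fit that stops either when the updates become tiny (converged) or when `maxit` is reached (not converged).

```python
import math


def fit_log_mean(counts, maxit=100, tol=1e-8, beta_start=0.0):
    """Toy Newton fit of beta = log(mean) for Poisson counts.

    Returns (beta, iterations_used, converged). This is an illustrative
    stand-in for an iterative GLM fit, not the algorithm used by DESeq2.
    """
    ybar = sum(counts) / len(counts)
    beta = beta_start
    for iteration in range(1, maxit + 1):
        # Newton step for the log-likelihood sum(y * beta - exp(beta)):
        # gradient / (-hessian) simplifies to ybar * exp(-beta) - 1.
        step = ybar * math.exp(-beta) - 1.0
        beta += step
        if abs(step) < tol:
            return beta, iteration, True
    return beta, maxit, False


# A poor starting value: the first step overshoots, then beta creeps back
# by roughly 1 per iteration, so 100 iterations are not enough here.
print(fit_log_mean([200, 200, 200, 200], maxit=100))
print(fit_log_mean([200, 200, 200, 200], maxit=1000))

# All-zero counts: there is no finite optimum (beta heads toward minus
# infinity), so raising the cap does not help in this toy case.
print(fit_log_mean([0, 0, 0, 0], maxit=500))
```

In this toy setting a larger cap rescues the slow fit, while the all-zero case never converges no matter how many iterations are allowed. That mirrors the point above: extra iterations help some rows, but filtering out near-empty rows removes the problem at its source.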