Comparison of DESeq2 and BNB-R model
Homer • 0
@homer-18328

Hi,

I am trying to understand and compare the DESeq2 model and the BNB-R (https://github.com/siamakz/BNBR) model. The corresponding references are:

  • DESeq2: Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8
  • BNB-R: Dadaneh, S. Z., Zhou, M., & Qian, X. (2018). Bayesian negative binomial regression for differential expression with confounding factors. Bioinformatics, 34(19), 3349–3356. https://doi.org/10.1093/bioinformatics/bty330

My understanding of the BNB-R model is that it treats the sample-specific size factor r_j of the negative binomial distribution as a parameter to be estimated through Bayesian inference (i.e., by sampling from its posterior). In DESeq2, a pre-estimated sample-specific size factor s_j enters the mean, but there is also the gene-specific dispersion parameter alpha_i. Am I therefore right that DESeq2 imposes additional overdispersion (having the pre-estimated size factors s_j as well as alpha_i)?

@mikelove

I don't follow what you mean by additional overdispersion. The library size factor is fixed across genes but unknown. We estimate it, and then treat it as fixed. I do think that it's a good idea to build in more conservative behavior by acknowledging that the size factor is not known. When I give talks about effect sizes, I mention that one way to do this post hoc is to use an lfcThreshold, which avoids reporting genes that have effect sizes close to 0. If we estimate the size factors incorrectly, the genes with effect sizes close to 0 will be the first to be wrong, while the ones farthest from 0 are the safest.
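
To make the thresholding idea concrete, here is a minimal Python sketch of a Wald-type test against the null hypothesis |LFC| <= threshold, using a normal approximation. It is an illustration only, not DESeq2's code: the function name `wald_pvalue_greater_abs` and the example numbers are made up for this sketch. In DESeq2 itself, the threshold is passed as the `lfcThreshold` argument of `results()`.

```python
import math


def wald_pvalue_greater_abs(lfc, se, threshold=0.0):
    """Two-sided p-value for H0: |LFC| <= threshold (normal approximation).

    lfc: estimated log fold change; se: its standard error.
    Both names are hypothetical and chosen for this sketch.
    """
    # Distance of the estimate beyond the threshold, in standard errors.
    z = max(abs(lfc) - threshold, 0.0) / se
    # Upper-tail normal probability: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return min(1.0, 2.0 * upper_tail)


# A gene with a small but precisely estimated effect (made-up numbers).
p_small_no_threshold = wald_pvalue_greater_abs(0.3, 0.1, threshold=0.0)
p_small_thresholded = wald_pvalue_greater_abs(0.3, 0.1, threshold=1.0)
# A gene with a large effect, tested against the same threshold.
p_large_thresholded = wald_pvalue_greater_abs(2.5, 0.3, threshold=1.0)
```

Without a threshold, the small-effect gene gets a small p-value. With a threshold of 1, it is no longer reported, while the large-effect gene still is. A modest error in the size factors shifts estimates near 0 across the reporting boundary much more easily than estimates far from 0.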


Thanks for your quick reply, and sorry, I should clarify my thoughts a bit. In DESeq2, the variance of the negative binomial distribution of a count K_ij (with i indexing the gene and j the sample) is

  Var(K_ij) = mu_ij + alpha_i * mu_ij^2
            = s_j * exp(x_j^T * beta_i) + alpha_i * (s_j * exp(x_j^T * beta_i))^2.

As far as I understand the BNB-R model, there we just have

  Var(K_ij) = r_j * exp(x_j^T * beta_i) + (1/r_j) * (r_j * exp(x_j^T * beta_i))^2.

Am I correct? So if r_j from the BNB-R model corresponds to s_j in the DESeq2 model, shouldn't alpha_i = 1/r_j then hold? You're right, "additional overdispersion" is probably not the correct term. Perhaps I should have said "alternative dispersion parameterization".
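
The correspondence between the two variance formulas can be checked numerically. The sketch below assumes the negative binomial is written as NB(r, p) with a logit link on p; that link is an assumption made here so that the mean comes out as r_j * exp(x_j^T * beta_i), as in the formula above, and is not taken from the BNB-R paper. The function names and numbers are hypothetical.

```python
import math


def nb_moments_rp(r, log_odds):
    """Mean and variance of NB(r, p) with logit(p) = log_odds.

    Standard facts for this parameterization:
    mean = r * p / (1 - p), variance = r * p / (1 - p)**2.
    """
    p = 1.0 / (1.0 + math.exp(-log_odds))
    mean = r * p / (1.0 - p)
    var = r * p / (1.0 - p) ** 2
    return mean, var


def nb_variance_mean_dispersion(mu, alpha):
    """Variance in the mean/dispersion form: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2


r_j = 4.0   # hypothetical sample-specific NB parameter
eta = 0.7   # hypothetical value of x_j^T * beta_i

mean_rp, var_rp = nb_moments_rp(r_j, eta)

# The mean is r_j * exp(eta), so r_j scales the mean like a size factor.
assert math.isclose(mean_rp, r_j * math.exp(eta))
# The variance matches mu + alpha * mu^2 exactly when alpha = 1 / r_j.
assert math.isclose(var_rp, nb_variance_mean_dispersion(mean_rp, 1.0 / r_j))
```

Under these assumptions, the single parameter r_j sets both the scale of the mean and the dispersion (as 1/r_j), so the dispersion is indexed by sample j. In the mean/dispersion form used above for DESeq2, s_j and alpha_i are separate quantities and the dispersion is indexed by gene i.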


I haven’t read that paper yet, so I don’t know how the models map to each other.
