Is it possible to run DESeq2 with a gene-dependent binary variable, such as membership in a pathway or association with a transcription factor? If not, is it possible in principle to modify functions such as fitNbinomGLMs and fitBeta so that this becomes possible in a meaningful way, or am I missing some issue that rules it out?
To me at least, it's unclear what could be achieved by this. Pathway membership is usually dealt with by something akin to GSEA after generating a statistic via DESeq2. The only other thing I can think of is subsetting your data by your binary variable, but there would be very little difference (if any) between doing this subsetting before or after running DESeq2, so it seems a little pointless. If you're talking about a gene-and-sample-dependent variable, then again we'd need to see what you're trying to achieve by introducing such a variable. Otherwise your gene-dependent variable is of necessity constant across samples, and therefore has no part to play in DESeq2's approach, which fits a separate model to each gene.
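To make the last point concrete, here is a minimal sketch in Python with NumPy (not DESeq2 code) of why a covariate that is constant across samples adds nothing to a per-gene fit. The six-sample layout, the `condition` vector, and the helper `with_gene_covariate` are all hypothetical, chosen only for illustration. The sketch assumes a design with an intercept and one condition term, appends the gene-level value as an extra column, and checks the rank of the resulting matrix.

```python
import numpy as np

# Hypothetical design for a single gene: 6 samples,
# intercept + condition (3 control, 3 treated).
condition = np.array([0, 0, 0, 1, 1, 1], dtype=float)
design = np.column_stack([np.ones(6), condition])

def with_gene_covariate(design, value):
    """Append a gene-level covariate, which has the same value in every sample."""
    column = np.full(design.shape[0], float(value))
    return np.column_stack([design, column])

in_pathway = with_gene_covariate(design, 1)      # gene is in the pathway
not_in_pathway = with_gene_covariate(design, 0)  # gene is not in the pathway

rank_base = np.linalg.matrix_rank(design)
rank_in = np.linalg.matrix_rank(in_pathway)
rank_out = np.linalg.matrix_rank(not_in_pathway)

# Three columns but still rank 2 in both cases: the new column is either
# a copy of the intercept or all zeros, so its coefficient is not identifiable.
print(rank_base, rank_in, rank_out)  # prints "2 2 2"
```

In both cases the extended matrix has three columns but rank 2, so a coefficient for the gene-level variable cannot be estimated within a single gene's model. Comparing groups of genes would need a model that spans genes, which is a different setup from the per-gene fit sketched here.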
Thank you for the opinion. The idea is to compare groups of genes that have different values of the introduced variable. I agree that making the variable sample-dependent as well (which in principle makes sense, e.g. for TF binding) would be more interesting. In addition, some kind of regularization over the coefficients of the new gene-dependent variable may make sense, but then it would be even more distant from what DESeq2 is now. I'll try to think more about it.
And regarding your point that subsetting the data by the binary variable and then running DESeq2 differs little from running DESeq2 first and then subsetting: do I understand correctly that there is probably little difference for fitting the GLM for each gene, but that for estimating the dispersion it really matters which set of genes goes into the estimation?