Hi there, I understand that DESeq2 fits a GLM to the raw gene count matrix. I am wondering whether it is possible to retrieve more detailed information about the fitted model after the 'DESeq' function has run.
For example, for a regression of y = a1X1 + a2X2, I know I can retrieve the fitted coefficients a1 and a2 by calling the 'coef' function in the DESeq2 package. But I don't know:
a. Where to get the model design matrix, i.e. the matrix of phenotype values for each sample. For a continuous variable X1 the values are intuitive, but if X2 is a categorical variable labelled 'Case' in samples 1, 2, 3 and 'Ctrl' in samples 4, 5, 6, how does DESeq2 transform this categorical vector into numeric values? Is there a function to retrieve this information?
b. Is there a function to retrieve the 'y' value matrix for each gene and each sample? I know that vsd or log2(normalized counts) is quite similar to what I want, but I guess there should be a gene count matrix that is actually used for the linear regression fit, i.e. the y on the left side of the above equation (after accounting for the negative binomial distribution, normalization, and log transformation). How can I retrieve this matrix? Thanks a lot for your kind reply!
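To make the two points above concrete, here is a minimal sketch in plain Python. It is not DESeq2 code (DESeq2 is an R package) and it only illustrates the general idea. For (a), a two-level categorical variable is usually expanded by treatment (dummy) coding into an intercept column plus a 0/1 indicator column; this is also R's default contrast for factors, which is what a design formula is expanded with. For (b), DESeq2's negative binomial GLM is fitted to the raw counts themselves with a log link and per-sample size factors, so there is no separate log-transformed 'y' matrix; what the model does define is a fitted mean mu_j = s_j * 2^(x_j . beta) per gene and sample. The helper names `treatment_design` and `fitted_means`, the column label, the coefficients, and the size factors are all hypothetical values chosen for illustration, and 'Ctrl' is assumed to be the reference level (R would pick 'Case' by default because it sorts levels alphabetically).

```python
def treatment_design(labels, reference):
    """Build an intercept column plus one 0/1 indicator per non-reference level."""
    levels = [reference] + sorted(set(labels) - {reference})
    # Hypothetical column labels, chosen only for readability.
    columns = ["Intercept"] + [f"{lvl}_vs_{reference}" for lvl in levels[1:]]
    rows = [
        [1.0] + [1.0 if lab == lvl else 0.0 for lvl in levels[1:]]
        for lab in labels
    ]
    return columns, rows


def fitted_means(rows, beta_log2, size_factors):
    """Fitted mean for one gene: mu_j = s_j * 2 ** (x_j . beta), log2-scale coefficients."""
    return [
        s * 2.0 ** sum(x * b for x, b in zip(row, beta_log2))
        for row, s in zip(rows, size_factors)
    ]


# Samples 1-3 are 'Case', samples 4-6 are 'Ctrl', as in the question.
condition = ["Case"] * 3 + ["Ctrl"] * 3
columns, X = treatment_design(condition, reference="Ctrl")

# Hypothetical values for one gene: log2 baseline of 5, log2 fold change of 1.
beta = [5.0, 1.0]
# Hypothetical size factors (per-sample normalization constants).
size_factors = [1.0, 0.5, 2.0, 1.0, 1.0, 1.0]

mu = fitted_means(X, beta, size_factors)

print(columns)
for row, m in zip(X, mu):
    print(row, m)
```

With this coding the intercept is the log2 mean of the 'Ctrl' group and the second coefficient is the log2 fold change of 'Case' over 'Ctrl'. In R itself, base `model.matrix` applied to the design formula and the sample table should give the corresponding numeric matrix.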
Hi Michael, thanks for the answer! I'll have a look : )