Hi everyone,
I'm working on a differential gene expression (DGE) analysis using DESeq2, with the goal of identifying genes that are differentially expressed according to age (a numeric column in info).
Here is the code I have so far:
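library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = data, colData = info, design = ~ Age)
# Pre-filter: keep genes with at least 10 counts in at least smallestGroupSize samples
smallestGroupSize <- 3  # placeholder value; choose a sample count that suits your data
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
dds <- DESeq(dds)
res <- results(dds, alpha = 0.05)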
1) Is it necessary to perform log fold change shrinkage (lfcShrink)?
res.ape <- lfcShrink(dds, coef = "Age", type = "apeglm", res = res)
2) I found an interesting gene with the following statistics:
        baseMean   log2FoldChange  lfcSE      stat       pvalue       padj
        <numeric>  <numeric>       <numeric>  <numeric>  <numeric>    <numeric>
Gene_X  6273.312   0.260054        0.0441626  5.88856    3.89567e-09  0.000079569
How can I plot the expression of this gene for all samples and draw a linear regression line using the intercept and slope computed by DESeq2?
Should I use the counts from:
vsd <- vst(dds, blind=TRUE)
assay(vsd)
?
Thank you!
Thank you for your quick support. I created two figures using the following code:
The counts for the gene were taken from assay(normTransform(dds)).
1) Using coef(dds)
2) Using lm
Why are the two results so different, and which one makes more sense?
Thanks
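For readers following along, here is a minimal sketch of how the two lines in this comparison could be drawn. It is not the original poster's code: the data are simulated with makeExampleDESeqDataSet, the Age values are hypothetical, and the gene is simply the one with the highest baseMean. It assumes that coef(dds) returns the intercept and slope on the log2 scale, so the DESeq2 line describes log2 normalized counts without the pseudocount that normTransform adds.

```r
library(DESeq2)

# Simulated stand-in for the real data (hypothetical): 200 genes, 12 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12)
dds$Age <- seq(20, 75, length.out = 12)  # hypothetical ages
design(dds) <- ~ Age
dds <- DESeq(dds)

# Pick a well-expressed gene for illustration
gene <- rownames(dds)[which.max(mcols(dds)$baseMean)]

# log2(normalized count + 1) for that gene across all samples
y <- assay(normTransform(dds))[gene, ]

# DESeq2 GLM coefficients on the log2 scale: "Intercept" and "Age"
b <- coef(dds)[gene, ]

plot(dds$Age, y, xlab = "Age", ylab = "log2(normalized count + 1)", main = gene)
# 1) Line from the DESeq2 negative binomial GLM
abline(a = b[["Intercept"]], b = b[["Age"]], col = "red")
# 2) Ordinary least squares fit on the log counts
fit_lm <- lm(y ~ dds$Age)
abline(fit_lm, col = "blue", lty = 2)
```

The two fits need not agree: lm treats every log count as equally reliable, while the GLM weights observations according to the count variance.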
I directly answered that above in my post, did you see?
I think I read it too quickly. Thanks :)
Is there a way to get the R-squared (R^2) from the model?
You could compute R2 by correlating the predicted values (obtained from the fitted coefficients) with the observed values. But, as with the regression line, a GLM/likelihood approach takes the count variance into account, while simple regression on log counts does not.
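A rough sketch of that calculation follows, under stated assumptions: the data are simulated, the Age values are hypothetical, and the fitted values are taken from the "mu" assay and divided by the size factors to put them on the normalized-count scale. Correlating on the log2(x + 1) scale is one choice made here for illustration, not something prescribed by DESeq2.

```r
library(DESeq2)

# Simulated stand-in for the real data (hypothetical): 200 genes, 12 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12)
dds$Age <- seq(20, 75, length.out = 12)  # hypothetical ages
design(dds) <- ~ Age
dds <- DESeq(dds)

# Row index of a well-expressed gene, chosen only for illustration
i <- which.max(mcols(dds)$baseMean)

# Fitted counts include size factor scaling, so divide it out
fitted_norm <- assays(dds)[["mu"]][i, ] / sizeFactors(dds)

# Observed counts on the same normalized scale
observed_norm <- counts(dds, normalized = TRUE)[i, ]

# Squared correlation between observed and fitted values on the log2 scale
r2 <- cor(log2(observed_norm + 1), log2(fitted_norm + 1))^2
r2
```

This value is a descriptive summary only; it is not the quantity the negative binomial likelihood optimizes.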
I read in a Biostars thread that the fitted coefficient is in assays(dds)[["mu"]], but I'm not sure if this is correct. Would this be the correct way to get the R2?
The fitted counts in mu will work, but they contain size factor scaling. So you can divide those by sizeFactors(dds).
Hi Michael, using another dataset, I got this strange message:
To resolve it, I did:
Then the usual subsequent steps.
Is this the right way to do it?
That's not strange to me :)
Scaling the numerical variables is useful, and the other message is just making sure users know how the variables will be modeled, so it isn't ambiguous.
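The message and the poster's fix are not shown in the thread, so the following is only a sketch of one common way to center and scale a numeric covariate before building the design. The info data frame and its Age values are hypothetical, and Age_scaled is a column name introduced here for illustration.

```r
# Hypothetical sample table with a numeric Age column
info <- data.frame(Age = c(23, 35, 41, 58, 62, 77))

# Center and scale: (Age - mean(Age)) / sd(Age)
# scale() returns a one-column matrix, so convert it to a plain numeric vector
info$Age_scaled <- as.numeric(scale(info$Age))

# The scaled column would then replace Age in the design, for example:
# dds <- DESeqDataSetFromMatrix(countData = data, colData = info, design = ~ Age_scaled)
```

With this parameterization the log2 fold change is per one standard deviation of age rather than per year, which is worth keeping in mind when interpreting the coefficient.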