Hi everyone,
I've just realized that the values I get from plotCounts for a specific gene differ from the values returned by counts with normalized=T. Is there a reason for this? Is it in any way related to the pseudocount (pc) argument?
Thanks!
As far as I understand, then, the values I get from plotCounts are different because DESeq2's processing of the data is more complicated than "just" normalization by sequencing depth. Is there any way to access the values that DESeq2 uses in its calculations?
The main objective of this is to be able to calculate baseMeans and baseVariances per groups of interests.
DESeq2 calculations are a generalized linear model on raw counts, with size factor offsets, where the design determines the coefficients in the GLM.
Can you say what values you are after?
If you want to calculate mean and variance of each group, I'd use counts(dds, normalized=TRUE). These may be useful descriptive statistics. Note though that these sample means and variances per group aren't used by DESeq2 in its estimation of its test statistics.
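A minimal sketch of the per-group summary described above. The simulated dataset from makeExampleDESeqDataSet and the names groupMeans and groupVars are only stand-ins for illustration; substitute your own dds and grouping column.

```r
# Sketch only: simulated data stand in for a real DESeqDataSet.
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)  # has a 'condition' factor
dds <- estimateSizeFactors(dds)

# Raw counts divided by each sample's size factor
norm <- counts(dds, normalized = TRUE)

# Per-group mean and variance for every gene (rows = genes, columns = groups)
groups <- levels(dds$condition)
groupMeans <- sapply(groups, function(g)
  rowMeans(norm[, dds$condition == g, drop = FALSE]))
groupVars <- sapply(groups, function(g)
  apply(norm[, dds$condition == g, drop = FALSE], 1, var))

head(groupMeans)
head(groupVars)
```

These are descriptive statistics only; as noted above, DESeq2 does not use them when computing its test statistics.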
Let's imagine I have gene A. When I perform plotCounts, the output consists of a data.frame:
However, for gene A, when I go to counts with normalized=T, the values are different. I was just wondering why there are differences (which you already answered, thanks for that!) and whether I could systematically get a data.frame with the values per sample, as in plotCounts, but for the whole gene set.
My question came up while looking at the logFC shrinkage; I would like to use some metrics to further explore some big differences I obtained (I understand the shrinkage method; I just wanted some metrics to make it "easier" to assess the differences).
Thanks!
How are they different? Can you show the other values?
Here they are:
If you look up the help for plotCounts you will see that a pseudocount of 0.5 is added to the data by default (because the default setting is transform=TRUE, and counts of 0 cannot be plotted when the y-axis is on a log scale).
You can access the normalized counts with counts(dds, normalized=TRUE), and what you are getting from plotCounts(dds, returnData=TRUE) has 0.5 added because transform=TRUE. If you set plotCounts(dds, transform=FALSE, returnData=TRUE) you would get the same values as the normalized counts via counts().
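A quick way to check this yourself, sketched on simulated data (makeExampleDESeqDataSet, the gene name "gene1", and the object names are illustrative assumptions, not from the original post). The last lines show one way to get a plotCounts-style data.frame for the whole gene set.

```r
# Sketch only: simulated data stand in for a real DESeqDataSet.
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)

gene <- "gene1"
norm <- counts(dds, normalized = TRUE)

# Default: transform=TRUE, so the pseudocount is added to the returned counts
withPc <- plotCounts(dds, gene = gene, intgroup = "condition",
                     returnData = TRUE)

# transform=FALSE: no pseudocount is added
noPc <- plotCounts(dds, gene = gene, intgroup = "condition",
                   transform = FALSE, returnData = TRUE)

# Difference from the normalized counts (the pseudocount)
withPc$count - norm[gene, ]

# Should be TRUE if the values match counts(dds, normalized=TRUE)
isTRUE(all.equal(noPc$count, unname(norm[gene, ])))

# Same layout for all genes: samples in rows, one column per gene
allGenes <- as.data.frame(t(norm))
allGenes$condition <- dds$condition
```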
The difference is consistently 0.5, which corresponds to the default value for the pseudocount in plotCounts; that was my first assumption.
Dear Michael,
Sorry for re-opening this post again. I have a question regarding the input in plotCounts:
In plotCounts(dds), when we apply "normalized=TRUE", this corrects for size factor. But when we apply VST on the dds object, we correct both for size factor and library size, right? I use the second to plot PCAs, for example. Then library size is not corrected in plotCounts?
Besides this, and regarding batch effect, normally I correct batch effect for plotting purposes like this:
How can I use the batch-corrected data in plotCounts?
Thank you. Laia
VST corrects for library size (as modeled by size factor).
plotCounts is designed to show the count data including batch.
I recommend using VST data if you want to, e.g., see the variation that remains after regressing out batch.
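One possible way to do this, as a rough sketch: removing batch with limma::removeBatchEffect is an assumption on my part (the thread does not show which method was used), and the simulated data, the batch labels, and the gene name are hypothetical. On a real dataset vst() could be used in place of varianceStabilizingTransformation().

```r
# Sketch only: simulated data and made-up batch labels.
library(DESeq2)
library(limma)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
dds$batch <- factor(rep(c("b1", "b2"), times = 6))  # hypothetical batches

# Variance-stabilized values (size factors are accounted for here)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)

# Regress out batch while protecting the condition effect
design <- model.matrix(~ condition, data = as.data.frame(colData(vsd)))
corrected <- removeBatchEffect(mat, batch = vsd$batch, design = design)

# Per-sample values for one gene, analogous to plotCounts(returnData=TRUE)
gene <- "gene1"
df <- data.frame(value = corrected[gene, ],
                 condition = vsd$condition,
                 batch = vsd$batch)

stripchart(value ~ condition, data = df, vertical = TRUE,
           method = "jitter", pch = 19,
           ylab = "VST value (batch removed)", main = gene)
```

Note that these are values on the VST scale, not counts, so they are for visualization only and should not be fed back into DESeq().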