When using DESeq2, I noticed that some of my top genes have a pvalue or padj of zero. I suppose the pvalue from the Wald test is really small and it got rounded at some point when I run DESeq2, although it is a bit surprising that other packages, including limma/voom, edgeR assigned a more reasonable pvalue (e.g E-15, E-20, etc) to the same genes using the same dataset. In contrast, DESeq2 is only giving zeros for those same genes. The second lowest p-value I get (after the zeros) is E-82 for a similar logFC for the genes with p-value of zero. Given this situation, how I can use these values to generate a decent volcano plot to represent the data?. When my top genes have a value of zero, the scale of the volcano plot is just awful because the top genes are assigned -log10P values that are through the roof . Also, I would appreciate if someone could help me understand why DESeq2 doesn't provide a more reasonable p-value (I mean E-82 is a bit extreme... ). I am attaching some values from by data.Thanks!
I'll just add a comment on the volcano part because you have tagged my package, EnhancedVolcano. EnhancedVolcano will automatically convert p-values of 0 to the lowest possible machine value, which I imagine differs based on whatever OS you are using. It issues a warning message to console when doing this. I introduced this behaviour for the purposes of being able to plot these genes, which otherwise would not be plotted (negative log10 of 0 is infinity).
Just looking at your plot, there nevertheless looks to be something wrong with your p-value distribution, even when not considering the bunch of genes at the top-left. I'll let Michael answer on the DESeq2 part, though.
Hi Kevin, thank you for your response. Yes I noticed this behavior, the problem is that volcano plot is zoomed out a lot to the point that it doesn't even look like a volcano anymore (see attached jpeg). is there a way to set the coordinate limits without dropping data observations (similar to coord_cartesian() in ggplot2)?
The package is built on the ggplot2 engine and returns a ggplot2 object, so, you could likely just do, for example:
Behaviour may be unexpected though.
Thanks Kevin. I tried
And it works at setting the axis limits, but the data points outside of the visible plot are hidden. Is there a way to represent those values falling outside the visible plot with perhaps arrows, similar to the MA plots from DESeq2?
That functionality is not available, unfortunately. Regarding the fundamental issue: I don't know what a p-value of 0 actually means. You could likely identify these transcripts prior to running EnhancedVolcano and set them to, for example, 10E-1 * the lowest non-zero p-value. Then, the plot visualisation may improve.