Hi! I have used DESeq2 to analyze my RNAseq data. I am studying and comparing two types of tissues.
I have made some boxplots of the counts for different marker genes, comparing the two tissues. When looking at the significantly affected genes in these plots, the boxes are often not as far apart as you would expect. When I then looked into the counts data, I was a bit surprised.
Examples:
Gene1:
Counts tissue 1: 342, 263, 486
Counts tissue 2: 347, 323, 543
p-adj: 0.006
Gene2:
Counts tissue 1: 19005, 14575, 24455
Counts tissue 2: 8840, 25618, 24298
p-adj: 0.019
Gene3:
Counts tissue 1: 10100, 8286, 14597
Counts tissue 2: 7334, 6179, 14408
p-adj: 4.2e-05
How do these count values and their adjusted p-values make sense?
I do not understand why they are significant when the mean values are so close.
These look like raw counts rather than normalized counts. Consider using counts(dds, normalized = TRUE), or the output of normTransform() or vst()/rlog(). Generally, MA-plots and volcano plots are a good starting point for diagnosing what the DE profile looks like across the entire dataset; combined with plotPCA(), this is a routine set of diagnostics.
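To make the suggestions above concrete, here is a minimal sketch of how they could be run. It uses simulated data from DESeq2's makeExampleDESeqDataSet() as a stand-in, so the object names (dds, res, top, vsd), the "condition" column and the simulation settings are assumptions for illustration and not taken from the original analysis. varianceStabilizingTransformation() is used in place of vst() only because the simulated dataset is small.

```r
# Sketch only: simulated stand-in data, not the poster's dataset.
library(DESeq2)

set.seed(1)
# 2000 genes, 6 samples (2 conditions x 3 replicates), with some true DE genes
dds <- makeExampleDESeqDataSet(n = 2000, m = 6, betaSD = 1)
dds <- DESeq(dds)
res <- results(dds)

# Pick the gene with the smallest adjusted p-value (NAs are ignored)
top <- rownames(res)[which.min(res$padj)]

# Compare raw and size-factor-normalized counts for that gene
sizeFactors(dds)
raw_top  <- counts(dds)[top, ]
norm_top <- counts(dds, normalized = TRUE)[top, ]
rbind(raw = raw_top, normalized = round(norm_top, 1))

# Per-gene plot by group; plotCounts() uses normalized counts by default
plotCounts(dds, gene = top, intgroup = "condition")

# Dataset-wide diagnostics: MA-plot and PCA on transformed counts
plotMA(res)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = "condition")
```

With a real dataset, dds would be your own DESeqDataSet and intgroup the name of your tissue column. If the size factors differ noticeably between samples, boxplots of raw counts can look quite different from boxplots of normalized counts.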