I found a differentially expressed gene with log(mean CPM) = 2.2447, logFC = 11.2344, and adjusted p-value = 0.0016.
This looks neat. But a problem arises when I take the TPM (transcripts per million) values of the samples in the 2 groups and draw boxplots.
Attached is a boxplot.
It turns out that the medians of both groups are ZERO, and visually, these two groups should not be called different at all!
Here are the two arrays that I used for boxplot:
[0,0,0.0363,0,0,0,0,15.1621,0,0,0,0.091,13.1992,0,0.064,0,0,27.9052,15.4516,0,0,0,22.6814,0,0.0124,5.3274]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Here is the boxplot picture: https://drive.google.com/open?id=0B0AM3r3EIYRUVl8zNFphWWJCbEk (for some reason I couldn't attach it to this form).
Has anyone encountered this problem before? I would also like to know how to justify this case: the statistical package edgeR calls the gene differentially expressed, but visually it clearly is not.
Thanks.
SnowRu
Well, I'm not sure what you want edgeR to say. Clearly, this gene is differentially expressed between your groups. Perhaps not in every sample, but the mean expression is definitely different, so what more do you want? Similar scenarios arise in analyses of single-cell RNA-seq data where a gene may not be expressed in every cell of a population, but the average population-level expression is still different between two groups. I've never found this hard to explain.
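To make the mean-versus-median point concrete, here is a minimal Python sketch using the two TPM arrays posted in the question. The `summarize` helper and the variable names are made up for illustration. This is only a descriptive summary of the posted values; it is not what edgeR computes, since edgeR fits a model to the raw counts and library sizes rather than to TPM.

```python
from statistics import mean, median

# TPM values copied from the question (group names are illustrative).
group_a = [0, 0, 0.0363, 0, 0, 0, 0, 15.1621, 0, 0, 0, 0.091, 13.1992, 0,
           0.064, 0, 0, 27.9052, 15.4516, 0, 0, 0, 22.6814, 0, 0.0124, 5.3274]
group_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def summarize(values):
    """Return simple descriptive statistics for one group of samples."""
    return {
        "n": len(values),
        "median": median(values),
        "mean": mean(values),
        "n_nonzero": sum(v > 0 for v in values),
    }


summary_a = summarize(group_a)
summary_b = summarize(group_b)

print("group A:", summary_a)
print("group B:", summary_b)
```

Both medians come out as zero because more than half of the samples in each group are zero, which is why the boxplots look identical. The means and the number of non-zero samples do differ: several samples in the first group have clearly non-zero expression, while no sample in the second group does.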
P.S. Reply to answers using the "add comment" or "add reply" buttons, not the "add answer" button.
This isn't a problem specific to RNA-seq. Any measurement at or near the detection limit of any assay is going to have a boxplot that looks like this.