Hello
I have an issue with my p-values that you can probably help me understand. I am doing a gene expression analysis using DESeq2. In total I have 5534 genes, and around 230 of them show NA for both the adjusted and the unadjusted p-value. Maybe this is not that important, since it is a rather small proportion of the whole gene set, but I would like to understand why. I have read about the reasons why NA can be generated, but when I check the data set the counts for those genes look fine to me and not very extreme or different from the others.
For example, the gene below is one of those that give NA for both kinds of p-values. The first three numbers are the replicates from the first treatment and the last three numbers are the replicates from the second treatment:
PP_2663: 4106 30886 4353 1297 6438 7720 not normalized
PP_2663: 3701.2 115446.2 3025.1 1942.6 3665 3689.9 normalized (DESeq2)
This is what a normal gene (no NA p-values) looks like:
PP_4980: 8896 5882 9057 5371 11917 13615 not normalized
PP_4980: 8019 21985.8 6294.2 8044.6 6784.2 6507.5 normalized
This odd one, despite its very low counts, also gets normal values (no NA) for both the adjusted and the unadjusted p-value:
PP_5640: 0 0 1 0 3 2 not normalized
PP_5640: 0 0 0.6 0 1.7 0.9 normalized
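As a sanity check on the numbers above, here is a minimal Python sketch (not part of my DESeq2 run; the function and variable names are just illustrative) that recovers the per-sample size factors implied by the two rows. It assumes that a normalized count is the raw count divided by the sample's size factor, which is how I understand DESeq2's normalized counts to be defined.

```python
# Raw and normalized counts copied from the rows above
# (three replicates of treatment 1, then three of treatment 2).
raw = {
    "PP_2663": [4106, 30886, 4353, 1297, 6438, 7720],
    "PP_4980": [8896, 5882, 9057, 5371, 11917, 13615],
}
norm = {
    "PP_2663": [3701.2, 115446.2, 3025.1, 1942.6, 3665.0, 3689.9],
    "PP_4980": [8019.0, 21985.8, 6294.2, 8044.6, 6784.2, 6507.5],
}


def implied_size_factors(raw_counts, norm_counts):
    """Assumption: normalized = raw / size_factor, so size_factor = raw / normalized."""
    return [r / n for r, n in zip(raw_counts, norm_counts)]


# One list of six implied size factors per gene.
factors = {gene: implied_size_factors(raw[gene], norm[gene]) for gene in raw}

for gene, sf in factors.items():
    print(gene, [round(x, 2) for x in sf])
```

Both genes give the same six factors to two decimals, so the normalization itself looks consistent. The second replicate of treatment 1 has by far the smallest implied size factor (about 0.27), which is why its normalized counts end up so much larger than its raw counts.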
So what is going on here? Am I doing something wrong? The pipeline and commands are quite straightforward: I just provide my count matrix and run DESeq2 on it.
As I said, maybe it is not that important, but it feels like the analysis is not correct. I do not think that filtering the low-count genes would affect the results much, as only three genes have a row sum lower than 10. The other genes have much higher counts (at least 300).
I want to get to the bottom of this because I am failing to find differences in gene expression even between conditions that should show differences. The number of replicates is low, I know, and there is variation between the replicates within each treatment, which I suspect comes from the library preparation (the proportion of coding vs. non-coding RNA is highly variable between replicates). Maybe that is not connected at all to my question above, but I am trying to connect the dots and give all the information about the peculiarities of this data set. It might help.
Thank you very much in advance!!
Regards