Hey guys,
I have the following scenario. We performed MACE experiments (like RNA-seq but you produce only one read per transcript) for three developmental stages of tomato pollen (3 biological replicates for each stage). On the one hand I produced RPM values for all genes by normalizing simply for the differing library size. On the other I performed an analysis with DESeq2. When comparing the RPM values of stage 1 and stage 3 I see for some genes nearly identical RPM values, whereas according to DESeq2 these genes are differentially expressed. In total I have around 16,000 (of around 19,000) differentially expressed genes when performing DESeq2 analysis with LRT test on all three stages
When looking at the venn diagram for identified genes between the 3 stages I see the following:
only_stage1: 3891
only_stage2: 275
only_stage3: 178
stage1_stage2: 3846
stage1_stage3: 88
stage2_stage3: 415
all_3_stages: 10387
My problem is that I do not know whether I should trust the RPM values or DESeq2 because DESeq2 has for normalization the assumption that: "normalization is working under the assumption that most of the genes are not DE and only a small subset of genes are". But during the development of pollen there are drastic physiological and morphological changes going along with strong changes in expressed genes (as you can see from the venn diagram).
Or is DESeq2 fine and I have strong carry-over effects within the RPM values by genes exclusively expressed in one of the stages.
I further checked one of our housekeeping genes from qRT-PCRs. This gene has for the RPM values a log2FC between stage3 and 1 of ~-2.3 and for DESeq2 a log2FC of -0.89 (here I made a Wald test between stage3 and 1 and it is DE). So for the housekeeping gene DESeq2 is performing slightly better.
I really hope someone has any suggestions for me. I'm really desperate.
Greetings
Mario
You said that "there are drastic physiological and morphological changes going along". I'm saying that DESeq2 minimizes the amount of DE through it's normalization procedure. So the best I can tell you is I wouldn't worry about global changes (without any kind of technical or housekeeping controls) making DESeq2 call too many genes DE. It's the opposite problem.
My suggestion, because you seem suspicious of certain results, is to look at plotCounts of DE genes to see if they make sense to you. It's simply counts normalized to control for sequencing depth, but you can see the variability, which is useful.
Furthermore, by looking at the MA plot, you can see the extent of DE throughout the whole experiment.
Thank you very much for your answers. What I still do not get is this part: Essentially, any truly global changes across stage is normalized away
But I see between stage 3 and stage 1 around 10,400 downregulated, 4,250 upregulated and 1,350 not DE genes. Aren't this global changes and not normalized away? And isn't this violating the assumption that most genes are not DE?
Thanks in advance for your answer
I mean global changes as in: the same direction, all up-regulation or the reverse.