Question

DESeq2 for DE analysis in developmental stages of tomato pollen

0

Entering edit mode

mario.keller.1988 ▴ 20

@mariokeller1988-13652

Last seen 2.0 years ago

Germany

Hey guys,

I have the following scenario. We performed MACE experiments (like RNA-seq but you produce only one read per transcript) for three developmental stages of tomato pollen (3 biological replicates for each stage). On the one hand I produced RPM values for all genes by normalizing simply for the differing library size. On the other I performed an analysis with DESeq2. When comparing the RPM values of stage 1 and stage 3 I see for some genes nearly identical RPM values, whereas according to DESeq2 these genes are differentially expressed. In total I have around 16,000 (of around 19,000) differentially expressed genes when performing DESeq2 analysis with LRT test on all three stages

When looking at the venn diagram for identified genes between the 3 stages I see the following:

only_stage1: 3891
only_stage2: 275
only_stage3: 178
stage1_stage2: 3846
stage1_stage3: 88
stage2_stage3: 415
all_3_stages: 10387

My problem is that I do not know whether I should trust the RPM values or DESeq2 because DESeq2 has for normalization the assumption that: "normalization is working under the assumption that most of the genes are not DE and only a small subset of genes are". But during the development of pollen there are drastic physiological and morphological changes going along with strong changes in expressed genes (as you can see from the venn diagram).

Or is DESeq2 fine and I have strong carry-over effects within the RPM values by genes exclusively expressed in one of the stages.

I further checked one of our housekeeping genes from qRT-PCRs. This gene has for the RPM values a log2FC between stage3 and 1 of ~-2.3 and for DESeq2 a log2FC of -0.89 (here I made a Wald test between stage3 and 1 and it is DE). So for the housekeeping gene DESeq2 is performing slightly better.

I really hope someone has any suggestions for me. I'm really desperate.

Greetings
Mario

DESeq2 DE normalization • 2.0k views

ADD COMMENT • link 7.7 years ago mario.keller.1988 ▴ 20

0

Entering edit mode

I'm not sure if I understand you correctly. Should I trust the DESeq2 results or not? The housekeeping gene I checked was never tested along the development of pollen so I am not sure if I can draw any conclusions from it.

ADD REPLY • link 7.7 years ago mario.keller.1988 ▴ 20

0

Entering edit mode

You said that "there are drastic physiological and morphological changes going along". I'm saying that DESeq2 minimizes the amount of DE through it's normalization procedure. So the best I can tell you is I wouldn't worry about global changes (without any kind of technical or housekeeping controls) making DESeq2 call too many genes DE. It's the opposite problem.

My suggestion, because you seem suspicious of certain results, is to look at plotCounts of DE genes to see if they make sense to you. It's simply counts normalized to control for sequencing depth, but you can see the variability, which is useful.

Furthermore, by looking at the MA plot, you can see the extent of DE throughout the whole experiment.

ADD REPLY • link 7.7 years ago Michael Love 43k

0

Entering edit mode

Thank you very much for your answers. What I still do not get is this part: Essentially, any truly global changes across stage is normalized away

But I see between stage 3 and stage 1 around 10,400 downregulated, 4,250 upregulated and 1,350 not DE genes. Aren't this global changes and not normalized away? And isn't this violating the assumption that most genes are not DE?

Thanks in advance for your answer

ADD REPLY • link 7.7 years ago mario.keller.1988 ▴ 20

0

Entering edit mode

I mean global changes as in: the same direction, all up-regulation or the reverse.

ADD REPLY • link 7.7 years ago Michael Love 43k

score 0 · Answer 1 · 2017-08-03

DESeq2's normalization will minimize the amount of differential expression. Essentially, any truly global changes across stage is normalized away (unless you have an idea about genes which do not change -- controlGenes -- and then you can input this information).

You may find it useful to look at the plotCounts for some of the DE genes. And the MA plots. If you are not interested in genes which are changing only slightly, you can use the lfcThreshold argument of results() to specify that the genes should change by a certain amount (on the log2 fold change scale).