Differential expression in samples where major gene is downregulated
dmr210 ▴ 30
@dmr210-12497

Hi,

I have samples where one gene accounts for more than 40% of the total number of reads in normal conditions. In one phenotype that I consider, that gene is up-regulated.

How will that impact the differential expression of the other genes?

How does DESeq2 do the normalisation to avoid considering these other genes artificially down-regulated because of that?

Let's look at 'fake' numbers of RNA molecules:

Phenotype 1

G1    G2  G3  G4  G5   G6   G7   G8  G9   G10
1000  50  60  12  150  180  140  10  190  45

Total number of molecules: ~2000

Phenotype 2

G1    G2  G3  G4  G5   G6   G7   G8  G9   G10
1500  50  60  12  150  180  140  10  190  45

Total number of molecules: ~2500

The sequencing depth might be the same between the two, so normalising by sequencing depth is not going to help correct for that. Also, DESeq2 assumes a log normal distribution for the gene expression levels, but I was wondering if such a high read count for one single gene might make that assumption wrong?

I am unsure if this is simply equivalent to half of the genes being up-regulated in the sample, with no genes down-regulated, which DESeq2 is clearly equipped to tackle, or if it is different?

Could you explain how DESeq2 accounts for cases such as this one?

Thanks very much,

Delphine

EDIT: I attach an MA plot, and I have swapped "up" and "down" in the text above, because my plot was the other way around compared to what I had written (the gene I am talking about is up-regulated in this plot, because of the condition considered as baseline).

MAplot

deseq2 • 815 views

Can you post an image (you can use imgur.com for hosting) of the MA plot if you use DESeq2? You can get a quick sense of how the normalization works. Or you can even plug in some simulated counts like you have above to see how it works.

@mikelove

hi,

The DESeq2 normalization is not thrown off by a minority of genes with differential expression, because it uses the median of ratios across all genes (edgeR's normalization method is similarly robust). Even though a single gene accounts for 40% of the reads, it has little leverage on the size factor calculation: it is just one gene out of thousands, and the median across genes is insensitive to a few extreme values.
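To make this concrete with the fake numbers from the question, here is a minimal sketch of the median-of-ratios idea in plain Python. It is not DESeq2's code (in DESeq2 the size factors come from `estimateSizeFactors` in R). The `DEPTH` value and the helper name `size_factors` are made up for illustration, and the "reads" are expected values rather than real integer counts.

```python
import math
from statistics import median

# Fake molecule counts from the question (G1..G10); G1 is the dominant gene.
molecules_p1 = [1000, 50, 60, 12, 150, 180, 140, 10, 190, 45]
molecules_p2 = [1500, 50, 60, 12, 150, 180, 140, 10, 190, 45]

# Assumption: both libraries are sequenced to the same depth, so each gene's
# expected read count is its share of the molecules times DEPTH (hypothetical).
DEPTH = 1_000_000

def to_reads(molecules):
    total = sum(molecules)
    return [DEPTH * m / total for m in molecules]

reads = [to_reads(molecules_p1), to_reads(molecules_p2)]

def size_factors(samples):
    """Median-of-ratios size factors for a list of per-sample count lists."""
    n_genes = len(samples[0])
    # Only genes with a positive count in every sample have a usable geometric mean.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_ref = {g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable}
    # Size factor = exp(median over genes of log(count / reference)).
    return [math.exp(median(math.log(s[g]) - log_ref[g] for g in usable))
            for s in samples]

sf = size_factors(reads)
norm1 = [r / sf[0] for r in reads[0]]
norm2 = [r / sf[1] for r in reads[1]]

for name, a, b in zip([f"G{i}" for i in range(1, 11)], norm1, norm2):
    print(f"{name}: normalized fold change (phenotype 2 / phenotype 1) = {b / a:.3f}")
```

Under these assumptions the nine unchanged genes set the median, so after dividing by the size factors G2 to G10 are equal between the two phenotypes and only G1 shows its 1.5-fold change. Scaling by total reads instead would make G2 to G10 look down-regulated in phenotype 2.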
