Question

DESeq2: DE Analysis with very imbalanced samples per condition

thanos5541 • 0

Hello everyone,

My group has been conducting a large-scale analysis using TCGA data. I'm using the expression results to identify DE genes, following the DESeq2 vignette along with lfcShrink (apeglm). I compare healthy and diseased samples for multiple organs.

However, for almost every organ the number of healthy samples is only about 1-15% of the number of diseased samples (e.g. 44 healthy vs 525 diseased, 130 vs 903, or even 3 vs 309!). I do get results for almost every organ studied, but I am skeptical about the actual statistical significance of those results and about the amount of bias introduced by such a big difference in the number of samples representing each condition.

Should I do something differently in the analysis because of this imbalance in samples per condition, or is such an analysis pointless because of it? Are the results with adjusted p-value < 0.1 still considered significant, as indicated by DESeq2? Should I lower the required adjusted p-value to less than 0.05, or find a formula for the significance cutoff?

I have searched for similar cases online, but I could not find any as extremely imbalanced as ours, which is why I am asking here. I have read that DESeq2 does not need equal numbers of samples per condition to provide significant results, but I am not sure whether that covers extreme cases like ours.

Thanks in advance

deseq2 cancer • 1.7k views

Written 5.4 years ago by thanos5541 • updated 15 months ago by Shaimaa Gamal

score 2 · Accepted Answer · 2019-11-21

Michael Love 43k

Nothing needs to change in the DESeq2 code when there is a large imbalance.

I will also mention that you could just as easily rely on nonparametric tests, such as the Wilcoxon test, and on permutations for FDR computation.
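
For readers wondering what the nonparametric route might look like in practice, here is a minimal Python sketch using only the standard library. It is not DESeq2 code and not necessarily what the answer had in mind: it assumes one vector of already-normalized expression values per gene, computes a Wilcoxon rank-sum statistic for the smaller group, gets a two-sided p-value by permuting the group labels, and then applies a Benjamini-Hochberg adjustment across genes. The function names (`midranks`, `permutation_pvalue`, `benjamini_hochberg`) and the toy data are made up for illustration.

```python
import random


def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values so they share one rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks


def permutation_pvalue(values, in_group, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the rank-sum of the flagged samples."""
    rng = random.Random(seed)
    ranks = midranks(values)
    n_group = sum(in_group)
    # Mean of the rank-sum statistic when labels are assigned at random.
    expected = n_group * (len(values) + 1) / 2
    observed = abs(sum(r for r, g in zip(ranks, in_group) if g) - expected)
    hits = 0
    for _ in range(n_perm):
        # Relabelling at random = drawing n_group ranks without replacement.
        drawn = rng.sample(ranks, n_group)
        if abs(sum(drawn) - expected) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data (hypothetical): 5 "healthy" vs 40 "diseased" samples, 3 genes.
    data_rng = random.Random(1)
    is_healthy = [True] * 5 + [False] * 40
    genes = {
        "geneA": [data_rng.gauss(8, 1) for _ in range(5)]
        + [data_rng.gauss(5, 1) for _ in range(40)],
        "geneB": [data_rng.gauss(5, 1) for _ in range(45)],
        "geneC": [data_rng.gauss(5, 1) for _ in range(45)],
    }
    pvals = [permutation_pvalue(v, is_healthy) for v in genes.values()]
    for name, p, q in zip(genes, pvals, benjamini_hochberg(pvals)):
        print(name, round(p, 4), round(q, 4))
```

One thing to keep in mind with this kind of sketch: the smallest p-value it can report is 1 / (n_perm + 1), so the number of permutations has to be large relative to the number of genes tested for anything to survive the multiple-testing adjustment.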

I see, thank you very much for your quick response!

Reply by thanos5541, 5.4 years ago

I have the same problem, but I am using limma voom. I have 82 cancer samples and 390 control samples. Any suggestions?

Reply by Shaimaa Gamal, 15 months ago