DESeq2 with unbalanced experimental design
celine (@celine-7449), European Union

Dear all,

I am used to analysing RNA-seq data with the very useful and well-documented DESeq2 package. I have analysed an RNA-seq dataset containing 2 conditions (control and transgenic mice) with 3 replicates for the control condition and only 2 replicates for the transgenic one (we initially sequenced 3 transgenic samples, but the quality of one of the samples was not sufficient, so we had to exclude it from the analysis). We submitted a manuscript containing these analyses, but one of the reviewers wrote that “the RNA-seq performed is a 2 against 3 experiment and therefore the statistical analysis applied is not valid“.

As the DESeq2 Genome Biology article states, “experimental design with as little as two or three replicates are common and reasonable”, so I think it is valid to use DESeq2 with this number of replicates. Moreover, as the pasilla dataset used in the DESeq2 vignette contains a different number of replicates for each condition, I also assume that it is valid to use DESeq2 on an unbalanced experimental design.

I am aware that the power of the analysis would have been higher with more replicates per condition and a balanced experimental design, but I would just like confirmation that applying DESeq2 to such an experimental design is valid.

Thank you in advance for your answer.

Best regards,


Céline


Thank you very much for your quick and precise answer.

@mikelove, United States

That's strange. Yes, the whole point of information sharing across genes in DESeq, edgeR, limma and others is to allow for statistical inference when sample sizes are small. And yes, DESeq2 methods (the generalized linear model) are valid when the groups are not balanced. You can just reply with a link to one of our figures showing the sensitivity for a 3 vs 3 comparison. A 2 vs 3 comparison will have slightly reduced sensitivity, but I'm not sure what other statistical analysis this person has in mind that would have higher sensitivity than methods which share information about variance estimates across genes.
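A rough numerical illustration of the two points in this answer, written in Python because the thread itself contains no code. It is a sketch under stated assumptions, not DESeq2's method: plain least squares on hypothetical log-scale values stands in for DESeq2's negative binomial GLM, and `moderated_variance` is limma-style empirical Bayes variance shrinkage (DESeq2 instead shrinks dispersion estimates toward a fitted trend). All function names and the toy values are hypothetical. It shows that a two-group model fits an unbalanced 3 vs 2 design without any special handling, that the standard error of the group difference grows only modestly from 3 vs 3 to 3 vs 2, and how pooling information across genes stabilises a variance estimated from few residual degrees of freedom.

```python
import math

# Illustrative sketch only: least squares on hypothetical log-scale values,
# NOT DESeq2's negative binomial GLM.

def fit_two_group(y, groups, reference="control"):
    """Fit y ~ intercept + indicator(non-reference group) by least squares.

    For this two-column design the least-squares solution has a closed form:
    the intercept is the reference-group mean and the coefficient is the
    difference between group means, whether or not the groups are balanced.
    """
    ref = [v for v, g in zip(y, groups) if g == reference]
    alt = [v for v, g in zip(y, groups) if g != reference]
    if not ref or not alt:
        raise ValueError("each group needs at least one sample")
    intercept = sum(ref) / len(ref)
    return intercept, sum(alt) / len(alt) - intercept


def se_factor(n_ref, n_alt):
    """Multiplier on the per-sample SD that gives the SE of the group difference."""
    return math.sqrt(1 / n_ref + 1 / n_alt)


def moderated_variance(s2_gene, df_gene, s2_prior, df_prior):
    """limma-style shrinkage of a gene-wise variance toward a prior shared across genes."""
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)


if __name__ == "__main__":
    groups = ["control"] * 3 + ["transgenic"] * 2
    y = [1.0, 2.0, 3.0, 5.0, 7.0]  # hypothetical toy values for one gene
    print(fit_two_group(y, groups))  # (2.0, 4.0)
    print(round(se_factor(3, 3), 3), round(se_factor(3, 2), 3))  # 0.816 0.913
    # 3 vs 2 leaves 5 - 2 = 3 residual df per gene; a prior worth 10 df pulls a
    # noisy gene-wise variance of 4.0 toward a shared value of 1.0.
    print(round(moderated_variance(4.0, 3, 1.0, 10), 3))  # 1.692
```

For the same per-sample variability, the 3 vs 2 standard error is sqrt(1.25), or about 1.118 times the 3 vs 3 one, i.e. roughly 12% wider: the "slightly reduced sensitivity" mentioned in the answer, not an invalid test.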


Hi Mike, I have run into the same issue: the reviewer wrote that “the RNA-seq performed is a 2 against 3 experiment and therefore the statistical analysis applied is not valid“. The URL of the figures in your reply seems to be missing; could you post it again? Thank you very much.


Fixed the link


Hey Mike, thanks for the response. We have a similar issue with 4 vs 8 samples across two conditions. This is single-cell data, and I am pseudobulking the samples based on the recommendations of https://www.nature.com/articles/s41467-021-25960-2. What would you suggest here? I can think of 3 possible approaches:
1) Pseudobulk and use DESeq2 to compare 4 vs 8 samples.
2) Run a single-cell DESeq2 comparison using batch as a covariate.
3) Run a rank-sum test across cells, with bootstrapping to estimate the error, as you have done previously in one of your publications.
Any feedback would be appreciated.
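A minimal sketch of the aggregation step in option 1, assuming cell-level counts are available as an in-memory mapping from cell ID to per-gene counts and that each cell is labelled with the biological sample (animal) it came from. The function and variable names are hypothetical, and real single-cell pipelines usually work on sparse matrices rather than dictionaries. Summing counts per sample yields one count column per animal, so the downstream comparison is 4 vs 8 biological replicates rather than thousands of cells.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts into one count vector per biological sample.

    cell_counts: {cell_id: {gene: count}}     (hypothetical input layout)
    cell_to_sample: {cell_id: sample_id}
    Returns {sample_id: {gene: summed count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, n in counts.items():
            totals[sample][gene] += n  # raw counts are summed, not averaged
    return {sample: dict(genes) for sample, genes in totals.items()}


if __name__ == "__main__":
    cells = {
        "cell1": {"GeneA": 3, "GeneB": 0},
        "cell2": {"GeneA": 1, "GeneB": 2},
        "cell3": {"GeneA": 0, "GeneB": 5},
    }
    samples = {"cell1": "mouse1", "cell2": "mouse1", "cell3": "mouse2"}
    print(pseudobulk(cells, samples))
    # {'mouse1': {'GeneA': 4, 'GeneB': 2}, 'mouse2': {'GeneA': 0, 'GeneB': 5}}
```

Summing raw counts (rather than averaging normalised values) keeps the result as integer counts, which is the kind of input count-based tools such as DESeq2 expect.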


This is a 10-year-old thread; would you mind creating a new post with full details about your setup?
