Question

Differences in sample size between conditions - pairwise comparison

Aiswarya • 0

@b4cc611d

Last seen 15 months ago

United States

I am performing a simple pairwise comparison between two conditions (control vs. infection), but the sample sizes differ substantially (control, n = 300; infection, n = 584). How would this affect my analysis, and how should I account for it without subsetting the larger group?

DESeq2 • 715 views

ADD COMMENT • link updated 24 months ago by Michael Love 43k • written 24 months ago by Aiswarya • 0

score 0 · Answer 1 · 2023-02-14

ATpoint ★ 4.6k

@atpoint-13662

Last seen 15 hours ago

Germany

Differences in n per group are not a problem, especially at this quite large overall sample size. If execution speed becomes a problem, you might consider linear-model approaches such as limma-voom, which is much faster. Be sure to explore your data for confounders. For an example, see this analysis of a large cohort by the DESeq2 developer: https://github.com/mikelove/preNivolumabOnNivolumab/blob/main/preNivolumabOnNivolumab.knit.md
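
For reference, a minimal limma-voom sketch with deliberately unbalanced groups might look like the following. The count matrix is simulated and the group sizes (7 vs. 13) are placeholders chosen for illustration, not the poster's data; only the calls to edgeR and limma are real.

```r
library(edgeR)
library(limma)

set.seed(1)

# Simulated counts: 1000 genes x 20 samples (hypothetical data)
counts <- matrix(
  rnbinom(1000 * 20, mu = 100, size = 5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("s", 1:20))
)

# Unbalanced groups: 7 controls vs. 13 infected samples
condition <- factor(
  c(rep("control", 7), rep("infection", 13)),
  levels = c("control", "infection")
)

# Filter low-count genes and compute normalization factors
dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, group = condition)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom estimates precision weights, then a linear model is fit per gene
design <- model.matrix(~ condition)
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))

# Infection vs. control for all genes
tt <- topTable(fit, coef = "conditioninfection", number = Inf)
head(tt)
```

Nothing in this workflow requires equal group sizes; the design matrix simply has more rows for the larger group.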

ADD COMMENT • link 24 months ago ATpoint ★ 4.6k

Thank you, that helped a lot.

Extending the same question: I have infection samples from different time points for the same patient (the number of post-infection time points collected varies between patients). In this case, should I stick to a single time point for all infection patients? How would it affect my analysis if post-infection samples from the same patient were all grouped under the general umbrella of "infection"? (At this point I want to stick to a pairwise comparison rather than a time-series analysis.)

Thanks in advance

ADD REPLY • link 24 months ago Aiswarya • 0

You should fit all the samples in one go, with a design such as the one below, and then you can make pairwise comparisons with results().

~donor + time
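
A minimal sketch of that design on simulated data could look like this. It assumes every donor contributes a baseline sample ("t0") plus post-infection time points; the column names `donor` and `time`, the level names, and the sample layout are all hypothetical. If each donor belongs to only one condition, donor and condition are confounded and the design would have to be adapted.

```r
library(DESeq2)

set.seed(1)

# Simulated dataset: 1000 genes x 12 samples (hypothetical, not the poster's data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Assumed layout: 4 donors, each sampled at baseline (t0) and two later time points
dds$donor <- factor(rep(paste0("d", 1:4), each = 3))
dds$time  <- factor(rep(c("t0", "t1", "t2"), times = 4),
                    levels = c("t0", "t1", "t2"))

# Fit all samples in one model, controlling for donor
design(dds) <- ~ donor + time
dds <- DESeq(dds)

# Pairwise comparisons are then extracted from the single fit
res_t1_vs_t0 <- results(dds, contrast = c("time", "t1", "t0"))
res_t2_vs_t0 <- results(dds, contrast = c("time", "t2", "t0"))
res_t2_vs_t1 <- results(dds, contrast = c("time", "t2", "t1"))

summary(res_t1_vs_t0)
```

Fitting once and calling `results()` per contrast keeps the dispersion estimates shared across all samples, rather than re-estimating them on each subset.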

ADD REPLY • link 24 months ago Michael Love 43k

Right, for me these types of designs take less than a minute to run with DESeq2, which is shorter than the time it takes to examine the plots afterward, so not prohibitive. E.g., the Nivolumab dataset takes something like 25 seconds.

ADD REPLY • link 24 months ago Michael Love 43k