I am performing a simple pairwise comparison between two conditions (control vs. infection), but the sample sizes differ substantially (control, n = 300; infection, n = 584). How would this affect my analysis, and how should I account for it without subsetting the larger group?
Thank you, that helped a lot.
Extending the same question: I have infection samples from different time points for the same patient (the number of post-infection time points collected varies between patients). In this case, should I stick to a single time point for all infection patients? How will it affect my analysis if post-infection samples from the same patient are all grouped under the general umbrella of "infection"? (At this point I want to stick to a pairwise comparison rather than a time-series analysis.)
Thanks in advance
You should fit all the samples in one go, and then you can make pairwise comparisons with results().
~donor + time
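As a rough sketch of what fitting everything in one go and then pulling out pairwise comparisons could look like: the example below uses simulated counts from makeExampleDESeqDataSet() rather than real data, and the donor labels (d1 to d4) and time levels (t0, t1, t2) are made up for illustration. Your own column names and levels will differ.

```r
library(DESeq2)

# Simulated counts stand in for real data.
# Hypothetical layout: 4 donors, each sampled at 3 time points.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$donor <- factor(rep(paste0("d", 1:4), each = 3))
dds$time  <- factor(rep(c("t0", "t1", "t2"), times = 4))

# Control for donor, test for time.
design(dds) <- ~ donor + time

# One fit over all samples.
dds <- DESeq(dds)

# Pairwise comparisons are then extracted from the same fit.
res_t1 <- results(dds, contrast = c("time", "t1", "t0"))
res_t2 <- results(dds, contrast = c("time", "t2", "t0"))
summary(res_t1)
```

Each call to results() reuses the single fitted model, so the dispersion estimates draw on all samples, not only on the two groups being compared.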
Right. For me, these types of designs take less than a minute to run with DESeq2, which is shorter than the time it takes to examine the plots afterward, so the runtime is not prohibitive. E.g., the Nivolumab dataset takes something like 25 seconds.