I have been tasked with looking for differential expression in a colleague's data set. The data set consists of 24 RNA-seq samples, made up of 4 groups of 6 samples.
I used DESeq2 with all settings left at their defaults, and I am a little concerned about my results.
We have very few differentially expressed genes (2-4 per comparison), and the ones we do find seem to be genes where nearly all samples have 0 counts and maybe 2 samples have nonzero counts. One example is the comparison between a treatment group of 6 (named BOTH) and a control group of 6 (named CLEAN).
Is it valid to use these genes where maybe 2 samples are driving DE between the groups?
Are the differences in gene expression biologically plausible? I think it makes sense if a gene is turned on/off depending on the environment/condition.
I am not familiar with DESeq2, but with edgeR and limma you can use their robust settings to minimise the effect of outliers on the DGE analysis.
Have you tried doing PCA plots for your samples?
Just a note about terminology: I'd argue those counts are not outliers. If you have 3/6 samples with a high count, which is the outlier: the 0s or the high counts?
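To make that question concrete, here is a minimal sketch in plain Python that tallies, for a single gene, how many samples in each group have a nonzero count. The counts are made up for illustration and the helper name `nonzero_per_group` is hypothetical. It is not part of DESeq2, edgeR or limma, just a quick way to see how many samples are actually behind a call.

```python
from collections import defaultdict

def nonzero_per_group(counts, groups):
    """Return {group: (n_nonzero, n_total)} for one gene's counts."""
    tally = defaultdict(lambda: [0, 0])
    for count, group in zip(counts, groups):
        tally[group][1] += 1          # every sample adds to the group total
        if count > 0:
            tally[group][0] += 1      # only nonzero counts add here
    return {group: tuple(pair) for group, pair in tally.items()}

# Made-up counts for one gene (illustration only, not real data)
groups = ["CLEAN"] * 6 + ["BOTH"] * 6
counts = [0, 0, 0, 0, 0, 0, 0, 35, 0, 0, 52, 0]

summary = nonzero_per_group(counts, groups)
for group, (n_nonzero, n_total) in summary.items():
    print(f"{group}: {n_nonzero}/{n_total} samples with nonzero counts")
```

If a gene shows only a couple of nonzero samples in one group, it is worth looking at the raw counts for that gene before trusting the result. With something like 3/6, on/off expression in a subset of samples becomes a more natural reading than "outliers".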