I am performing a differential expression analysis between treated and untreated cell lines using DESeq2. The samples were sequenced in two batches; however, condition and batch are perfectly confounded: all control samples are in one batch and all treated samples are in the other. Because of this confounding, the batch effect cannot be removed or even accounted for in the design formula. Is it still valid to compare each treated condition to the control and then compare how different genes are regulated in response to treatment? For example, I would like to check whether AKT1 is upregulated in one treatment relative to control and downregulated in another.
Thank you!
There is not much, if anything, that can be done that is statistically sound. You could assume that there is no difference between batch 1 and batch 2, but this assumption is most likely wrong. If you must proceed with the current data, then regard it as preliminary / pilot data and process each batch independently. Then, transform the normalised counts via vst() or rlog(), followed by a final transformation to Z scores. At least on the Z scale, qualitative comparisons across batches are possible, if only to say things such as 'GeneX is expressed in condition 1, but not condition 2'. An absolute Z score of 1.96 corresponds to p = 0.05 on a two-tailed standard normal distribution. This is still not ideal, though.
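To make the Z-score step concrete, here is a minimal sketch in plain Python (standard library only) of standardising a set of already-transformed expression values and converting a Z score to a two-tailed p-value. The gene names and values are hypothetical, and the choice of which values to standardise over (across genes within a sample, or across samples for one gene) is an assumption, since it is not specified above. In R, the same arithmetic could be done with scale() and pnorm().

```python
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


def two_tailed_p(z):
    """Two-tailed p-value of a Z score under a standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical variance-stabilised expression values (illustration only).
expr = {"GeneA": 5.1, "GeneB": 5.3, "GeneC": 4.9, "GeneD": 5.0, "GeneE": 9.7}

scores = dict(zip(expr, z_scores(list(expr.values()))))
for gene, z in scores.items():
    print(f"{gene}: Z = {z:.2f}, two-tailed p = {two_tailed_p(z):.3f}")

# The threshold quoted above: |Z| = 1.96 gives p of roughly 0.05.
print(f"p at |Z| = 1.96: {two_tailed_p(1.96):.4f}")
```

The p-value interpretation only holds to the extent that the standardised values are approximately normal, so any such comparison across the two batches should still be treated as qualitative.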