Hi
I want to perform a differential expression analysis on RNA-Seq data. During data exploration, the PCA plot indicated that a batch effect was present (plot not shown). I obtained additional information about the experiment, and indeed the samples were processed by two different people. The metadata of the experiment is shown in this table:
sample    treatment  lab_tech
sample1   control    tech1
sample2   control    tech1
sample3   control    tech1
sample4   treatA     tech2
sample5   treatA     tech2
sample6   treatA     tech1
sample7   treatA     tech1
My first idea was to perform a differential expression analysis between samples 4/5 and samples 6/7. Because these four samples all received treatA, any genes called differentially expressed between the two pairs are probably due to the batch effect. I could then use that list to "correct" the results of the control vs treatA comparison. But then I started wondering whether the batch effect could instead be modelled by including it in the design. However, I do not know how to formulate a correct design, and I am not even sure it is possible. Can anyone help?
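A minimal sketch of how one might check whether the batch can go into the design, written in Python with NumPy rather than R. It builds the treatment-coded design matrix that a formula such as `~ lab_tech + treatment` (the notation used by R packages like DESeq2 and edgeR) would produce: an intercept, a tech2 indicator and a treatA indicator, with tech1 and control as reference levels. It then checks for full column rank, i.e. whether the batch and treatment effects can be estimated separately. The helper `design_matrix` is hypothetical, and this only checks estimability; it does not fit a count model.

```python
import numpy as np

# Sample metadata copied from the table above
treatment = ["control", "control", "control", "treatA", "treatA", "treatA", "treatA"]
lab_tech = ["tech1", "tech1", "tech1", "tech2", "tech2", "tech1", "tech1"]


def design_matrix(treatment, lab_tech):
    """Hypothetical helper: treatment-coded design for ~ lab_tech + treatment.

    Reference levels are tech1 and control, so the columns are
    intercept, tech2-vs-tech1 (batch) and treatA-vs-control (treatment).
    """
    rows = []
    for trt, tech in zip(treatment, lab_tech):
        rows.append([
            1.0,                              # intercept
            1.0 if tech == "tech2" else 0.0,  # batch effect: tech2 vs tech1
            1.0 if trt == "treatA" else 0.0,  # treatment effect: treatA vs control
        ])
    return np.array(rows)


X = design_matrix(treatment, lab_tech)
rank = np.linalg.matrix_rank(X)
print(X)
# Full column rank: batch and treatment coefficients are both estimable
print("columns:", X.shape[1], "rank:", rank, "estimable:", rank == X.shape[1])

# Counterfactual for comparison: every treatA sample processed by tech2.
# The batch column then equals the treatment column, so the rank drops.
lab_tech_confounded = ["tech1", "tech1", "tech1", "tech2", "tech2", "tech2", "tech2"]
X_confounded = design_matrix(treatment, lab_tech_confounded)
print("confounded rank:", np.linalg.matrix_rank(X_confounded))
```

The design is estimable here because sample6 and sample7 received treatA but were processed by tech1, so the lab_tech and treatment columns are not collinear. If every treatA sample had been processed by tech2, the two effects could not be separated, which the second check illustrates.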
Thanks in advance.
It's one of those rare times where the fix for a complication is simpler than you first think it will be.