Hi, I have multiple conditions (A-F), each in triplicate (1-3), that I want to compare against each other. They could not all be sequenced together because of the sample-number limit of the sequencing machine, so they had to be split into 2 batches for sequencing. After sequencing, I analyzed them with a TopHat2-featureCounts-DESeq2 pipeline. My question is: should I merge the featureCounts results into one file as the input for DESeq2? Or is it OK to run the pipeline on each batch of samples separately, generate the DE results (Condition vs Control), and then compare the final results?
For example, the first batch is:
A.1 | Control |
A.2 | Control |
A.3 | Control |
B.1 | ConditionB |
B.2 | ConditionB |
B.3 | ConditionB |
C.1 | ConditionC |
C.2 | ConditionC |
C.3 | ConditionC |
D.1 | ConditionD |
D.2 | ConditionD |
D.3 | ConditionD |
The second batch is:
E.1 | ConditionE |
E.2 | ConditionE |
E.3 | ConditionE |
F.1 | ConditionF |
F.2 | ConditionF |
F.3 | ConditionF |
Can I run the TopHat2-featureCounts-DESeq2 pipeline on these batches separately, generate DE results (Condition vs Control) for each, and then compare those DE results across conditions? Or should I have featureCounts summarize the reads from both batches into a single countData table to feed into DESeq2 for the DE analysis?
PS. All conditions share the same controls, which are A.1, A.2 and A.3.
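To make the second option concrete, merging the two featureCounts outputs into one count table could look roughly like the sketch below. It assumes the standard featureCounts layout (a `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one count column per sample). The toy counts and the helper names `read_counts` and `merge_counts` are made up for illustration and are not part of any of the tools above.

```python
import csv
import io

# Annotation columns that featureCounts writes before the per-sample counts.
ANNOTATION_COLUMNS = {"Chr", "Start", "End", "Strand", "Length"}


def read_counts(handle):
    """Read a featureCounts table into (sample_names, {gene_id: [counts]})."""
    # Skip the leading '#' comment line that featureCounts adds.
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    gene_col = header.index("Geneid")
    keep = [i for i, name in enumerate(header)
            if i != gene_col and name not in ANNOTATION_COLUMNS]
    samples = [header[i] for i in keep]
    counts = {row[gene_col]: [int(row[i]) for i in keep] for row in rows if row}
    return samples, counts


def merge_counts(first, second):
    """Join two batches gene by gene; both must use the same annotation."""
    samples_1, counts_1 = first
    samples_2, counts_2 = second
    if set(counts_1) != set(counts_2):
        raise ValueError("batches were counted against different gene sets")
    merged = {gene: counts_1[gene] + counts_2[gene] for gene in counts_1}
    return samples_1 + samples_2, merged


# Hypothetical toy tables standing in for the two real featureCounts outputs.
batch1 = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tA.1\tB.1\n"
    "gene1\tchr1\t1\t100\t+\t100\t10\t20\n"
    "gene2\tchr1\t200\t300\t-\t101\t0\t5\n"
)
batch2 = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tE.1\tF.1\n"
    "gene1\tchr1\t1\t100\t+\t100\t7\t3\n"
    "gene2\tchr1\t200\t300\t-\t101\t1\t9\n"
)

samples, merged = merge_counts(read_counts(batch1), read_counts(batch2))

# Record which batch each sample came from, in the same order as `samples`.
batch_labels = ["batch1"] * 2 + ["batch2"] * 2
```

Keeping a batch label for every sample next to the merged table means the batch each sample came from is not lost once the two files are combined.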
Hi, James,
Thanks for your advice. Yes, after we had completed all the sequencing we realized that batch effects can introduce bias into our samples. We are now doing exactly what you suggested for our other experiments.
However, we would still like to look for any meaningful signals in this older data, and we will follow up with biological assays to validate them.