Question

DESeq2 analysis for 2 batches of samples that were sequenced separately

EJ ▴ 20

@ej-11019

USA, Boston, Harvard Medical School

Hi, I have multiple conditions (A-F), each in triplicate (1-3), that I want to cross-compare, but they could not all be sequenced together because of the sample number limit of the sequencing machine, so they had to be divided into 2 batches for sequencing. After sequencing, I used a TopHat2-featureCounts-DESeq2 pipeline to analyze them. My question is: should I merge the featureCounts results into one file as the input for DESeq2? Or is it OK to run the pipeline on each batch of samples separately, generate the DE results (condition vs. control), and then compare the final results?

For example, the first batch is

A.1	Control
A.2	Control
A.3	Control
B.1	ConditionB
B.2	ConditionB
B.3	ConditionB
C.1	ConditionC
C.2	ConditionC
C.3	ConditionC
D.1	ConditionD
D.2	ConditionD
D.3	ConditionD

The second batch is


E.1	ConditionE
E.2	ConditionE
E.3	ConditionE
F.1	ConditionF
F.2	ConditionF
F.3	ConditionF

Can I run the TopHat2-featureCounts-DESeq2 pipeline on these batches separately, generate DE results (condition vs. control) for each, and then compare these DE results among the different conditions? Or should I let featureCounts summarize the reads from both batches into one count table to feed into DESeq2 for the DE analysis?

PS. All conditions share the same controls, which are A.1, A.2 and A.3.

deseq2 featurecounts rnaseq • 1.3k views

updated 8.7 years ago by James W. MacDonald 68k • written 8.7 years ago by EJ ▴ 20

score 1 · Answer 1 · 2016-07-28

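Unfortunately, by running the samples that way you have completely aliased batch with any differences between conditions E and F on the one hand and all the other conditions on the other. So if you need to compare, say, ConditionF and ConditionD, there is no way to tell whether any apparent differences are really biological or simply due to technical differences between runs. The same holds for comparing E or F against the controls, because the shared controls (A.1, A.2, A.3) were all run in the first batch.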
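
To make the aliasing concrete, here is a minimal Python sketch (not part of the original analysis) that builds a design matrix with both a condition term and a batch term from the sample sheet in the question. The column layout and the helper names `design_matrix` and `matrix_rank` are my own assumptions for illustration. The matrix has 7 columns but only rank 6, because the batch indicator is exactly the sum of the E and F indicators. As far as I know, this is the situation in which DESeq2 complains that the model matrix is not full rank for a design such as ~ batch + condition.

```python
from fractions import Fraction

# Sample sheet from the question: (condition, batch). Condition A is the control.
samples = (
    [("A", 1)] * 3 + [("B", 1)] * 3 + [("C", 1)] * 3 + [("D", 1)] * 3
    + [("E", 2)] * 3 + [("F", 2)] * 3
)

def design_matrix(samples):
    """Columns: intercept, indicators for conditions B..F, indicator for batch 2."""
    rows = []
    for condition, batch in samples:
        row = [1]
        row += [1 if condition == c else 0 for c in "BCDEF"]
        row.append(1 if batch == 2 else 0)
        rows.append(row)
    return rows

def matrix_rank(rows):
    """Rank by exact Gaussian elimination on Fractions (no rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

X = design_matrix(samples)
n_columns = len(X[0])
rank = matrix_rank(X)
print(f"columns: {n_columns}, rank: {rank}")

# The batch column (index 6) equals the E column (4) plus the F column (5) in every row
batch_is_e_plus_f = all(row[6] == row[4] + row[5] for row in X)
print("batch == E + F:", batch_is_e_plus_f)
```

Since one column is a linear combination of the others, no model can separate the batch effect from the E and F effects with these data.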

Hypothetically these technical differences will be much smaller than the biological differences, but that depends on the underlying biology. The only option here is to combine the counts from both batches into one data set for DESeq2, hope that the technical differences do not dominate (you cannot say for sure whether they do), and cross your fingers.
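
If you do combine, the mechanics are just a column-wise join of the two count tables on the gene identifier. Below is a rough Python sketch of that step, assuming the usual featureCounts layout (a comment line starting with `#`, then the columns Geneid, Chr, Start, End, Strand, Length, followed by one count column per sample). The helper names `read_counts`, `merge_counts` and `toy_table`, the simplified sample names, and the toy numbers are all made up for illustration; real featureCounts output normally uses the BAM file paths as column names.

```python
import csv
import io

ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]

def read_counts(handle):
    """Return (sample_names, {gene_id: [counts]}) from a featureCounts-style table."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    sample_names = [n for n in reader.fieldnames if n not in ANNOTATION_COLUMNS]
    counts = {row["Geneid"]: [int(row[s]) for s in sample_names] for row in reader}
    return sample_names, counts

def merge_counts(batch1, batch2):
    """Join two batches on gene ID and record which batch each sample came from."""
    names1, counts1 = batch1
    names2, counts2 = batch2
    if counts1.keys() != counts2.keys():
        raise ValueError("batches were counted against different gene sets")
    merged = {gene: counts1[gene] + counts2[gene] for gene in counts1}
    batch_of = {**{n: 1 for n in names1}, **{n: 2 for n in names2}}
    return names1 + names2, merged, batch_of

def toy_table(samples, rows):
    """Build an in-memory table in the assumed featureCounts layout (toy data)."""
    header = "\t".join(ANNOTATION_COLUMNS + samples)
    body = ["\t".join(map(str, row)) for row in rows]
    return io.StringIO("\n".join(["# toy comment line", header] + body) + "\n")

batch1 = toy_table(["A.1", "B.1"], [
    ["geneA", "chr1", 1, 1000, "+", 1000, 10, 20],
    ["geneB", "chr1", 2000, 2500, "-", 501, 0, 5],
])
batch2 = toy_table(["E.1", "F.1"], [
    ["geneA", "chr1", 1, 1000, "+", 1000, 30, 40],
    ["geneB", "chr1", 2000, 2500, "-", 501, 7, 1],
])

names, merged, batch_of = merge_counts(read_counts(batch1), read_counts(batch2))
print(names)
for gene, row in merged.items():
    print(gene, row)
```

Keeping the batch label alongside the sample names at least lets you look for a batch split in a PCA or clustering plot afterwards, even though the model itself cannot correct for it here.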

An alternative way to run these samples would have been to barcode them, mix all of them together, and run the pool on as many lanes as required to reach the targeted depth. If you don't reach the depth, you can simply re-run the pool on as many additional lanes as needed to 'bump up' to the depth you want. That way every sample is present in every lane or run, so the technical differences between lanes or runs affect all conditions equally and no longer matter.
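
As a back-of-the-envelope illustration of the pooled layout, here is a small Python sketch. The lane yield and target depth are hypothetical numbers chosen only for the example, and the calculation assumes an idealized pool in which every lane gives each barcoded sample an equal share of reads; `lanes_needed` is a made-up helper name.

```python
import math

conditions = "ABCDEF"
samples = [f"{c}.{i}" for c in conditions for i in (1, 2, 3)]  # 18 barcoded libraries

# Hypothetical numbers, for illustration only
reads_per_lane = 300_000_000     # assumed yield of one lane
target_per_sample = 30_000_000   # assumed target depth per sample

def lanes_needed(n_samples, target, lane_yield):
    """Lanes required if every lane carries an equal share of every sample."""
    return math.ceil(n_samples * target / lane_yield)

n_lanes = lanes_needed(len(samples), target_per_sample, reads_per_lane)

# Every lane holds the same pool, so each sample appears in every lane
layout = {lane: list(samples) for lane in range(1, n_lanes + 1)}

# Per-sample depth is the sum of that sample's share across all lanes
depth = {s: n_lanes * (reads_per_lane // len(samples)) for s in samples}

print("lanes:", n_lanes)
print("every sample in every lane:",
      all(set(pool) == set(samples) for pool in layout.values()))
print("minimum depth per sample:", min(depth.values()))
```

Because each lane contains all 18 samples, lane is crossed with condition rather than nested in it, and the per-lane counts for a sample can simply be summed before the DE analysis.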