DESeq2 analysis for 2 batches of samples that were sequenced separately
EJ ▴ 20
@ej-11019
Last seen 2.6 years ago
USA, Boston, Harvard Medical School

Hi, I have multiple conditions (A-F), each in triplicate (1-3), that I want to cross-compare. They could not all be sequenced together because of the sample number limit of the sequencing machine, so they had to be divided into 2 batches for sequencing. After sequencing, I used a TopHat2-featureCounts-DESeq2 pipeline to analyze them. My question is: should I merge the featureCounts results into one file as the input for DESeq2? Or is it OK to run the pipeline on each batch of samples separately, generate the DE results (Condition vs Control), and then compare the final results?

For example, the first batch is

A.1 Control
A.2 Control
A.3 Control
B.1 ConditionB
B.2 ConditionB
B.3 ConditionB
C.1 ConditionC
C.2 ConditionC
C.3 ConditionC
D.1 ConditionD
D.2 ConditionD
D.3 ConditionD

The second batch is

   
E.1 ConditionE
E.2 ConditionE
E.3 ConditionE
F.1 ConditionF
F.2 ConditionF
F.3 ConditionF

 

Can I run the TopHat2-featureCounts-DESeq2 pipeline on these batches separately, generate DE results (Condition vs Control) for each, and then compare those DE results among the different conditions? Or should I let featureCounts summarize the reads from both batches into one count matrix to feed into DESeq2 for DE analysis?

PS. All conditions share the same controls, which are A.1, A.2 and A.3.

 

 

deseq2 featurecounts rnaseq • 1.2k views
@james-w-macdonald-5106
Last seen 2 days ago
United States

Unfortunately, by running the samples that way you have completely aliased batch with any differences between conditions E and F and the others. So if you need to compare, say, ConditionF and ConditionD, there is no way to say whether any apparent differences are really biological or simply due to technical differences between runs. Because the shared controls were all sequenced in the first batch, the same applies to comparing ConditionE or ConditionF against Control.
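To make the aliasing concrete, here is a minimal sketch in plain Python (not part of the DESeq2 workflow itself) that builds the design matrix for the layout in the question and compares its rank with and without a batch term. The sample layout comes from the question; the helper names `design_row` and `matrix_rank` are made up for illustration, and the coding used (intercept plus one indicator per non-reference condition) is an assumption about how such a model would typically be parameterized.

```python
from fractions import Fraction

# Layout from the question: conditions A-D in batch 1, E-F in batch 2.
conditions = ["A", "B", "C", "D", "E", "F"]
batch_of = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2, "F": 2}
samples = [(c, rep, batch_of[c]) for c in conditions for rep in (1, 2, 3)]

def design_row(condition, batch, with_batch):
    """Intercept + condition indicators (A is the reference) [+ batch-2 indicator]."""
    row = [1] + [1 if condition == c else 0 for c in conditions[1:]]
    if with_batch:
        row.append(1 if batch == 2 else 0)
    return row

def matrix_rank(rows):
    """Rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

cond_only = [design_row(c, b, False) for c, _, b in samples]
cond_batch = [design_row(c, b, True) for c, _, b in samples]

rank_cond = matrix_rank(cond_only)
rank_cond_batch = matrix_rank(cond_batch)
print("condition only:    columns =", len(cond_only[0]), "rank =", rank_cond)
print("condition + batch: columns =", len(cond_batch[0]), "rank =", rank_cond_batch)
```

Adding the batch column increases the number of coefficients but not the rank, because the batch-2 indicator is exactly the sum of the ConditionE and ConditionF indicators. That is what "aliased" means here: no model can separate a batch effect from the E and F effects with this layout.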

Hypothetically these technical differences will be much smaller than the biological differences, but that depends on the underlying biology. The only option here is to hope that the technical differences do not predominate (not that you can say for sure whether that is the case), combine the two batches into one analysis, and cross your fingers.
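For the "combine" step, a minimal sketch of joining two per-batch count tables on gene ID might look like the following. The toy tables and the helper names `read_counts` and `merge_counts` are hypothetical, and the sketch assumes each table has already been reduced to a gene ID column followed by one column of counts per sample (real featureCounts output carries additional annotation columns that would need to be dropped first).

```python
import csv
import io

# Toy stand-ins for the two per-batch count tables (hypothetical values).
batch1_tsv = "Geneid\tA.1\tB.1\ngeneX\t10\t12\ngeneY\t0\t3\n"
batch2_tsv = "Geneid\tE.1\tF.1\ngeneY\t5\t7\ngeneX\t8\t9\n"

def read_counts(text):
    """Return (sample_names, {gene_id: [counts...]}) from a tab-separated table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    counts = {row[0]: [int(x) for x in row[1:]] for row in reader}
    return header[1:], counts

def merge_counts(first, second):
    """Join two count tables on gene ID, keeping only genes present in both."""
    samples1, counts1 = first
    samples2, counts2 = second
    shared = [g for g in counts1 if g in counts2]  # keeps the gene order of the first table
    merged = {g: counts1[g] + counts2[g] for g in shared}
    return samples1 + samples2, merged

samples, merged = merge_counts(read_counts(batch1_tsv), read_counts(batch2_tsv))
print(samples)
for gene, row in merged.items():
    print(gene, row)
```

Joining on gene ID rather than on row position guards against the two tables listing genes in a different order. If both batches were counted against the same annotation, the gene sets should be identical and nothing is dropped by the join.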

An alternative way to run these samples would have been to barcode, then mix all of them together and run on as many lanes as required to get the targeted depth. If you don't get the depth, you can simply re-run on as many lanes as needed to 'bump up' to the depth you want. That way you have randomized all samples into each technical replicate, and the technical differences between lanes or runs will no longer matter.


Hi, James,

Thanks for your advice. Yes, we realized only after we had completed all the sequencing that batch effects can introduce bias into our samples. We are now doing exactly what you suggested for our other experiments.

However, with these old data, we still want to look for any meaningful indications, which we will then follow up and validate with biological assays.
