Is it necessary to check for a batch effect in this case, and if so, how?
amoltej ▴ 10
@amoltej-7192
Last seen 7.2 years ago
Australia

Hello everyone,

I have a data set generated from 88 samples in a single run using all 8 flow cells. All libraries were pooled and sequenced on all of the flow cells at the same time.

So basically I received 8 FASTQ files for each sample. I merged the 8 files for each sample to create a single FASTQ file per sample.

My question is: is it necessary to look for a batch effect in this situation? If yes, how should I specify the batch when using the sva package? Any other suggestions are also welcome.

Thank you

Amol

svaseq combat sva RANseq batcheffect removebatcheffect • 2.7k views
@gordon-smyth
Last seen 18 minutes ago
WEHI, Melbourne, Australia

The flow cell does not constitute a batch effect. There is no effect to remove, and no way to remove it if there were one.


Thanks for the reply Gordon Smyth.

Amol

Jakub ▴ 50
@jakub-9073
Last seen 9 months ago
United Kingdom

If I understand correctly, you ran the same 88 samples in each of the flow cell lanes? In that case, that is an optimal experimental design and goes a long way towards preventing the sort of batch effects you might be worried about. I would argue that an optimal block design is the main answer to your question (http://www.genetics.org/content/185/2/405.full).
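As a rough illustration of what "the same samples in every lane" means, here is a minimal Python sketch that checks whether a sample-by-lane layout is fully crossed, i.e. every sample occurs exactly once in every lane. The sample names, the lane numbering, and the helper `is_fully_crossed` are all hypothetical and only serve the example.

```python
from collections import Counter


def is_fully_crossed(records):
    """Return True if every sample occurs exactly once in every lane.

    `records` is a list of (sample, lane) tuples (hypothetical layout).
    """
    samples = {sample for sample, _ in records}
    lanes = {lane for _, lane in records}
    counts = Counter(records)
    return all(counts[(s, l)] == 1 for s in samples for l in lanes)


# Hypothetical layout: 88 samples, each sequenced on all 8 lanes.
samples = [f"sample{i:02d}" for i in range(1, 89)]
lanes = range(1, 9)
records = [(s, l) for s in samples for l in lanes]

balanced = is_fully_crossed(records)        # all sample/lane pairs present
unbalanced = is_fully_crossed(records[1:])  # one sample/lane pair missing
```

When the layout is fully crossed like this, every sample receives reads from every lane, so lane is not confounded with any sample or treatment group.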

I personally do not merge at the FASTQ stage; I do all quality control on the FASTQ files independently. That way you still obtain data on the individual lanes and could, in theory, discard lanes that failed in some way. I would also do the mapping (alignment) independently (if you still do that), as you then get independent mapping statistics for each lane, which can again be helpful for QC. I then tend to merge my .bam files.
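The per-lane bookkeeping described above could be sketched roughly as follows in Python. This is only an outline under assumptions: the file naming pattern `<sample>_L<lane>.bam` is hypothetical, and the function only builds `samtools merge` argument lists (output file first, then the inputs) without running anything, so samtools does not need to be installed to try it.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_L<3-digit lane>.bam
LANE_BAM = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d{3})\.bam$")


def merge_commands(bam_files):
    """Group per-lane BAM files by sample and build one merge command each."""
    by_sample = defaultdict(list)
    for name in bam_files:
        match = LANE_BAM.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        by_sample[match.group("sample")].append(name)
    # samtools merge takes the output file first, followed by the inputs.
    return {
        sample: ["samtools", "merge", f"{sample}.merged.bam", *sorted(files)]
        for sample, files in sorted(by_sample.items())
    }


# Hypothetical per-lane files for two samples.
commands = merge_commands(["A_L002.bam", "A_L001.bam", "B_L001.bam"])
for cmd in commands.values():
    print(" ".join(cmd))
```

Each command list could then be passed to `subprocess.run` once the per-lane QC and mapping statistics have been inspected and any failed lanes removed from the input list.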

I agree with Gordon though: I have never used lanes at later stages of the analysis, and I expect that doing so would be futile and incorrect.


Thanks for the reply Jakub.

After merging the respective FASTQ files and running quality control with FastQC and Trimmomatic, only 1-2% of reads are discarded from each file. Can I take this as an indication that all the lanes produced good quality reads?

Amol 


Personally I think, unless something is seriously wrong with your experiment, that trimming the reads is both unnecessary and harmful. It is better to allow a good quality aligner to make decisions about this. See my comments to one of the referees in this workflow:

https://f1000research.com/articles/5-1438/

The workflow also gives some brief guidelines regarding quality checking. The real proof is aligning the reads successfully to a good quality reference genome.

If you want to discuss QC further, it would be advisable to post a new question rather than to continue this thread (which is about batch correction).


Glad to help. I assume you used the default Trimmomatic settings, which are reasonable. The result looks fine.

You will have to decide how much QC to do on 'lanes', and whether you want to know per-lane GC content and so on. Trimming is only one part of QC. There are many ways to explore sequencing biases (they do exist), but it all depends on whether this is relevant to your question and block design.
