Question

Is it necessary to check for batch effects in this case, and if so, how?

amoltej ▴ 10

@amoltej-7192

Australia

Hello everyone,

I have a data set generated from 88 samples in a single run using all 8 flow cells. All libraries were pooled and sequenced on all the flow cells at the same time.

So basically I received 8 fastq files for each sample. I merged the 8 files for each sample to create a single fastq file per sample.

My question is: is it necessary to look for batch effects in this situation? If so, how should I specify the batch when using the SVA package? Any other suggestions are also welcome.

Thank you

Amol

svaseq combat sva RANseq batcheffect removebatcheffect • 3.1k views

ADD COMMENT • link updated 8.5 years ago by Jakub ▴ 50 • written 8.5 years ago by amoltej ▴ 10

score 1 · Answer 1 · 2016-10-22

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

The flow cell does not constitute a batch effect. There is no effect to remove, and no way to remove it if there were one.

ADD COMMENT • link 8.5 years ago Gordon Smyth 52k

Thanks for the reply Gordon Smyth.

Amol

ADD REPLY • link 8.5 years ago amoltej ▴ 10

score 1 · Answer 2 · 2016-10-23

Jakub ▴ 50

@jakub-9073

United Kingdom

If I understand correctly, you ran the same 88 samples in each of the flow cell lanes? In that case, that is an optimal experimental design, and it goes a long way toward preventing the sort of batch effects you might be worried about: because every sample is present in every lane, lane differences cannot be confounded with differences between samples. I would argue that an optimal block design is the main answer to your question (http://www.genetics.org/content/185/2/405.full).
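
To make the "every sample in every lane" idea concrete, here is a minimal Python sketch that checks whether a sequencing layout is fully crossed. The sample names and the `design` list are made up for illustration (88 samples on 8 lanes, as described in the question), and `is_fully_crossed` is a hypothetical helper, not part of any package.

```python
def is_fully_crossed(pairs):
    """Return True if every sample appears in every lane at least once."""
    samples = {sample for sample, _ in pairs}
    lanes = {lane for _, lane in pairs}
    seen = set(pairs)
    return all((s, l) in seen for s in samples for l in lanes)


# Hypothetical layout: 88 samples, each sequenced on all 8 lanes.
design = [(f"sample{i:02d}", lane) for i in range(1, 89) for lane in range(1, 9)]

# True when no sample is missing from any lane, i.e. lane is not
# confounded with sample in this layout.
balanced = is_fully_crossed(design)
```

If a (sample, lane) combination were missing, the check would return `False`, which would be a reason to look at the layout more carefully.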

I personally do not merge at the fastq stage but do all my quality control of fastq files independently. That way you can still obtain data on the individual lanes and you could theoretically discard lanes that failed in some way. I would also do the mapping (alignment) independently (if you still do that), as you then can get your independent mapping statistics for each lane, which can again be helpful for QC. I then tend to merge my .bam files.
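
As a rough illustration of keeping per-lane information before merging, the sketch below counts reads and the mean read length in a single fastq file and pulls a lane tag out of the file name. It assumes plain four-line fastq records and an Illumina-style `_L001_` tag in the file name; both are assumptions about your files, and `fastq_summary` and `lane_of` are hypothetical helper names. It is not a substitute for a proper QC tool.

```python
import gzip
import re
from pathlib import Path

# Assumed Illumina-style lane tag in the file name, e.g. sampleA_L003_R1.fastq.gz
LANE_PATTERN = re.compile(r"_L(\d{3})_")


def open_fastq(path):
    """Open a plain or gzip-compressed fastq file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_summary(path):
    """Count reads and compute mean read length (assumes 4 lines per record)."""
    n_reads = 0
    n_bases = 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                n_reads += 1
                n_bases += len(line.strip())
    mean_length = n_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "mean_length": mean_length}


def lane_of(path):
    """Return the lane tag from the file name, or None if there is none."""
    match = LANE_PATTERN.search(Path(path).name)
    return match.group(1) if match else None
```

Running `fastq_summary` on each of the 8 files for a sample and comparing the numbers by `lane_of` would show quickly whether one lane yielded far fewer reads than the others.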

I agree with Gordon though, and I have never used lanes at later stages of the analysis, and I anticipate that would be futile and incorrect.

ADD COMMENT • link 8.5 years ago Jakub ▴ 50

Thanks for the reply Jakub.

After merging the respective fastq files and running quality control with FastQC and trimmomatic, only 1-2% of reads are discarded from each file. Can I take this as an indication that all the lanes produced good quality reads?

Amol

ADD REPLY • link 8.5 years ago amoltej ▴ 10

Personally I think, unless something is seriously wrong with your experiment, that trimming the reads is both unnecessary and harmful. It is better to allow a good quality aligner to make decisions about this. See my comments to one of the referees in this workflow:

https://f1000research.com/articles/5-1438/

The workflow also gives some brief guidelines regarding quality checking. The real proof is aligning the reads successfully to a good quality reference genome.

If you want to discuss QC further, it would be advisable to post a new question rather than to continue this thread (which is about batch correction).

ADD REPLY • link 8.5 years ago Gordon Smyth 52k

Glad to help. I assume you used default trimmomatic settings, which are reasonable. The result looks fine.

You will have to decide how much QC to do on individual lanes, and whether you want to know per-lane GC content and so on. Trimming is only one part of QC. There are many ways to explore biases from sequencing (they do exist), but it all depends on whether this is relevant to your question and block design.
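
For what it is worth, per-lane GC content is simple to compute once you have the read sequences for each lane. The sketch below is only an illustration: `gc_fraction` is a hypothetical helper, the reads in `reads_by_lane` are invented, and ambiguous bases such as N are ignored by assumption.

```python
def gc_fraction(sequences):
    """Fraction of G/C among called bases (A, C, G, T); other symbols are ignored."""
    gc = 0
    total = 0
    for seq in sequences:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        total += g_c + a_t
    # Returns 0.0 when there are no called bases at all.
    return gc / total if total else 0.0


# Made-up reads for two lanes, for illustration only.
reads_by_lane = {
    "L001": ["ACGT", "GGAA"],
    "L002": ["ATAT", "ACGT"],
}
per_lane_gc = {lane: gc_fraction(reads) for lane, reads in reads_by_lane.items()}
```

With real data, a lane whose GC fraction differs markedly from the others for the same sample would be worth a closer look.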

ADD REPLY • link 8.5 years ago Jakub ▴ 50