Question

Can a batch effect be determined in edgeR with only 2 reps?

Spollen, William G.

We have 18 treatments in two replicates of an RNA-seq experiment, for a total of 36 samples. The first rep was generated on a HiSeq 2500 at about 8 million 50 bp reads per sample. The second rep was run on a NextSeq 500, with 75 bp reads at ~32 million per sample. We are closely following the edgeR approach to batch effects given in the June 2016 version of the manual, including the TMM normalization. The MDS plot shows the batch effect clearly, and we get a generous number of DE genes at the end.

I am concerned that having only two reps makes a test for, and removal of, the batch effect inconclusive, as it would seem to leave too few degrees of freedom for the model, model.matrix(~Time+Time:Treat), where Time would be the reps. Is this so? And if so, what can be done, if anything? I have looked over the previous emails, but all the others I find are somewhat more complicated experiments that might require more degrees of freedom. Or is it best to just trim the new reads to 50 bp and use a subset of each sample, comparable in size to the first rep?

Also, is TMM normalization expected to be robust to such differences in sample size, or is that a likely source of the batch effect?

Thanks,

Bill

edger rnaseq batch effect read length • 1.2k views

score 0 · Answer 1 · 2016-11-11

You will certainly have a batch effect, but you have slightly misinterpreted the approach to batch correction. Following the standard edgeR approach, the design would be:

design <- model.matrix(~Treat+Time)

You can't use the design matrix you state because it would estimate a separate treatment effect for each batch, and that's the last thing you want.

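Repeating the whole experiment gave you 18 residual degrees of freedom, and the batch adjustment uses only 1 of these, leaving 17, so there is no problem with your design. By contrast, the interaction model you proposed has as many coefficients as samples (36) and leaves no residual degrees of freedom at all. Given that you have repeated all 18 treatments at both times, the batch effect can be reliably estimated.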
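The degrees-of-freedom arithmetic can be checked directly in base R. This is a minimal sketch, not the poster's actual analysis: the `targets` sample sheet and its level names (`T1`..`T18`, `rep1`, `rep2`) are made up for illustration, and only the layout (18 treatments, each run once in each of 2 batches) is taken from the question.

```r
# Hypothetical sample sheet: 18 treatments, each run once per batch (36 samples).
# The level names are placeholders, not the real sample labels.
targets <- expand.grid(Treat = paste0("T", 1:18),
                       Time  = c("rep1", "rep2"))

# Additive design: treatment effects plus a single batch coefficient.
design <- model.matrix(~ Treat + Time, data = targets)
n_samples <- nrow(design)            # 36
n_coef    <- qr(design)$rank         # 19 = intercept + 17 treatment + 1 batch
resid_df  <- n_samples - n_coef      # 17

# Interaction design from the question: a separate treatment effect per batch.
design_int   <- model.matrix(~ Time + Time:Treat, data = targets)
resid_df_int <- nrow(design_int) - qr(design_int)$rank   # 0

cat("additive residual df:", resid_df, "\n")
cat("interaction residual df:", resid_df_int, "\n")
```

With the additive design there are 17 residual degrees of freedom available for dispersion estimation, whereas the interaction design is saturated and leaves none.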

score 0 · Answer 2 · 2016-11-14