We have 18 treatments in two replicates of an RNA-seq experiment, for a total of 36 samples. The first rep was generated on a HiSeq 2500 and has about 8 million 50 bp reads per sample. The second rep was run on a NextSeq 500 and has 75 bp reads at ~32 million per sample.

We are closely following the edgeR approach to finding batch effects given in the June 2016 version of the manual, including the TMM normalization. The MDS plot shows a batch effect pretty well, and we can get a generous number of DE genes at the end.

I am concerned that having only two reps makes a test for, and removal of, the batch effect inconclusive, as it would seem to leave too few degrees of freedom for the model, model.matrix(~Time+Time:Treat), where Time would be the reps. Is this so? And if so, what can be done, if anything? I have looked over the previous emails, but all the others I find are somewhat more complicated experiments that might require more dof.

Or is it best to just trim the new reads to 50 bp and use a subset of each sample, comparable in size to the samples from the first rep?
Also, is TMM normalization expected to be robust to such differences in library size (roughly 8 million vs. 32 million reads per sample), or is that a likely source of the batch effect?
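
To make the degrees-of-freedom worry concrete, here is a minimal sketch in base R that counts the residual degrees of freedom for the design mentioned above. The factor layout (18 treatments, each present once in each of the two reps) is taken from the description; the level labels are made up for illustration, and the additive design ~Time+Treat is included only as an assumed point of comparison, not something we have fitted.

```r
# Hypothetical sample layout: 18 treatments, each observed once per rep.
# Level labels ("rep1", "T1", ...) are placeholders, not our real sample names.
Time  <- factor(rep(c("rep1", "rep2"), each = 18))
Treat <- factor(rep(paste0("T", 1:18), times = 2))

# The design from the question: rep effect plus a treatment effect within each rep.
design_nested <- model.matrix(~Time + Time:Treat)

# Assumed comparison: rep (batch) effect plus a single shared treatment effect.
design_additive <- model.matrix(~Time + Treat)

# Residual degrees of freedom = number of samples minus rank of the design matrix.
df_nested   <- nrow(design_nested)   - qr(design_nested)$rank
df_additive <- nrow(design_additive) - qr(design_additive)$rank

cat("~Time + Time:Treat residual df:", df_nested, "\n")
cat("~Time + Treat      residual df:", df_additive, "\n")
```

With this layout the first design has as many coefficients as samples (36), so no residual degrees of freedom remain, while the additive comparison uses 19 coefficients and leaves 17.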
Thanks,
Bill