What is the most efficient way to use voom when more batches will arrive at some point in the future? If voom is applied to each batch individually, each count matrix is filtered on CPM separately, so the resulting normalised value tables contain different sets of rows. The combined matrix then has NA entries, which is problematic because batch correction methods need a complete matrix. For example:
         Batch 1                         Batch 2
         Sample 1  Sample 2  Sample 3    Sample 4  Sample 5  Sample 6
Gene 1   8.75      9.09      8.31        7.89      7.99      7.90
Gene 2   8.55      9.01      8.77        7.99      7.98      8.00
Gene 3   NA        NA        NA          1.00      9.00      10.00
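To make the per-batch filtering problem concrete, here is a minimal Python sketch with made-up counts. It is not voom or limma: it only mimics a CPM filter, and the function names, the threshold (`min_cpm`) and the sample requirement (`min_samples`) are assumptions chosen so the toy data behaves like the table above.

```python
# Toy raw counts per batch (made-up numbers): gene -> counts for 3 samples.
batch1 = {"Gene 1": [500, 620, 410], "Gene 2": [450, 600, 520], "Gene 3": [0, 1, 0]}
batch2 = {"Gene 1": [300, 320, 310], "Gene 2": [330, 325, 340], "Gene 3": [2, 400, 800]}


def cpm(counts):
    """Counts per million, using each sample's column total as its library size."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {g: [counts[g][j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g in genes}


def keep_genes(counts, min_cpm=1000.0, min_samples=2):
    """Hypothetical filter: keep genes with CPM >= min_cpm in >= min_samples samples.

    The threshold is unrealistically high only because the toy libraries are tiny.
    """
    values = cpm(counts)
    return {g for g, v in values.items()
            if sum(x >= min_cpm for x in v) >= min_samples}


# Filtering each batch on its own gives different gene sets.
kept1 = keep_genes(batch1)
kept2 = keep_genes(batch2)

# Combining the per-batch tables on the union of genes leaves gaps (None here, NA in R).
union = sorted(kept1 | kept2)
combined = {
    g: (batch1[g] if g in kept1 else [None] * 3)
       + (batch2[g] if g in kept2 else [None] * 3)
    for g in union
}

for gene, row in combined.items():
    print(gene, row)
```

Gene 3 passes the filter only in the second batch, so its first three entries in the combined table are missing, which is the situation shown in the table.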
Alternatively, if I provide the combined count matrix as the input to voom, Gene 3 (filtered out of Batch 1 due to low CPM) will no longer be all NA for Batch 1. However, when Batch 3 arrives in the future, the values in the entire matrix will change, which may cause unease among the biologists. Is there a third approach that I haven't listed which is better?