Dear list users,
I am trying to test whether a given, predefined set of genes is differentially expressed between two groups in an RNA-seq experiment (count data, analysed with voom). I am therefore interested in testing the hypothesis of differential expression only within my set, for instance to limit the multiple-testing burden.
In light of this, which of the following approaches is likely to be the most appropriate?
1) Reduce the count matrix to my gene set first, estimate the mean-variance weights with voom, and then use lmFit, eBayes and topTable to identify DEGs.
or
2) Model the mean-variance relationship across the whole count matrix with voom, then subset the voom object to include only my genes of interest, and then use lmFit, eBayes and topTable.
or
3) Carry out mean-variance modelling and shrinkage on the whole dataset, then subset the topTable output to my gene set and re-estimate false discovery rates from the raw P values, since the only hypotheses I intended to test were those concerning my subset of genes.
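To make option 3 concrete, here is a minimal sketch of the re-adjustment step, assuming the Benjamini-Hochberg (BH) method, which is topTable's default adjust.method. In R this is simply p.adjust(p, method = "BH") on the subset; Python is used here only for illustration. The gene names and P values are hypothetical toy numbers, not real data, and the helper name bh_adjust is my own.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, mirroring R's p.adjust(method = "BH")."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Step-up: running minimum taken from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)      # cap at 1 and restore input order
    return adjusted


# Hypothetical raw P values from a genome-wide topTable (toy numbers, not real data)
raw_p = {"geneA": 0.001, "geneB": 0.004, "geneC": 0.03, "geneD": 0.2, "geneE": 0.5}
gene_set = ["geneA", "geneB", "geneC"]  # hypothetical predefined gene set

# Adjustment across all genes, as topTable's default BH correction would do
all_adj = dict(zip(raw_p, bh_adjust(list(raw_p.values()))))

# Option 3: keep only the gene-set rows and re-adjust their raw P values
subset_adj = dict(zip(gene_set, bh_adjust([raw_p[g] for g in gene_set])))

for g in gene_set:
    print(f"{g}: genome-wide {all_adj[g]:.3f}, within set {subset_adj[g]:.3f}")
```

Restricting the adjustment to the set lowers each adjusted value because the number of tests m drops from the genome-wide count to the size of the set; the raw P values themselves are unchanged.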
In terms of numbers, I am looking at a subset of 153 genes out of a total of 20,000.
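For a rough sense of scale, a Bonferroni correction at α = 0.05 (used here only as a simple illustration; BH is less conservative) gives per-test thresholds that differ by roughly 130-fold between the two set sizes:

```latex
\frac{0.05}{20\,000} = 2.5 \times 10^{-6},
\qquad
\frac{0.05}{153} \approx 3.27 \times 10^{-4}
```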
Best wishes,
Ankur Chakravarthy