Hello everyone,
I was wondering if anyone could offer some clarity on the appropriate GSEA settings to use with RNA-seq data?
In brief, I have two groups (n = 17 in group 1 and n = 13 in group 2) that I would like to test for enrichment of a signature.
My data have been filtered on a mean absolute deviation cutoff to exclude genes with low variance, and I've used limma (specifically voomWithQualityWeights) to fit a linear model to my data and generate differentially expressed gene lists.
Additionally, I exported the entire dataset as a .gct file, together with a .cls phenotype file, and analysed it in GSEA with the Signal2Noise ranking metric. However, I have read that GSEAPreranked might be better. Is this a more valid approach? I ask because I've also read in a few places that it might inflate my p values and should only be used under certain circumstances (e.g. low numbers of replicates, https://stat.ethz.ch/pipermail/bioconductor/2014-January/057214.html).
If so, there appears to be little consensus on the best way to rank my genes (by p value or by fold change?), and I'd very much appreciate some guidance on that as well.
Kind regards and many thanks, in advance, for your help!
I recommend either ROAST (from the limma package) or QuSAGE for gene set analysis.
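On the ranking question itself, it may help to see what the options actually compute. Below is a minimal Python sketch, not taken from GSEA or limma, of the Signal2Noise score and of one commonly used preranked score (-log10 of the p value, signed by the direction of the fold change). The toy values, the gene names and the helper names (signal_to_noise, signed_log_p, to_rnk) are all made up for illustration. The GSEA implementation also adjusts very small standard deviations, which this sketch omits.

```python
import math
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Signal2Noise for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def signed_log_p(log_fc, p_value):
    """-log10(p) carrying the sign of the log fold change."""
    return math.copysign(-math.log10(p_value), log_fc)


def to_rnk(scores):
    """Format scores as RNK-style lines (gene<TAB>score), highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{gene}\t{score:.4f}" for gene, score in ordered]


# Hypothetical expression values per gene: (group 1 samples, group 2 samples).
expression = {
    "GENE_A": ([8.0, 9.0, 10.0], [4.0, 5.0, 6.0]),
    "GENE_B": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
    "GENE_C": ([2.0, 3.0, 4.0], [6.0, 7.0, 8.0]),
}

# Hypothetical differential expression results per gene: (logFC, p value).
de_results = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (0.1, 0.5),
    "GENE_C": (-1.5, 0.01),
}

s2n_scores = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
preranked_scores = {gene: signed_log_p(fc, p) for gene, (fc, p) in de_results.items()}

print("\n".join(to_rnk(s2n_scores)))
print("\n".join(to_rnk(preranked_scores)))
```

The two-column, tab-separated lines produced by to_rnk follow the layout GSEAPreranked expects in a .rnk file. Note that ranking by unsigned p value alone would put up- and down-regulated genes at the same end of the list, which is why the sign of the fold change is carried along here.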