Hi guys
I have a dataset with 100 samples and 54,000 probe IDs. I used to think that if I filtered out the probe IDs with little variation, I would reduce the number of features, so that multiple testing with limma would give me fewer errors and less noise. But now I have found that if I use the genefilter package, it changes the distribution of the variances, and that interferes with limma.
Let me know if I am mistaken, and tell me whether there are other options besides the genefilter package.
Thanks
The requirement on the filter statistic is that the distribution of the p-values under the null is uniform not only when looking at the mixture of all hypotheses, but also within each subgroup of hypotheses grouped (or stratified) by the filter statistic.
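Written out in symbols, the requirement might look as follows. The notation is an assumption made here for illustration and is not taken from any package: P_i is the p-value of hypothesis i, U_i is its filter statistic, and H_{0,i} means that the null holds for hypothesis i.

```latex
% Marginal uniformity (what is usually checked):
\Pr\left(P_i \le \alpha \mid H_{0,i}\right) = \alpha
\quad \text{for all } \alpha \in [0,1].

% Uniformity within each stratum of the filter statistic
% (what the filter additionally needs):
\Pr\left(P_i \le \alpha \mid H_{0,i},\; U_i = u\right) = \alpha
\quad \text{for all } \alpha \in [0,1] \text{ and all } u.
```

The second condition implies the first, but not the other way round.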
Because of the way limma's moderated t-test shares information on within-group variance between genes, this requirement often does not hold for it when the overall empirical variance is used to stratify. I posted some example plots here: http://rpubs.com/WolfgangHuber/138901. However, it does hold for the ordinary t-test -- and thus should hold for the moderated t-test if the moderation is negligible (i.e. if the Bayesian prior is overridden by the data), as it might be for a study with 100 samples.
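To make the ordinary t-test case concrete, here is a minimal simulation sketch in plain Python (not R, and not using limma or genefilter). Everything in it is an assumption chosen for illustration: the function names, 2000 null genes, 50 samples per group, and log-normally distributed per-gene standard deviations. It simulates genes with no true group difference, computes an ordinary two-sample t-test p-value for each, sorts the genes by overall variance, and reports the fraction of p-values below 0.05 within each variance stratum. If the requirement holds, each stratum should give a fraction close to 0.05. It says nothing about the moderated t-test, which would need limma itself to check.

```python
import math
import random


def two_sided_t_pvalue(t, df):
    """Two-sided Student t p-value via a finite series that is exact for even df."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term, total = 1.0, 1.0
    # P(|T| <= t) = sin(theta) * (1 + 1/2 cos^2 + (1*3)/(2*4) cos^4 + ...),
    # with the last term carrying cos^(df - 2).
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return max(0.0, 1.0 - math.sin(theta) * total)


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def simulate_null_genes(n_genes=2000, n_per_group=50, seed=1):
    """Return (overall_variance, p_value) pairs for genes with no group effect."""
    rng = random.Random(seed)
    df = 2 * n_per_group - 2  # even whenever both groups have the same size
    results = []
    for _ in range(n_genes):
        sd = math.exp(rng.gauss(0.0, 0.5))  # gene-specific noise level (assumed)
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        mean_a, var_a = mean_and_variance(a)
        mean_b, var_b = mean_and_variance(b)
        _, overall_var = mean_and_variance(a + b)  # filter statistic, ignores groups
        pooled_var = (var_a + var_b) / 2.0  # equal group sizes
        t = (mean_a - mean_b) / math.sqrt(pooled_var * 2.0 / n_per_group)
        results.append((overall_var, two_sided_t_pvalue(t, df)))
    return results


def rejection_rate_by_stratum(results, n_strata=2, alpha=0.05):
    """Fraction of p-values below alpha within each overall-variance stratum."""
    ordered = sorted(results)  # sorts by overall variance, lowest first
    size = len(ordered) // n_strata
    rates = []
    for i in range(n_strata):
        chunk = ordered[i * size:(i + 1) * size]
        rates.append(sum(p < alpha for _, p in chunk) / len(chunk))
    return rates


if __name__ == "__main__":
    rates = rejection_rate_by_stratum(simulate_null_genes())
    print("fraction of null p-values below 0.05, low to high variance:", rates)
```

Histograms of the p-values per stratum would be a more thorough check than a single cutoff at 0.05; the cutoff version is used here only to keep the sketch short.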
Wolfgang