I have been preprocessing a large data set in R in order to identify possibly differentially expressed genes between two conditions. My main question is this: from what I have found in the literature, some people filter on present/absent calls to remove probes that are absent in all (or in a significant percentage) of their arrays. On the other hand, a large number of methodologies apply, after normalization and quality control, some form of non-specific filtering (for instance, based on variance) prior to the statistical tests. My platform is the Affymetrix HG-U133 Plus 2.0 array.
As a first approach, I want to compare my 34 samples across the two conditions defined in the phenoData. I have tried running limma after variance-based filtering, but I have read in other threads, as well as in papers, that combining variance filtering with limma is not recommended. On the other hand, if I choose to filter on absent/present calls, for Affymetrix I have roughly two main options: mas5calls and the panp package, which can be used for my specific platform. I have already generated the present/absent calls, but my main question is whether I should also filter out marginal calls or leave them in my ExpressionSet. Also, could other methods suit this dataset, such as a multiple testing procedure with unequal variances and FDR correction, or the SAM test? I could paste a small sample of my script here so that you can give me an opinion on filtering based on absent/present calls. Thank you again for your consideration on this matter!
Best regards
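For readers following the thread, here is a minimal sketch of the kind of present/absent filter described above. It is not the poster's script. It assumes a character matrix of calls ("P", "M", "A") with probe sets in rows and samples in columns, such as `exprs(mas5calls(abatch))` from the affy package would give. The function name `keep_by_calls`, the 50% threshold, the 17/17 group split and the simulated matrix are all illustrative assumptions. The `marginal_as_present` switch shows both ways of handling marginal calls without recommending either.

```r
# Toy stand-in for a real call matrix (assumption: simulated data, 34 samples).
set.seed(1)
n_probes <- 1000
n_samples <- 34
calls <- matrix(
  sample(c("P", "M", "A"), n_probes * n_samples,
         replace = TRUE, prob = c(0.50, 0.05, 0.45)),
  nrow = n_probes,
  dimnames = list(paste0("probe_", seq_len(n_probes)),
                  paste0("s", seq_len(n_samples)))
)

# Hypothetical two-condition design (17 samples per condition).
group <- factor(rep(c("cond1", "cond2"), each = 17))

# Keep a probe set if it is called present in at least `min_frac` of the
# samples of at least one condition. Marginal calls count as present only
# when `marginal_as_present = TRUE`.
keep_by_calls <- function(calls, group, min_frac = 0.5,
                          marginal_as_present = FALSE) {
  ok <- if (marginal_as_present) c("P", "M") else "P"
  # %in% drops the dimensions, so rebuild the matrix explicitly.
  is_present <- matrix(calls %in% ok, nrow = nrow(calls),
                       dimnames = dimnames(calls))
  keep <- rep(FALSE, nrow(calls))
  for (g in levels(group)) {
    frac <- rowMeans(is_present[, group == g, drop = FALSE])
    keep <- keep | (frac >= min_frac)
  }
  keep
}

keep_strict  <- keep_by_calls(calls, group)
keep_lenient <- keep_by_calls(calls, group, marginal_as_present = TRUE)

# Compare how many probe sets survive under each treatment of marginal calls.
table(strict = keep_strict, lenient = keep_lenient)
```

With real data, the resulting logical vector could be used to subset the normalized ExpressionSet, for example `eset[keep_strict, ]`, provided the rows of the call matrix are in the same order as the features of `eset`.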
You seem to be asking the same questions in two different threads; see Non-specific filtering methodogies for ExpressionSet in R/Bioconductor