Hi,
I'm quite sure I have a problem with the DESeq2 independent filtering. DESeq2 doesn't flag any outliers when performing the DE analysis. We are a little concerned about blood being in the samples, as the raw counts vary from hundreds to tens of thousands in some HB-related genes. For some reason DESeq2 doesn't get rid of these high-count outlier genes, which I think would be appropriate. Or is it possible for some genes to have such big variation in gene expression?
We have filtered out lowly expressed genes before the analysis, but I think more filtering should maybe be done manually to get these high-count outlier genes removed. In the Q-Q plot of the p-values the trend is a bit inflated, which is also a slight concern.
Help would be really appreciated because I'm not quite sure how to approach this problem.
Btw, there is now an (even better) alternative to independent filtering: independent hypothesis weighting, IHW.
Yeah, the former is my focus here.
1) I think one of the problems is that I used the LRT. Do I understand correctly that with the LRT the Cook's distance is not applied in the analysis? I have looked at the Cook's distance values but I'm not quite sure what threshold to use for manual filtering - they range from 0 to 2. Is there any rule of thumb as to what is regarded as an outlier based on Cook's distance?
2) I also looked at the baseMean as a possible parameter to set a threshold on, as we have some genes with expression in the millions of reads, but from plotting it's hard to tell whether they are outliers - especially from normalised counts. If there are outliers with abnormally high variance in reads between libraries, what would be the best approach to detect these and filter them out of the analysis?
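To make question 2 concrete, the kind of manual check I have in mind could be sketched roughly like this. It is only an illustration in plain Python, not DESeq2 code: the function name, the toy counts and the cutoff of 10 are all made up for the example, and I don't know whether a max-to-median ratio is a sensible criterion at all.

```python
from statistics import median


def flag_high_outlier_genes(norm_counts, ratio_cutoff=10.0, pseudocount=1.0):
    """Return genes whose largest count is more than `ratio_cutoff` times
    the median count of that gene across libraries.

    `norm_counts` maps gene name -> list of normalised counts (one per library).
    The cutoff and pseudocount are arbitrary, hypothetical defaults.
    """
    flagged = []
    for gene, counts in norm_counts.items():
        # The pseudocount avoids division by zero for genes with a zero median.
        med = median(counts) + pseudocount
        top = max(counts) + pseudocount
        if top / med > ratio_cutoff:
            flagged.append(gene)
    return flagged


# Toy data, invented for illustration only (five libraries per gene).
toy = {
    "HBB": [300, 450, 500, 40000, 620],          # one library far above the rest
    "ACTB": [9000, 11000, 10500, 9800, 10200],   # high but stable
    "GENE3": [0, 2, 1, 3, 0],                    # low counts throughout
}

flagged = flag_high_outlier_genes(toy)
print(flagged)
```

In this toy case only the gene with a single extreme library gets flagged, while the highly expressed but stable gene does not. That is the distinction I can't make from the baseMean alone.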
With best regards,
Heikki