Question

Very high threshold for independent filtering

thomas.deimel • 0

@thomasdeimel-20051

Last seen 6.2 years ago

Experimental Set-up: I am analysing an observational data set (i.e. no randomisation to condition groups) consisting of a few patient variables (lab values, etc.) and RNA-Seq data for miRNAs. I am trying to identify differentially expressed miRNAs for certain (dichotomised) variables of interest while controlling for others, e.g. with the design formula ~ cov1 + cov2 + variableofinterest.

Strange Observation: For some of my variables of interest, a surprisingly large number of genes are removed by independent filtering. I have checked that the NA p-values are not due to all-zero counts or outlier exclusion. As can be seen from the example plot below, the filtering threshold is set quite high (above the 75% quantile of the mean of normalised counts), and the number of H0 rejections rises sharply at that point. From the histogram of p-values it seems that most of the non-significant genes are filtered out, but the general pattern (despite the very large number of filtered genes) seemed OK to me.

My questions are:

1) Is there a point at which filtering out too many genes leads to an unacceptable increase in the type-I error rate? I.e., is there a limit to how far one can push independent filtering before the paradigm of increasing sensitivity without admitting too many false positives breaks down?

2) In the "filtering threshold-selection plot", there are some local minima/maxima, and the fit deviates quite a bit from the "oscillating" observed data points. Is any of this concerning (other than affecting the choice of threshold by increasing the residual standard deviation that is subtracted from the fit's peak when setting the cut-off, if I have understood that part correctly)? Any ideas why the plot might look like this at all?

The code used to create the plots is essentially copied from the DESeq2 vignette. Please let me know if there is any other information you would like me to provide.

Plots:

https://www.dropbox.com/s/5cfz8g9bppzaql5/indepfilteringex.pdf?dl=0

https://www.dropbox.com/s/21y1uv8h3merr8l/Bildschirmfoto%202019-03-01%20um%2014.38.29.png?dl=0

deseq2 mirna independent filtering • 1.7k views

ADD COMMENT • link updated 6.2 years ago by Michael Love 43k • written 6.2 years ago by thomas.deimel • 0

score 0 · Answer 1 · 2019-03-01

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

So, the way I changed the independent filtering (IF) routine in DESeq2 was to smooth the curve and take the filter threshold that gets within "noise" range of the maximum. This helps to mitigate some of the stochasticity problems of the greedy procedure. Meanwhile, if you want a more principled approach, why not use IHW (independent hypothesis weighting), which was designed to address the shortcomings of the greedy IF procedure:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#independent-hypothesis-weighting
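The smoothed-threshold idea described above can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not DESeq2's code: a centred moving average stands in for the smoother, the "noise" range is taken to be the root-mean-square residual between observed and smoothed rejection counts, and all function names (`bh_adjust`, `rejections_per_cutoff`, `smooth`, `choose_cutoff_index`) are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rejections_per_cutoff(base_means, pvals, cutoffs, alpha=0.1):
    """For each cutoff, drop genes whose mean count is below it,
    BH-adjust the remaining p-values, and count rejections."""
    counts = []
    for cutoff in cutoffs:
        kept = [p for mu, p in zip(base_means, pvals) if mu >= cutoff]
        counts.append(sum(q < alpha for q in bh_adjust(kept)))
    return counts


def smooth(values, window=5):
    """Centred moving average (assumption: stand-in for a lowess fit)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def choose_cutoff_index(num_rej, window=5):
    """Pick the first cutoff whose observed rejection count exceeds
    (peak of the smoothed curve) minus (RMS residual around the fit)."""
    fitted = smooth(num_rej, window)
    residuals = [n - f for n, f in zip(num_rej, fitted)]
    rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    target = max(fitted) - rmse
    for j, n in enumerate(num_rej):
        if n > target:
            return j
    return 0  # fall back to no filtering


# Rejections jump once the low-count genes are filtered out.
print(choose_cutoff_index([0, 0, 0, 10, 10, 10, 10], window=3))  # -> 3
```

In this sketch a noisier rejection curve gives a larger RMS residual, which lowers the target and so tends to select an earlier (less aggressive) cutoff.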

ADD COMMENT • link 6.2 years ago Michael Love 43k

Thanks for the swift response - I will have a look into IHW (and might come back with follow-up questions once I have a better understanding)!

ADD REPLY • link 6.2 years ago thomas.deimel • 0