Dear DESeq2 team,
I am currently using DESeq2 (v1.12.0 under R 3.3.0) to analyse processed single-cell RNA-seq data. The data's inherent noisiness leads to many genes having counts flagged as outliers (e.g. >3,000 genes out of the ~10,000 analysed). Given the number of samples (~250 cells), DESeq2 goes on to replace the outlier counts and refit the model (with the default minReplicatesForReplace=7). I am passing parallel=TRUE and a BPPARAM argument to the DESeq() call, and the initial fitting is indeed parallelised; however, the refitting done within refitWithoutOutliers() is not, and because of the high number of outliers this step now accounts for most of DESeq()'s runtime (at least two thirds). Would it be possible to parallelise this function?
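For concreteness, my call looks roughly like the sketch below (a toy dataset stands in for my real object, and the MulticoreParam settings are just an example):

```r
library(DESeq2)
library(BiocParallel)

## toy dataset standing in for my real single-cell object (~250 cells, ~10k genes)
dds <- makeExampleDESeqDataSet(n = 10000, m = 250)

## parallel backend (4 workers, just as an example)
bp <- MulticoreParam(workers = 4)

## dispersion estimation and GLM fitting are parallelised, but the subsequent
## outlier-replacement refit (refitWithoutOutliers) runs serially
dds <- DESeq(dds,
             minReplicatesForReplace = 7,  # the default
             parallel = TRUE,
             BPPARAM = bp)

res <- results(dds, parallel = TRUE, BPPARAM = bp)
```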
Alternatively, should I really be treating outliers differently? I followed the recommendations in the DESeq2 vignette but found no "bad" samples that could be held responsible for the numerous outlier counts, and my impression was that sticking with the trimmed-mean replacement scheme is sufficiently conservative with respect to downstream DEG calling.
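For reference, continuing from the snippet above, these are the checks and the alternative I have been considering; again only a sketch:

```r
## per-cell Cook's distances, as suggested in the vignette, to look for
## individual "bad" cells driving the outlier calls
par(mar = c(8, 5, 2, 2))
boxplot(log10(assays(dds)[["cooks"]]), range = 0, las = 2)

## alternative: skip count replacement entirely
dds_norep <- DESeq(dds, minReplicatesForReplace = Inf,
                   parallel = TRUE, BPPARAM = bp)

## with replacement disabled, results() instead flags outlier genes
## (p-values set to NA) based on the default Cook's distance cutoff
res_norep <- results(dds_norep, parallel = TRUE, BPPARAM = bp)
```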
Either way, if refitWithoutOutliers() were parallelised, it would make investigating these issues much quicker.
Please let me know what you think.
Thank you in advance for your time & best regards,
-- Alex