DESeq2 feature request: parallelised refitWithoutOutliers()
@aatsmith-10597

Dear DESeq2 team,

I am currently using DESeq2 (v1.12.0 under R 3.3.0) to analyse some processed single-cell RNA-seq data. The data's inherent noisiness leads to many genes having values flagged as outliers (e.g. >3k genes out of the 10k analysed). Given the number of samples (~250 cells), DESeq2 goes on to replace the outlier counts and refit the model (default minReplicatesForReplace=7). I am passing parallel=TRUE and a BPPARAM argument to the DESeq() call, and the initial fitting is indeed parallelised. However, the refitting done within refitWithoutOutliers() is not, and because of the high number of outliers it takes up most of DESeq()'s runtime (at least 2/3). Would it be possible to parallelise this function?

Alternatively, should I be treating outliers differently? I followed the recommendations in the DESeq2 vignette but found no "bad" samples that could be held responsible for the numerous outlier counts, and my impression was that sticking with the trimmed mean replacement scheme was sufficiently conservative with respect to downstream DEG calling.

Either way, if refitWithoutOutliers() were parallelised, it would make investigating these issues quicker.

Please let me know what you think.

Thank you in advance for your time & best regards,

-- Alex

@mikelove

hi Alex,

The DESeq2 model was not designed with single-cell data in mind, and I'm certain it's not the best one out there for that purpose. Why don't you try some of the software explicitly designed for single-cell data? A recent review:

http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0927-y

That said, this feature request is on my long list of todos, but that doesn't mean it will be implemented soon or at all, because other more important things are above it.

For you or other users who find that outlier replacement takes up too much time on datasets with 100s of samples, I would even recommend setting minReplicatesForReplace=Inf, which skips the replacement and refit altogether, and then using other heuristic strategies to identify genes with extreme outliers.
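
As a rough illustration of that suggestion, here is a minimal sketch on simulated data, not a tested recipe for single-cell counts. The dataset from makeExampleDESeqDataSet() and its sizes are placeholders. The heuristic shown, flagging genes whose maximum Cook's distance exceeds the F-quantile that results() uses as its default cooksCutoff, is only one possible choice and an assumption on my part about what "other heuristic strategies" might look like:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count matrix (placeholder sizes).
dds <- makeExampleDESeqDataSet(n = 200, m = 20)

# minReplicatesForReplace = Inf disables outlier replacement, so the
# serial refit in refitWithoutOutliers() never runs.
dds <- DESeq(dds, minReplicatesForReplace = Inf, quiet = TRUE)

# Cook's distances are still computed and stored per gene and sample.
cooks <- assays(dds)[["cooks"]]

# Per-gene maximum; genes with no fitted values (all NA) stay NA.
max_cooks <- apply(cooks, 1, function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
})

# Threshold: 0.99 quantile of F(p, m - p), with p model coefficients
# and m samples (the default cooksCutoff used by results()).
p <- ncol(model.matrix(design(dds), as.data.frame(colData(dds))))
m <- ncol(dds)
cutoff <- qf(0.99, p, m - p)

# Genes to inspect by hand; which() drops the NA entries.
flagged <- names(which(max_cooks > cutoff))
```

The genes in `flagged` can then be inspected or filtered manually. Note that results() applies its own Cook's-distance filtering to p-values by default (controlled by its cooksCutoff argument), independently of whether replacement was run.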
