Question

DEXSeq analysis with large number of samples

1

nemanja.vucic ▴ 10

@nemanjavucic-16540

Hi,

I am wondering whether it is even possible to perform a differential exon usage analysis with DEXSeq on a large number of samples, in my case almost 600. The samples are unequally divided between two conditions (1:10 ratio), and the DEXSeq object contains almost 500K rows (exons) and 1.2K columns (samples x2). After normalization, the dispersion estimation step has been running on 16 CPUs for several days. I tried subsetting the DEXSeq object: for 17 features (exons) on 16 CPUs the analysis took 3 minutes, which works out to approximately 3 minutes of CPU time per exon. At that rate, the analysis of the full dataset would not finish in any practical amount of time.
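The back-of-the-envelope estimate above can be sketched in R as follows. This is only a rough extrapolation: it assumes that run time scales linearly with the number of exons and that all 16 CPUs stay fully busy, neither of which is guaranteed. The variable names are made up for illustration, and 500,000 is the rounded row count from the post.

```r
# Figures taken from the post; everything else is derived from them.
n_cpus        <- 16       # CPUs used for the test run
test_exons    <- 17       # exons in the subset
test_wall_min <- 3        # wall-clock minutes for the subset
total_exons   <- 500000   # "almost 500K" rows, rounded

# CPU time per exon, assuming all CPUs were busy for the whole test run.
cpu_min_per_exon <- test_wall_min * n_cpus / test_exons

# Extrapolate linearly to the full dataset on the same 16 CPUs.
total_cpu_min <- cpu_min_per_exon * total_exons
wall_days     <- total_cpu_min / n_cpus / 60 / 24

cat(sprintf("CPU minutes per exon: %.1f\n", cpu_min_per_exon))
cat(sprintf("Estimated wall time on %d CPUs: %.0f days\n", n_cpus, wall_days))
```

Under these assumptions the dispersion step alone would need roughly 61 days on 16 CPUs.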

dexseq estimatedispersions • 1.4k views

ADD COMMENT • link updated 6.6 years ago by Michael Love 43k • written 6.6 years ago by nemanja.vucic ▴ 10

2

Yeah, the GLMs can take a very long time to fit when the models are large. One option is to configure a BPPARAM with a cluster backend and distribute the work across many jobs. With such a large number of samples, some of the steps in DEXSeq might not be needed. I would have a look at the diffSplice function from limma: it is designed to address the same question as DEXSeq and does not have a problem with large datasets.
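A minimal sketch of the diffSplice route is shown below. It is not taken from the thread: the counts are simulated, the dimensions are toy-sized, and the object names (`gene_id`, `exon_id`, `condition`) are hypothetical. It also assumes a voom-based workflow with edgeR's `DGEList` for normalization, which is one common way to feed exon counts to limma. With real data, the exon-level count matrix and the gene/exon annotation would replace the simulated objects.

```r
# Sketch only: simulated counts stand in for real exon-level counts.
library(limma)
library(edgeR)

set.seed(1)

# Hypothetical toy dimensions (the thread has ~500K exons and ~600 samples).
n_genes   <- 200
exons_per <- 5
n_exons   <- n_genes * exons_per
n_samples <- 66   # 6 vs 60, mimicking the 1:10 imbalance

gene_id   <- rep(paste0("gene", seq_len(n_genes)), each = exons_per)
exon_id   <- paste0(gene_id, ":E", rep(seq_len(exons_per), times = n_genes))
condition <- factor(rep(c("A", "B"), times = c(6, 60)))

# Negative-binomial exon counts with no true differential usage.
counts <- matrix(rnbinom(n_exons * n_samples, mu = 100, size = 5),
                 nrow = n_exons,
                 dimnames = list(exon_id, paste0("s", seq_len(n_samples))))

# Spike in one differentially used exon: first exon of gene1, higher in B.
counts[1, condition == "B"] <- counts[1, condition == "B"] * 4

# Normalize and estimate mean-variance weights with voom.
dge <- DGEList(counts = counts,
               genes  = data.frame(GeneID = gene_id, ExonID = exon_id))
dge <- calcNormFactors(dge)

design <- model.matrix(~ condition)
v      <- voom(dge, design)
fit    <- lmFit(v, design)

# Test each exon's log-fold change against the other exons of the same gene.
ex <- diffSplice(fit, geneid = "GeneID", exonid = "ExonID", verbose = FALSE)

# Gene-level ranking (Simes adjustment of the exon-level p-values);
# coef = 2 is the condition B vs A coefficient in this design.
top <- topSplice(ex, coef = 2, test = "simes", number = 10)
print(top)
```

Calling `topSplice` with `test = "t"` instead ranks individual exons rather than genes. Because this fits linear models rather than one GLM per exon, it tends to scale much better with sample count, which is the point of the suggestion above.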

ADD REPLY • link 6.6 years ago Alejandro Reyes ★ 1.9k

0

I already tried configuring BPPARAM and splitting the analysis across many AWS instances, but without much success, so I'll try diffSplice from limma. Thanks for the quick response.

ADD REPLY • link 6.6 years ago nemanja.vucic ▴ 10