Hi,
I am wondering whether it is even feasible to perform differential exon usage analysis with DEXSeq on a large number of samples, in my case almost 600. The samples are unequally divided between two conditions (1:10 ratio), and the DEXSeq object contains almost 500K rows (exons) and 1.2K columns (samples x2). After normalization, the dispersion estimation step has been running on 16 CPUs for several days. I tried subsampling the DEXSeq object: for 17 features (exons) on 16 CPUs the analysis took 3 minutes, which works out to approximately 3 min of CPU time per exon. At that rate, and assuming the time scales linearly with the number of exons, the full dataset would need on the order of two months on the same 16 CPUs, which is not practical.
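For reference, the extrapolation above can be sketched as a quick back-of-the-envelope calculation. This is only a rough estimate: it assumes that runtime scales linearly with the number of exons and that all 16 CPUs are kept fully busy. The function name `estimate_runtime` is made up for illustration, and 500,000 is used as a round stand-in for "almost 500K" exons.

```python
def estimate_runtime(n_exons_tested, wall_minutes, n_cpus, n_exons_total):
    """Extrapolate a small benchmark to the full dataset (linear scaling assumed)."""
    # CPU time per exon: wall-clock time multiplied by the CPUs in use,
    # divided by the number of exons in the benchmark subset.
    cpu_min_per_exon = wall_minutes * n_cpus / n_exons_tested
    # Total CPU time for every exon in the full object.
    total_cpu_min = cpu_min_per_exon * n_exons_total
    # Wall-clock days if the work is spread evenly over the same CPUs.
    wall_days = total_cpu_min / n_cpus / (60 * 24)
    return cpu_min_per_exon, wall_days


# Benchmark from the post: 17 exons, 3 minutes, 16 CPUs; ~500K exons in total.
cpu_min_per_exon, wall_days = estimate_runtime(17, 3, 16, 500_000)
print(f"{cpu_min_per_exon:.1f} CPU-min per exon, ~{wall_days:.0f} days on 16 CPUs")
```

The same function can be used to gauge how many CPUs would be needed to bring the wall-clock time down to something tolerable, again only under the linear-scaling assumption.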
Yeah, the GLMs can take a very long time to fit when the models are large. One option is to configure a BPPARAM with a cluster backend and distribute the work across many jobs. With such a large number of samples, some of the steps in DEXSeq might not be needed. I would have a look at the diffSplice function from limma: it is designed to address the same question as DEXSeq and does not have a problem dealing with large datasets.
I already tried configuring BPPARAM and splitting the analysis across many AWS instances, but without much success, so I'll try diffSplice from limma. Thanks for the quick response.