I am reproducing a differential exon usage (DEU) analysis with DEXSeq that was originally run on an older version of the human reference annotation (I am not sure which one). I am now using Ensembl release 111.
The previous version reports 36 exons for my gene of interest (MYO6), while the one I am using reports 73. The exons I am looking for in silico were previously validated in the lab.
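A toy illustration of why the exon count can change between annotations: DEXSeq does not count reads per annotated exon but per "counting bin" (exonic part). These bins come from flattening all transcripts of a gene and cutting at every exon start and end. It is therefore possible that the 36 vs 73 "exons" are counting bins, and that adding transcript models with alternative boundaries splits existing exons into more, smaller bins. The Python below is a simplified sketch of that idea, not DEXSeq's actual `dexseq_prepare_annotation.py` code; the transcripts and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates (GTF convention)


def flatten_to_bins(transcripts: List[List[Interval]]) -> List[Interval]:
    """Split the union of all exons into disjoint bins, cutting at every
    exon start/end seen in any transcript (simplified 'exonic parts')."""
    exons = [ex for tx in transcripts for ex in tx]
    # Breakpoints in half-open form: each exon contributes start and end + 1
    points = sorted({s for s, _ in exons} | {e + 1 for _, e in exons})
    bins = []
    for left, right in zip(points, points[1:]):
        # Keep the segment only if at least one exon covers it (skips introns)
        if any(s <= left and right - 1 <= e for s, e in exons):
            bins.append((left, right - 1))
    return bins


# Two hypothetical transcripts of the same gene
tx_a = [(100, 200), (300, 400)]
tx_b = [(150, 200), (300, 350), (500, 600)]
print(len(flatten_to_bins([tx_a])))        # 2
print(len(flatten_to_bins([tx_a, tx_b])))  # 5
```

Adding `tx_b` splits both exons of `tx_a` in two and adds a new bin, so the same region is now tested as more, smaller units with fewer reads each.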
I am running a standard DEXSeq analysis with 3 conditions: 1 control and 2 treatments. I ran the complete DEXSeq pipeline separately for two comparisons:
- treatment 1 vs control
- treatment 2 vs control

The output therefore consists of two tables. However, the exons of interest are no longer significant, whereas they were in the previous analysis (with the reference containing fewer exons).
Then I tried a single DEXSeq model including all 3 conditions, specifying denominator = control in the fold-change computation. With this setup the exons of interest are significant again. However, this approach yields 30k significant events, versus the 500 from the previous approach.
Could the different number of exons in the reference affect the DEXSeq DEU analysis, and if so, are there any possible solutions?
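One way a larger number of bins could contribute: to my understanding, DEXSeq's `padj` is a Benjamini-Hochberg (BH) adjustment over all tested bins, and splitting exons also spreads reads more thinly across bins. The toy sketch below only illustrates the multiple-testing side, adjusting the same raw p-value among 36 versus 73 hypothetical tests. A real analysis adjusts across all bins genome-wide, so the numbers are illustrative and not a diagnosis of this dataset.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (same idea as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest to the smallest p-value, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: one bin with raw p = 0.001, all other bins uninformative
few = [0.001] + [0.5] * 35   # 36 tests
many = [0.001] + [0.5] * 72  # 73 tests
print(round(bh_adjust(few)[0], 3))   # 0.036
print(round(bh_adjust(many)[0], 3))  # 0.073
```

With the same raw evidence, the adjusted value crosses a 0.05 threshold only because the number of tests doubled.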
Hi Alejandro, thank you for your reply. I have been using the same full and reduced models. After some research, I found that the Ensembl reference is not the most suitable choice here and that it is better to use RefSeq. In practice, I think the biggest difference between RefSeq and Ensembl/GENCODE is the sensitivity/specificity trade-off: Ensembl aims to be more inclusive, with a far larger number of transcript variants, many of which are only weakly supported.
Hi Alessia, makes sense. Something I use often is the Ensembl transcript support levels, together with the annotation of principal isoforms. Filtering on these substantially reduces the number of low-confidence transcript isoforms.
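A minimal sketch of that kind of filtering, applied to a GTF before building the DEXSeq flattened annotation (for example with `dexseq_prepare_annotation.py`). It assumes Ensembl/GENCODE-style `transcript_support_level` and `tag` attributes. Which tags are actually present (`Ensembl_canonical`, `MANE_Select`, GENCODE's `appris_principal_1`) depends on the GTF source and release, so check your file; the TSL cutoff of 2 and the example records are hypothetical choices.

```python
import re
from typing import AbstractSet, Iterable, Iterator

# Matches GTF attribute pairs such as: transcript_support_level "1"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

# Assumed tag names; availability depends on the GTF source/release
DEFAULT_KEEP_TAGS = frozenset({"Ensembl_canonical", "MANE_Select", "appris_principal_1"})


def keep_transcript(attrs: str, max_tsl: int, keep_tags: AbstractSet[str]) -> bool:
    """Keep a record if it carries a 'principal' tag or a good enough TSL."""
    pairs = ATTR_RE.findall(attrs)
    tags = {value for key, value in pairs if key == "tag"}
    if tags & keep_tags:
        return True
    tsl = next((value for key, value in pairs if key == "transcript_support_level"), None)
    if tsl is None:
        return False
    # Values can look like "1 (assigned to previous version 5)" or "NA"
    level = tsl.split(" ", 1)[0]
    return level.isdigit() and int(level) <= max_tsl


def filter_gtf(lines: Iterable[str], max_tsl: int = 2,
               keep_tags: AbstractSet[str] = DEFAULT_KEEP_TAGS) -> Iterator[str]:
    """Yield GTF lines, dropping transcript-level records that fail the filter."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Pass through comments and records without a transcript_id (e.g. genes)
        if line.startswith("#") or len(fields) < 9 or "transcript_id" not in fields[8]:
            yield line
        elif keep_transcript(fields[8], max_tsl, keep_tags):
            yield line


# Hypothetical exon records (attribute column abbreviated)
records = [
    '6\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_support_level "1 (assigned to previous version 3)";\n',
    '6\tensembl\texon\t150\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_support_level "5";\n',
    '6\tensembl\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T3"; transcript_support_level "NA"; tag "Ensembl_canonical";\n',
]
for rec in filter_gtf(records):
    print(rec.split("\t")[8].split(";")[1].strip())
# transcript_id "T1"
# transcript_id "T3"
```

On a real file you would stream the GTF lines through `filter_gtf` and write the result to a new GTF. Note that a gene whose transcripts all lack a qualifying tag or TSL disappears from the filtered annotation, so it is worth checking that the gene of interest survives.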