Question

Different number of exons in reference genome affects DEXSeq analysis

Alessia • 0

@1b283aa4

Last seen 4 months ago

Spain

I am reproducing a differential exon usage analysis with DEXSeq. The original analysis was run on an earlier version of the human reference genome (I am not sure which one); I am now using Ensembl release 111.

The previous version reports 36 exons for my gene of interest (MYO6), while the one I am using reports 73. The exons I am looking at in silico were previously validated in the lab.

I am running a standard DEXSeq analysis with 3 conditions: 1 control and 2 treatments. I ran the complete DEXSeq pipeline separately for each comparison:

treatment 1 vs control
treatment 2 vs control

The output therefore consists of two tables. However, the results are no longer significant for the exons of interest, although they were in the previous analysis (with the reference genome containing fewer exons).

Then I tried running DEXSeq with a single model that includes all 3 conditions, specifying denominator = control in the fold change computation. With this approach the exons of interest are significant again. However, it reports 30k significant events versus the 500 reported by the previous approach.

I am wondering whether the different number of exons in the reference affects DEXSeq's DEU analysis, and whether there are any possible solutions.

DEXSeq • 467 views

ADD COMMENT • link updated 7 months ago by Alejandro Reyes ★ 1.9k • written 7 months ago by Alessia • 0

score 0 · Answer 1 · 2024-04-18

Alejandro Reyes ★ 1.9k

@alejandro-reyes-5124

Last seen 4 months ago

Novartis Institutes for BioMedical Rese…

The new annotation you are using likely has more annotated isoforms, so the exons are split into more disjoint exonic bins. I'd expect some differences in the output, but they should not be completely discordant. One thing I'd check is whether you are using the same full and reduced models that were used in the old analysis.
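To illustrate why more isoforms give more bins, here is a simplified sketch of exon flattening. It is not the DEXSeq annotation script itself; the function name `disjoint_bins` and the coordinates are made up for the example, and it ignores strand and overlapping genes.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GTF


def disjoint_bins(transcripts: Dict[str, List[Interval]]) -> List[Interval]:
    """Split the exons of all transcripts of one gene into non-overlapping bins."""
    # Every exon start, and every position just past an exon end, is a cut point.
    cuts = sorted({p
                   for exons in transcripts.values()
                   for start, end in exons
                   for p in (start, end + 1)})
    bins = []
    for left, right in zip(cuts, cuts[1:]):
        # A segment between two consecutive cut points never straddles an exon
        # boundary, so it is either fully inside or fully outside each exon.
        if any(s <= left and right - 1 <= e
               for exons in transcripts.values()
               for s, e in exons):
            bins.append((left, right - 1))
    return bins


# Hypothetical gene with a single annotated transcript (toy "old" annotation).
old_annotation = {"tx1": [(100, 200), (300, 400)]}
# The same gene after two extra isoforms with alternative exon boundaries are added.
new_annotation = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(100, 150), (300, 400)],
    "tx3": [(180, 200), (350, 400)],
}

print(len(disjoint_bins(old_annotation)))  # 2
print(len(disjoint_bins(new_annotation)))  # 5
```

The same two genomic exons end up as five counting bins once the extra isoforms are annotated, so reads that used to fall into one bin are now spread over several smaller ones.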

ADD COMMENT • link 7 months ago Alejandro Reyes ★ 1.9k

Hi Alejandro, thank you for your reply. I have been using the same full and reduced models. After some research I found that the Ensembl reference is not the most suitable choice here, and that it is better to use RefSeq. In practice, I think the biggest difference between RefSeq and Ensembl/GENCODE is the sensitivity/specificity trade-off: Ensembl leans towards the inclusive end and includes a far larger number of transcript variants, many of which are only weakly supported.

ADD REPLY • link 7 months ago Alessia • 0

Hi Alessia. Makes sense. I often use the transcript support levels from Ensembl and the annotation of principal isoforms. Filtering on these substantially reduces the number of low-confidence transcript isoforms.
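A minimal sketch of that kind of filtering, assuming an Ensembl-style GTF in which transcript and exon records carry a `transcript_support_level "..."` attribute and zero or more `tag "..."` attributes. The function names, the default threshold of 1, and the tag `Ensembl_canonical` used in the example are illustrative assumptions; check which attribute and tag names your own file actually contains before relying on it.

```python
import re
from typing import Iterable, Iterator, Sequence

# Matches the leading digit(s) of the support level, so values such as
# "1 (assigned to previous version 2)" also parse; "NA" does not match.
_TSL = re.compile(r'transcript_support_level "(\d+)')
_TAG = re.compile(r'tag "([^"]+)"')


def keep_line(line: str, max_tsl: int = 1, keep_tags: Sequence[str] = ()) -> bool:
    """Return True if a GTF line should stay in the filtered annotation."""
    if line.startswith("#"):
        return True  # header/comment lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] == "gene":
        return True  # gene records have no transcript-level attributes
    attrs = fields[8]
    # Keep transcripts carrying any of the requested tags (assumed tag names).
    if set(_TAG.findall(attrs)) & set(keep_tags):
        return True
    match = _TSL.search(attrs)
    return match is not None and int(match.group(1)) <= max_tsl


def filter_gtf(lines: Iterable[str], max_tsl: int = 1,
               keep_tags: Sequence[str] = ()) -> Iterator[str]:
    """Yield only the lines that pass keep_line."""
    return (line for line in lines if keep_line(line, max_tsl, keep_tags))


# Toy records with made-up identifiers, for illustration only.
example = [
    '6\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_support_level "1";',
    '6\tensembl\texon\t120\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_support_level "5";',
    '6\tensembl\texon\t100\t180\t.\t+\t.\tgene_id "G1"; transcript_id "T3"; transcript_support_level "NA"; tag "Ensembl_canonical";',
]
kept = list(filter_gtf(example, max_tsl=1, keep_tags=("Ensembl_canonical",)))
print(len(kept))  # 2
```

The filtered GTF would then be flattened again before counting, so that the bins reflect only the retained transcripts.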

ADD REPLY • link 7 months ago Alejandro Reyes ★ 1.9k