Question

Surprising behavior for DEXseq dispersion estimate plot?

0

Entering edit mode

Elijah • 0

@937d4250

Last seen 2 hours ago

United States

Hello,

I am running DEXseq, where I have adapted DEXseq to look at relative polyA site usage rather than relative exon usage. Upstream of running DEXseq, I called polyA sites in my dataset and got counts for the sites.

My concern is that when I look at my dispersion estimate plot: enter image description here

There is a secondary "cloud" of points around 1e-04.

Generally, what I'm wondering is: is this type of shape expected for the DEXseq dispersion estimate plots, or is this indicative of some type of artifact in my upstream data handling?

For a bit more context, I've tried different thresholding of the minimum # of counts for each feature in my dataset, and that doesn't seem to affect the shape of the dispersion estimate plot. There doesn't appear to be any correlations with the lengths of the features (some pA sites nearby each other are clustered together) or with adjusted p-values or with log fold change values.

Some specific sites also appear (to my eyes) to have very similar counts, but different dispersion estimates, such as:

A

enter image description here

where the dispGeneEst is 2.330017e-05

and

B

enter image description here

where the dispGeneEst is 5.466213e-02.

In fact, the only discernible difference between the two is that the dispGeneIter is 71 for A and 2 for B, and this trend of dispGeneIter being much higher for the sites with dispGeneEsts ~1e-04 seems true for many cases.

EDIT: Here is the dispersion estimate plot colored by the # of iterations to run, with red = higher # of iterations and blue = lower # of iterations:

enter image description here

Is this behavior expected in DEXseq?

However, changing parameters that I thought might affect the MLE, such as niter to 20 or changing maxit to 500, does not have any significant affect on the plot either.

Thank you for any advice!

Code run pasted below:


print('set up for DEXseq analysis')

  #Set up for DEXSeq analysis
  dxd <- DEXSeqDataSet(countData, sampleInfo,
                 design= ~ sample + exon + type:exon + condition:exon,
                 featureID, GeneID)

  print('Carry out DEXSeq analysis main steps')

  dxd = estimateSizeFactors( dxd )

  formulaFullModel    =  ~ sample + exon + type:exon + condition:exon
  formulaReducedModel =  ~ sample + exon + type:exon 
  dxd = estimateDispersions( dxd, formula = formulaFullModel)

  plotDispEsts( dxd )

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

DESeq2 DEXSeq • 95 views

ADD COMMENT • link 1 day ago • updated 5 hours ago Elijah • 0

score 3 · Accepted Answer · 2025-03-06

3

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 hours ago

United States

The red points are just slowly moving toward -Inf. They tend to have variance < mean (if it were a simple design and you could just look at the marginal variance of counts). It's not a concern that those points are there.

ADD COMMENT • link 6 hours ago Michael Love 43k

0

Entering edit mode

Thank you so much for your quick reply! I was worried if some underlying technical artifact or unusual behavior would be causing the pattern seen with the red points - I appreciate your clarification that it's not concerning.

ADD REPLY • link 5 hours ago Elijah • 0