Question

Reference paper or resource for limma::diffSplice and edgeR::diffSpliceDGE methods?

maltethodberg (Denmark)

I have recently obtained very promising results using diffSplice and diffSpliceDGE from limma and edgeR, respectively. I was surprised to find that neither method has a citable reference, even though both are mentioned in the main limma paper and in the edgeR and limma user guides. DEXSeq, by comparison, has its own reference in addition to those for DESeq/DESeq2.

This meant that I had to piece together what the methods actually do from the help pages for diffSplice/topSplice and diffSpliceDGE/topSpliceDGE.

As far as I can tell, diffSplice works directly from the model fitted in a normal limma/edgeR analysis. DEXSeq, in contrast, fits a separate model that includes exon terms, although it still uses the same dispersion estimation approach as DESeq2.

As I understand it, the F-test tests whether any exon's logFC differs from the others, yielding a single gene-level p-value. The exon-level test tests whether each exon has a logFC different from the average logFC of the other exons in the same gene. These exon-level p-values are then adjusted using the Simes method, and the lowest adjusted p-value among the exons is used to represent the gene.
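A toy version of the exon-level contrast, assuming (as I read the diffSplice help page) that each exon's logFC is compared with the average logFC of the other exons in the same gene. The function name is hypothetical, and this is an unweighted sketch: diffSplice itself works from a weighted linear model fit and moderated t-statistics, which are not reproduced here.

```python
def exon_deviations(logfcs):
    """Return each exon's logFC minus the mean logFC of the other exons in the gene.

    Unweighted illustration only; it does not reproduce diffSplice's
    precision weights or moderated t-statistics.
    """
    n = len(logfcs)
    if n < 2:
        raise ValueError("need at least two exons to compare against")
    total = sum(logfcs)
    # (total - x) / (n - 1) is the mean logFC of all *other* exons.
    return [x - (total - x) / (n - 1) for x in logfcs]


# Hypothetical gene with four exons; only the last exon changes between groups.
print(exon_deviations([0.0, 0.0, 0.0, 2.0]))  # roughly [-0.67, -0.67, -0.67, 2.0]
```

Note that the three unchanged exons also get a small nonzero deviation, because the changed exon pulls up the average they are compared against.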

I am unfamiliar with the Simes method for adjusting p-values. Conceptually, the approach seems similar to DEXSeq's perGeneQvalue, where p-values are first defined at the exon level and then aggregated at the gene level (asking whether at least one exon-level p-value in the gene is significant). Intuitively, how does aggregating exon-level p-values with the Simes method differ from using DEXSeq's perGeneQvalue? Could this be related to the comment in the help files that "The exon-level tests are not recommended for formal error rate control."?

Any insight or pointers to resources are much appreciated.

edgeR limma diffSplice DEXSeq

Yunshun Chen (Australia)

The Simes method was introduced and described in the following paper:

R. J. Simes. An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73(3):751-754, 1986.

The Simes method controls the family-wise error rate in the weak sense, i.e., only when all null hypotheses are true (no exons within the gene are differentially used). I'm not sure how DEXSeq's perGeneQvalue works, though.
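A minimal Python sketch of the Simes combined p-value, min over i of n·p(i)/i with the p-values sorted in ascending order, plus a small simulation of the weak-sense property. The function names and simulation settings are my own illustration, not limma's code. Under the global null with independent p-values, the Simes test rejects at rate alpha, so the simulated rate should land close to 0.05.

```python
import random


def simes_pvalue(pvals):
    """Simes combined p-value: min over i of n * p_(i) / i, p sorted ascending, capped at 1."""
    p = sorted(pvals)
    n = len(p)
    return min(1.0, min(n * pi / (i + 1) for i, pi in enumerate(p)))


def weak_fwer_rate(n_exons=8, n_genes=20000, alpha=0.05, seed=1):
    """Fraction of simulated null genes (every exon null) whose Simes p-value is <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Global null, independent tests: exon p-values are Uniform(0, 1).
        pvals = [rng.random() for _ in range(n_exons)]
        if simes_pvalue(pvals) <= alpha:
            hits += 1
    return hits / n_genes


print(simes_pvalue([0.01, 0.04, 0.03]))  # approximately 0.03
print(weak_fwer_rate())                  # close to 0.05
```

The simulation only covers the weak sense: every exon in every simulated gene is null. It says nothing about error control when some exons in a gene are truly changed.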

Interesting. So what motivated the choice of this particular method rather than something more common, such as the Benjamini-Hochberg correction? Does it have to do with the fact that p-values can be correlated, as described in the introduction of the paper?

maltethodberg

Actually, the Simes method is just as well known in mathematical statistics circles as Benjamini-Hochberg. In fact, Simes and BH are essentially the same algorithm, just used for slightly different purposes.
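The "same algorithm" point can be made concrete: for any set of p-values, the Simes p-value equals the smallest BH-adjusted p-value in that set. The sketch below checks this numerically with hand-written helper functions (my own, not limma's or R's code; in R the BH step would normally be p.adjust with method "BH").

```python
import random


def simes_pvalue(pvals):
    """Simes p-value: min over i of n * p_(i) / i (p sorted ascending), capped at 1."""
    p = sorted(pvals)
    n = len(p)
    return min(1.0, min(n * pi / (i + 1) for i, pi in enumerate(p)))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda k: pvals[k])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step-up: walk from the largest p-value down, keeping a running minimum of n * p_(i) / i.
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, n * pvals[k] / rank)
        adjusted[k] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.2, 0.5]
print(bh_adjust(p))                         # approximately [0.05, 0.0667, 0.0667, 0.25, 0.5]
print(min(bh_adjust(p)), simes_pvalue(p))   # both approximately 0.05

# The identity holds for arbitrary p-value sets, not just this one.
rng = random.Random(0)
for _ in range(1000):
    q = [rng.random() for _ in range(rng.randint(1, 20))]
    assert abs(min(bh_adjust(q)) - simes_pvalue(q)) < 1e-12
```

Both procedures compute the same quantities n·p(i)/i. BH reports a cumulative minimum for every test, while Simes keeps only the overall minimum as a single p-value for the whole set.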

We use Simes simply because it is the most statistically powerful adjustment method that gives the required result, which is weak FWER control within a gene. We then apply BH to the gene-level Simes-adjusted p-values.

If you want to understand this approach, you could look at this paper:

http://nar.oxfordjournals.org/content/42/11/e95

Although the setting is different, the principles are the same. This article shows that applying the BH algorithm to window-level p-values fails to give correct FDR control at the region level. We solve this problem by using Simes method to aggregate the window-level p-values for each region, then apply BH to the region-level Simes p-values. This process controls the FDR correctly at the region level, whereas other methods do not.
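A minimal sketch of the two-stage procedure described above: Simes within each gene (or region), then BH across the resulting gene-level p-values. The gene names and exon p-values are hypothetical, and the helpers are my own illustration of the logic, not the implementation used in limma or in the paper's software.

```python
def simes_pvalue(pvals):
    """Simes p-value: min over i of n * p_(i) / i (p sorted ascending), capped at 1."""
    p = sorted(pvals)
    n = len(p)
    return min(1.0, min(n * pi / (i + 1) for i, pi in enumerate(p)))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda k: pvals[k])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, n * pvals[k] / rank)
        adjusted[k] = running_min
    return adjusted


def two_stage_fdr(groups):
    """Stage 1: Simes within each group. Stage 2: BH across groups. Returns {group: FDR}."""
    names = list(groups)
    group_p = [simes_pvalue(groups[g]) for g in names]
    return dict(zip(names, bh_adjust(group_p)))


# Hypothetical exon-level p-values for three genes.
exon_p = {
    "geneA": [0.001, 0.2, 0.5, 0.7],
    "geneB": [0.3, 0.4, 0.9],
    "geneC": [0.02, 0.03],
}
print(two_stage_fdr(exon_p))  # roughly {'geneA': 0.012, 'geneB': 0.6, 'geneC': 0.045}
```

Because BH is applied once per gene rather than once per exon, the resulting FDR refers to genes, which is the unit the error rate is meant to be controlled at.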

Kind regards,
Gordon Smyth

Gordon Smyth · Accepted Answer · 2016-08-04

I'm glad to hear that you are finding promising results using diffSplice and diffSpliceDGE from limma and edgeR. It is true that neither method has a citable reference yet, but we are hoping to write something up in the near future.

It's not clear to me how DEXSeq's perGeneQvalue function works, so I can't comment much on the similarities between it and diffSplice's gene-level tests. Both diffSplice and diffSpliceDGE offer two gene-level tests: one using an F-test and the other using the Simes method. In practice, the main difference between the two is that the F-test is better at detecting genes where the evidence of differential splicing comes from several exons (that is, many exons have logFCs that differ from the rest), whereas the Simes method is better at detecting genes where only a few exons are affected. For example, if only one exon in a gene has a logFC very different from the rest, the Simes method will pick this gene out more readily than the F-test.
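The trade-off between a pooled test and Simes can be seen in a deliberately simplified analogy. This sketch does not reproduce diffSplice's F-test or exon-level t-tests. It assumes each exon yields an independent z-statistic; a chi-square statistic (sum of z squared) stands in for a test that pools evidence across exons, and Simes combines the per-exon two-sided p-values. All numbers are hypothetical.

```python
import math


def chisq_sf_even_df(x, df):
    """Upper tail P(X >= x) for a chi-square with a positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def two_sided_p(z):
    """Two-sided standard normal p-value for a z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simes_pvalue(pvals):
    """Simes p-value: min over i of n * p_(i) / i (p sorted ascending), capped at 1."""
    p = sorted(pvals)
    n = len(p)
    return min(1.0, min(n * pi / (i + 1) for i, pi in enumerate(p)))


def compare(zs):
    """Return (pooled chi-square p-value, Simes p-value) for per-exon z-statistics."""
    chisq = sum(z * z for z in zs)
    return chisq_sf_even_df(chisq, len(zs)), simes_pvalue([two_sided_p(z) for z in zs])


# Hypothetical gene with 10 exons.
one_strong = [4.0] + [0.0] * 9   # a single exon with a large change
many_modest = [1.5] * 10         # every exon shifted a little

print(compare(one_strong))   # pooled p about 0.10, Simes p about 6e-4
print(compare(many_modest))  # pooled p about 0.013, Simes p about 0.13
```

In this toy setting the pooled statistic dilutes a single strong exon across all degrees of freedom, while Simes is driven by the smallest p-value; the reverse happens when many exons each carry modest evidence.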

"The exon-level tests are not recommended for formal error rate control" because our tests look at overall changes in exon expression patterns between groups. The count for an individual exon reflects the combined expression of all transcripts of the gene that contain that exon. Depending on how transcript-level expression translates into exon-level counts, exon-level tests can be misleading and their error rates may not be accurately controlled. This is why we don't recommend them for that purpose.
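A toy model of why exon-level changes can be hard to interpret: each exon's expected count is taken as the sum of the expression of the transcripts that contain it. The gene structure, expression values and names are hypothetical, and there is no sampling noise, library-size normalization or length effect.

```python
import math

# Hypothetical gene: 3 exons, 2 transcripts. Rows are exons, columns are
# transcripts (T1, T2); 1 means the transcript contains that exon.
CONTAINS = [
    [1, 0],  # exon 1: only in T1
    [1, 1],  # exon 2: shared by T1 and T2
    [0, 1],  # exon 3: only in T2
]


def exon_counts(transcript_expr):
    """Expected exon counts: sum of expression over the transcripts containing each exon."""
    return [sum(c * t for c, t in zip(row, transcript_expr)) for row in CONTAINS]


group_a = exon_counts([150.0, 50.0])   # T1 dominant
group_b = exon_counts([50.0, 150.0])   # isoform switch towards T2, same total

logfc = [round(math.log2(b / a), 3) for a, b in zip(group_a, group_b)]
print(group_a, group_b)  # [150.0, 200.0, 50.0] [50.0, 200.0, 150.0]
print(logfc)             # [-1.585, 0.0, 1.585]
```

A single transcript-level event (one isoform switch) shows up as two exons moving in opposite directions while the shared exon appears unchanged, so no single exon-level result describes the event on its own.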