Question

Incongruencies between DEXSeq and BitSeq/Cuffdiff

0

Entering edit mode

elle ▴ 10

@elle-7644

Last seen 9.8 years ago

Germany

Hi!

I am studying alternative splicing in some genes. The original data were analyzed a couple of years ago with DEXSeq (I ignore which version was used as the analysis was done by a previous colleague). To integrate the information about exon usage with infomration about differential isoform expression I run BitSeq and Cuffdidd as comparison. While the number and the pattern of differentially expressed isoforms are quite consistent between BitSeq and Cuffdiff the comparison with DEXSeq is quite puzzling. Only half of the isoforms detected by DEXSeq are significant also in BitSeq/Cuffdiff output and even more confusing some of them show an opposite pattern (for a specific exon which show a negative log2fold change between 2 conditions in DEXSeq the corresponding isoforms have a positive log2fold change between the same 2 conditions in BitSeq/Cuffdiff).

I can't think of a possible biological explanation for that as I would expect that at least one of the corresponding isoforms of a more expressed exons would also be more expressed. I was planning to repeat the DEXSeq analysis with the current version (1.14), can I expect results more compatible with the BitSeq/Cuffdiff output? Are you aware of similar inconguencies with the previous version of DEXSeq?

Also, to make the comparison with BitSeq (transcriptome-based) more reliable we are planning to align the reads to both genome and transcriptome for DEXSeq 1.14, is this a good strategy?

Thanks!

dexseq bitseq cuffdiff Differential exon usage isoforms • 4.0k views

ADD COMMENT • link 9.9 years ago • updated 9.8 years ago elle ▴ 10

1

Entering edit mode

Hi,

I'm sure that all previous replies regarding DEXseq are really helpful for your question, but I would also recommend to make sure that all methods are working on the same reference annotation. For example, at UCSC tables there is a variety of different transcriptome annotations (e.g in human: ucsc, ensembl, refseq, etc...), each one consisting of a significantly different total number of transcripts. After that, you may also consider repeating the analysis using the most recent version of each package.

ADD REPLY • link 9.9 years ago panagiotis.papastamoulis ▴ 30

score 1 · Answer 1 · 2015-06-03

1

Entering edit mode

elle ▴ 10

@elle-7644

Last seen 9.8 years ago

Germany

Hi!

Sorry for the late reply. I'll post here another example with the analysis made with DEXSeq 1.14. For this gene DEXSeq report a differential usage of exon/bin E005 with a log2FC of -0.52. The Transcript ID related to this exon/bin in DEXSeq results to be significantly differentially expressed in BitSeq/Cuffdiff with a positive lof2FC of 1. How do you suggest I should interpret this case? Looking at the plots it seems that the differential expression is called when the proportion between E005 and the other exons/bins in the sample "red" is lower than in the sample "blue" (bigger "drop" in red compared to blue), so the negative log2FC indicates that the proportion of that exon/bin in relation to the other exons/bins is lower in the comparison "red" vs "blue" (and not that the absolute abundance of that exon decrease from "blue" to "red"), is this interpretation correct? The data from Cuffdiff actually report that the respectve isoform is doubled in "red" compared to "blue", but its proportion decreases from 26% to 13% (where 100% is the sum of the expressions of all the isoforms of that gene).Please let me know if this interpretation is correct or if I'm still misunderstanding the results (I'm really a newbie in bioinformatics).

Thanks again!

ADD COMMENT • link 9.9 years ago elle ▴ 10

1

Entering edit mode

Hi @elle,

I can give you the DEXSeq interpretation of your data for this gene. The gene is differentially expressed between the blue and the red conditions (upper panel). When removing for differences in expression strength, it can be seen that the exon bin E005 is used more frequently in the blue condition as compared to the red condition (lower panel) and is thus called as differentially used. To me it seems like DEXSeq is doing correctly its job. To compare this results with cuffdiff or bitseq, additional information is needed:

a) What is the transcript isoform identifier that cuffdiff/bitseq are reporting to be twice in the red condition compared to the blue condition?

b) Can you add the transcript isoforms to the plot (displayTranscripts=TRUE), with the identifiers of the transcripts isoforms (names=FALSE) to the DEXSeq plot?

ADD REPLY • link 9.9 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

And sorry if I didn't add other information about the previous example but I thought it would be more meaningfull to discuss the updated results.

ADD REPLY • link 9.9 years ago elle ▴ 10

0

Entering edit mode

Glad you came back, as I was curious to see if we could get to the bottom of this!

For what it's worth, it looks to me like DEXSeq is doing the right thing -- Alejandro does a good job (obviously, it's his package ;-) of explaining why below.

ADD REPLY • link 9.9 years ago Steve Lianoglou ★ 13k

score 0 · Answer 2 · 2015-05-25

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 13 days ago

United States

You mention that you have identified specific instances/exons where the two methods differ. Have you looked at the data over these regions to see what makes sense from the data?

ADD COMMENT • link 9.9 years ago Steve Lianoglou ★ 13k

score 0 · Answer 3 · 2015-05-26

I agree with Steve. For this cases, it is useful to visualize the data. DEXSeq provides the function plotDEXSeq to do this, and from the output of this function you should be able to see the differences that DEXSeq is detecting. You could also use this function to visualize the transcripts detected to differ according to bitseq/cuffdiff and you should be able to evaluate why it is not being called by DEXSeq.

Out of curiosity, how big is the overlap between bitseq and cuffdiff?

score 0 · Answer 4 · 2015-05-26

0

Entering edit mode

elle ▴ 10

@elle-7644

Last seen 9.8 years ago

Germany

Thanks for your replies Steve and Alejandro. To be honest is not totally clear to me how the plotDEXSeq function would help me to understand why an isoform has not been called differentially expressed. On the contrary, even without comparing to the BitSeq/Cuffdiff output I found these plot a bit confusing sometimes , for example in a plot with fitted splicing , some exons have counts which appear to be more than double in condition 2 in respect to condition1, but the calculated log2fc is only 0.2. So, it would e better for me to have some "numerical" and not "visual" way to discriminate.

For your last question Alejandro, these are mu numbers:

BitSeq= 5443 isoforms (with cutoffs pplr<0.05 and pplr>0.95)

Ciffdiff= 4588 (FDR=0.01)

overlap BitSeq/Cuffdiff= 3258 (all of them have the same pattern of log2fc in the 2 methods)

The possible isofomrs detected by DEXSeq (for 524 exons differentially expressed) are 1157, only 217 of which are also predicted by bothe BitSeq/Cuffdiff

ADD COMMENT • link 9.9 years ago elle ▴ 10

1

Entering edit mode

Hi @elle.

Of course plotDEXSeq is not mandatory, its just a suggestion, you could also use your favorite genome browser. The important point is that visualizing the data can give you insights of why a method is calling (or not) for differential expression of transcript isoforms. I would not know a "numerical" method to compare how two methods perform in a given dataset unless you have a gold standard set of true positives genes and true negatives genes.

I think that the way that DEXSeq can help you to find why DEXSeq is calling or not calling something is evident if you, for example, run the code below:

data("pasillaDEXSeqDataSet", package="pasilla")
dxd <- estimateSizeFactors( dxd )
dxd <- estimateDispersions( dxd )
dxd <- testForDEU( dxd )
dxr <- DEXSeqResults(dxd)
significantGeneIDs <- unique( dxr[which( dxr$padj < 0.1 ),"groupID"])
plotDEXSeq(dxr, geneID[9], norCounts=TRUE, expression=FALSE, splicing=TRUE, legend=TRUE)

From the output of this plot is evident that some exons have an increased number of reads in one condition compared to the other condition.

notsignificantGeneIDs <- unique(geneIDs(dxd)[!geneIDs(dxd) %in% geneID])
plotDEXSeq(dxr, notsignificantGeneIDs[9], norCounts=TRUE, expression=FALSE, splicing=TRUE, legend=TRUE)

From this second plot, such change is not evident, since the variability of the exonic counts does not seem to be associated to the condition of the samples. I was thinking of this kind of visual inspections.

If you provide examples of confusing outputs of plotDEXSeq I would be happy to discuss them :)

Alejandro

ps. DEXSeq won't work if you align your read fragments to the transcriptome, since most of the reads would not be able to be uniquely assigned to single transcript isoforms.

ADD REPLY • link 9.9 years ago Alejandro Reyes ★ 1.9k

score 0 · Answer 5 · 2015-05-26

0

Entering edit mode

elle ▴ 10

@elle-7644

Last seen 9.8 years ago

Germany

Thanks!

I'll try with the example you suggested.

Here there is one exampe of plot that I found confusing, as the counts for the exon 002 seem to be quite different between the 2 conditions, while the log2fc for the same exon was only 0.22

ADD COMMENT • link 9.9 years ago elle ▴ 10

0

Entering edit mode

I was also discussing with others about the possible biological explanation for the observed discepancies. Would it be possible that even if some of the isoforms containing one exon are more expressed or constant between 2 conditions the specific exon results less expressed, maybe because taking into account all the isoforms containing that exon its counts is less in condition 2 than in condition 1? I found hard to imagine such a scenario because in that case I would expect a clear decrease of at least one of the isoforms corresponding to the exon, but I'm open to any hypothesis. Tomorrow I'll have also the results of the new version of DEXSeq and I'll see if it's consistent with the previous one.

Thanks for your help!

ADD REPLY • link 9.9 years ago elle ▴ 10

0

Entering edit mode

Which pattern observed here is the one that is confusing?

The differences in 5' expression between the two conditions could "easily" be explained by two isoforms that only differ by their transcription start site ... I wonder if the transcript annotation supports that?

It might be somehow mysterious that E002 is called significant but E001, as it seems that their differences in expression are rather constant (although the ratio isn't) ... either way, perhaps E001 has lower expression, and therefore higher overdispersion, which gives the non-significant call.

Or were you curious why exon E003 expression is so much higher than the rest of the exons/bins?

ADD REPLY • link 9.9 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

This plot refers to the exons expression. I'd say that exons 001 and 003 are not significantly differentially expressed because other isoforms which contain them are expressed in such a way that the overall count for these exons is the same in the 2 conditions. My concern in the plot is about exon 2 which referring to the image seems to be almost doubled in condition 2 while my log2fold change reports 0.2 which is an increase of less than half.

ADD REPLY • link 9.9 years ago elle ▴ 10

0

Entering edit mode

Well, the logFC provided is after you control for changes in gene expression -- which, admittedly looks approximately the same between the two conditions over exons/bins 3:5 -- is this only a 5 exon gene?

ADD REPLY • link 9.9 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Could you also include the plot with the normalized counts for the same gene? Another option could be that the counts for the first exon have more variability and thus are not detected to be significant.

ADD REPLY • link 9.9 years ago Alejandro Reyes ★ 1.9k

score 0 · Answer 6 · 2015-06-27

0

Entering edit mode

elle ▴ 10

@elle-7644

Last seen 9.8 years ago

Germany

I'm sorry, I can't really provide the transcript and gene IDs as it's from an ongoing study and I'm not allowed to public these information.

Really thank you for the explanations, it was of great help :)

ADD COMMENT • link 9.8 years ago elle ▴ 10