We have a biological system where we are interested in finding
differentially expressed exons (on a genome basis) between conditions,
and are wondering whether it is more appropriate to use DESeq or
DEXSeq for the analysis. We have RNA-Seq data on three conditions.
From my understanding of the two packages, DESeq (and alternatively
edgeR) allows testing for differential expression of any object one can
define counts for, whereas DEXSeq looks for genes (however defined) where
only one or a few exons show differential expression.
My initial belief was that DEXSeq was the best choice; however, we are
working with data from rat, which has rather poorly annotated exons,
especially in non-coding regions (i.e. UTRs). Therefore, I am thinking
of defining exons based on a combination of the current annotation,
known UTRs, and exons assembled by Cufflinks. I am not sure how this
set of exons would fit into DEXSeq, and it seems to me that DESeq
would be more appropriate, with exon location (CDS, UTR, etc.)
determined after the DE analysis.
I would appreciate insights or experiences others have had.
Regards,
-Robert
Robert M. Flight, Ph.D.
University of Louisville Bioinformatics Laboratory
University of Louisville
Louisville, KY
PH 502-852-1809 (HSC)
PH 502-852-0467 (Belknap)
EM robert.flight at louisville.edu
EM rflight79 at gmail.com
robertmflight.blogspot.com
bioinformatics.louisville.edu/lab
The most exciting phrase to hear in science, the one that heralds new
discoveries, is not "Eureka!" (I found it!) but "That's funny ..." -
Isaac Asimov
Dear Robert
On 03/26/2012 04:26 PM, Robert M. Flight wrote:
> From my understanding of the two packages, DESeq (and alternatively
> edgeR) allows testing for differential expression of any object one can
> define counts for, whereas DEXSeq looks for genes (however defined) where
> only one or a few exons show differential expression.
The crucial difference between DESeq and DEXSeq is that the latter aims
to tease apart changes to the overall expression strength of a gene and
changes to only some of its exons. Conceptually, we consider for each
sample the fraction "number of reads overlapping with the exon (or
counting bin) under consideration" over "number of reads mapping to any
exon of the gene". If the gene's overall expression changes but the
relative abundances of the different transcripts stay the same, these
fractions do not change, and DEXSeq will not call this counting bin
significant even if its absolute count does change significantly.
(Note that this is a simplified explanation of what DEXSeq does
conceptually. To see what it actually does, please see our preprint on
Nature Precedings.)
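As a rough illustration of the fraction idea only (toy numbers invented for this sketch; DEXSeq itself fits a statistical model rather than comparing raw fractions), the following Python snippet computes each counting bin's share of the gene's reads per sample. The sample names and bin IDs are hypothetical.

```python
# Toy counts for one gene with three counting bins in three samples.
# All numbers are invented for illustration.
counts = {
    "control":     {"E001": 100, "E002": 50,  "E003": 50},
    "treated":     {"E001": 200, "E002": 100, "E003": 100},  # gene doubled, same exon usage
    "treated_alt": {"E001": 200, "E002": 20,  "E003": 180},  # exon usage shifted
}


def exon_fractions(sample_counts):
    """Return each bin's share of all reads mapped to the gene in one sample."""
    gene_total = sum(sample_counts.values())
    return {exon: n / gene_total for exon, n in sample_counts.items()}


fractions = {sample: exon_fractions(c) for sample, c in counts.items()}

for sample, fr in fractions.items():
    print(sample, {exon: round(f, 2) for exon, f in fr.items()})
```

In "treated" every absolute count doubles but the fractions match "control", so this is a gene-level change. In "treated_alt" the fractions for E002 and E003 shift, which is the kind of change the exon-level analysis is after.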
> My initial belief was that DEXSeq was the best choice; however, we are
> working with data from rat, which has rather poorly annotated exons,
> especially in non-coding regions (i.e. UTRs). Therefore, I am thinking
> of defining exons based on a combination of the current annotation,
> known UTRs, and exons assembled by Cufflinks. I am not sure how this
> set of exons would fit into DEXSeq, and it seems to me that DESeq
> would be more appropriate, with exon location (CDS, UTR, etc.)
> determined after the DE analysis.
Once you have defined exons based on a combination of information you trust,
you can use DEXSeq. All you need is a table of counts, one column for
each sample and one row for each exon -- or for whatever counting bins
you want to define: it may be useful, for example, to keep the UTR and
the coding part of outer exons separate. Then, define a factor to
indicate which rows belong to the same gene and use this to call
'newExonCountSet'.
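To make the layout concrete, here is a minimal Python sketch of how such a count table and the per-row gene grouping might be prepared before handing them to DEXSeq in R. It is not DEXSeq's API; the sample names, gene and bin IDs, counts, and the helper to_tsv are all hypothetical.

```python
import csv
import io

samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]  # hypothetical sample names

# One row per counting bin: (gene ID, bin ID, counts in sample order).
# The UTR and coding parts of an outer exon are kept as separate bins.
rows = [
    ("geneA", "E001_5UTR", [12, 15, 30, 28]),
    ("geneA", "E001_CDS", [80, 95, 160, 170]),
    ("geneA", "E002", [60, 70, 20, 25]),
    ("geneB", "E001", [5, 7, 6, 4]),
]

gene_ids = [gene for gene, _, _ in rows]               # grouping factor, one entry per row
bin_ids = [f"{gene}:{bin_id}" for gene, bin_id, _ in rows]  # unique row names
count_matrix = [row_counts for _, _, row_counts in rows]


def to_tsv(bin_ids, gene_ids, samples, count_matrix):
    """Serialise the table as tab-separated text, one line per counting bin."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["bin", "gene"] + samples)
    for bin_id, gene, row_counts in zip(bin_ids, gene_ids, count_matrix):
        writer.writerow([bin_id, gene] + row_counts)
    return buf.getvalue()


table = to_tsv(bin_ids, gene_ids, samples, count_matrix)
print(table)
```

The point of the sketch is only the shape: the count matrix has one column per sample, and the gene ID vector has exactly one entry per row, so it can serve as the grouping factor.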
Simon
Thanks Simon, that was a really useful explanation of how we might
want to go about it.
-Robert
On Mon, Mar 26, 2012 at 10:52, Simon Anders <anders at embl.de> wrote:
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor