Hi,
I've been using the DESeq as well as DESeq2 packages for my RNAseq
analysis. For one particular study alone, I find some discrepancies
between DESeq and DESeq2 results. I've read in forums that the gene
list is longer when we use DESeq2, which is true in my case. But what
I also notice for this study is that there is no overlap between the
gene list obtained using DESeq and DESeq2. Could you please help in
understanding this difference?
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
GenomicRanges_1.14.4
[5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
--
Sent via the guest posting facility at bioconductor.org.
hi,
There is not really enough information here to give a concrete answer,
but
yes, typically DESeq2 will contain most of the DESeq genes found for a
given adjusted p-value. If this were not the case, I would first check
to
see if the rows of the results tables are not lining up for some
reason.
How many genes are found by either software?
Mike
On Tue, Feb 18, 2014 at 4:36 PM, Shobana Sekar [guest] <
guest@bioconductor.org> wrote:
>
> Hi,
>
> I've been using the DESeq as well as DESeq2 packages for my RNAseq
> analysis. For one particular study alone, I find some discrepancies
between
> DESeq and DESeq2 results. I've read in forums that the gene list is
longer
> when we use DESeq2, which is true in my case. But what I also notice
for
> this study is that there is no overlap between the gene list
obtained using
> DESeq and DESeq2. Could you please help in understanding this
difference?
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
methods
> base
>
> other attached packages:
> [1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
> GenomicRanges_1.14.4
> [5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
[[alternative HTML version deleted]]
Hi Mike,
Thank you for the quick reply. So when I filtered using the padj < 0.1
criterion, DESeq did not give me any results but DESeq2 showed 53
genes. But when I used the pval < 0.05 criterion and filtered, I got
1008 genes in DESeq2 and 222 genes in DESeq, with no overlap between
the 2 lists. Is it because we should be looking at padj rather than
pval?
Thank you once again!
-Shobana
From: Michael Love
<michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>>
Date: Tuesday, February 18, 2014 2:49 PM
To: "Shobana Sekar [guest]"
<guest@bioconductor.org<mailto:guest@bioconductor.org>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>,
Shobana Sekar <ssekar@tgen.org<mailto:ssekar@tgen.org>>
Subject: Re: DESeq2
hi,
There is not really enough information here to give a concrete answer,
but yes, typically DESeq2 will contain most of the DESeq genes found
for a given adjusted p-value. If this were not the case, I would first
check to see if the rows of the results tables are not lining up for
some reason. How many genes are found by either software?
Mike
On Tue, Feb 18, 2014 at 4:36 PM, Shobana Sekar [guest]
<guest@bioconductor.org<mailto:guest@bioconductor.org>> wrote:
Hi,
I've been using the DESeq as well as DESeq2 packages for my RNAseq
analysis. For one particular study alone, I find some discrepancies
between DESeq and DESeq2 results. I've read in forums that the gene
list is longer when we use DESeq2, which is true in my case. But what
I also notice for this study is that there is no overlap between the
gene list obtained using DESeq and DESeq2. Could you please help in
understanding this difference?
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
GenomicRanges_1.14.4
[5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
--
Sent via the guest posting facility at
bioconductor.org<http: bioconductor.org="">.
[[alternative HTML version deleted]]
In general, for looking at sets of genes, you should always use
adjusted
p-value. If you want to be less conservative/get longer lists, you can
raise the threshold to 0.2 etc. This makes more sense than working
with
unadjusted p-values. It is a bit surprising that the intersection of
the
222 and 1008 genes by p-value is 0, as in comparisons I have seen it
is
usually mostly a subset of the DESeq2 calls. You might try working
with
gene names to make sure you haven't accidentally reordered rows.
length( intersection( rownames(DESeq2Res)[which(DESeq2Res$padj <
.33)],
DESeqRes$id[which(DESeqRes$padj < .33)] ) )
If you paste the code you are using for comparison and for DE analysis
that
would help. Also knowing how many samples and conditions do you have.
On Tue, Feb 18, 2014 at 5:00 PM, <ssekar@tgen.org> wrote:
> Hi Mike,
>
> Thank you for the quick reply. So when I filtered using the padj <
0.1
> criterion, DESeq did not give me any results but DESeq2 showed 53
genes.
> But when I used the pval < 0.05 criterion and filtered, I got 1008
genes in
> DESeq2 and 222 genes in DESeq, with no overlap between the 2 lists.
Is it
> because we should be looking at padj rather than pval?
>
> Thank you once again!
>
> -Shobana
>
> From: Michael Love <michaelisaiahlove@gmail.com>
> Date: Tuesday, February 18, 2014 2:49 PM
> To: "Shobana Sekar [guest]" <guest@bioconductor.org>
> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org>,
Shobana
> Sekar <ssekar@tgen.org>
> Subject: Re: DESeq2
>
> hi,
>
> There is not really enough information here to give a concrete
answer,
> but yes, typically DESeq2 will contain most of the DESeq genes found
for a
> given adjusted p-value. If this were not the case, I would first
check to
> see if the rows of the results tables are not lining up for some
reason.
> How many genes are found by either software?
>
> Mike
>
>
> On Tue, Feb 18, 2014 at 4:36 PM, Shobana Sekar [guest] <
> guest@bioconductor.org> wrote:
>
>>
>> Hi,
>>
>> I've been using the DESeq as well as DESeq2 packages for my RNAseq
>> analysis. For one particular study alone, I find some discrepancies
between
>> DESeq and DESeq2 results. I've read in forums that the gene list is
longer
>> when we use DESeq2, which is true in my case. But what I also
notice for
>> this study is that there is no overlap between the gene list
obtained using
>> DESeq and DESeq2. Could you please help in understanding this
difference?
>>
>> -- output of sessionInfo():
>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets
methods
>> base
>>
>> other attached packages:
>> [1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
>> GenomicRanges_1.14.4
>> [5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>
>
[[alternative HTML version deleted]]
hi Shobana,
I'm continuing our discussion where we left off on the Bioconductor
list.
Note that you can save to CSV files straight from R using write.csv().
(some of your columns were exported from Excel as #NAME?)
I used the following code to line up your two results tables:
old <- read.delim(...,header=TRUE)
new <- read.delim(...,header=TRUE)
old$id <- as.character(old$id)
new$id <- as.character(new$id)
rownames(old) <- old$id
rownames(new) <- new$id
old <- old[!is.na(old$pval),]
new <- new[!is.na(new$pvalue),]
common <- intersect(old$id, new$id)
While there were ~200 genes for DESeq with p-value less than .05, all
but 1
of these are not available in the DESeq2 results:
> table(old$pval < .05)
FALSE TRUE
39785 222
> table(old[common,"pval"] < .05)
FALSE TRUE
17129 1
That these genes have NA p-values in the DESeq2 results table is due
to
automatic outlier detection using Cook's distance in DESeq2. You might
see
how the results change if you turn this filtering off by using
results(dds,
cooksCutoff=FALSE).
In the development branch (DESeq2 version >= 1.3), we have implemented
a
better solution to filtering count outliers when there are many
replicates
(7 or more), so genes are not set aside from the analysis (p-values
set to
NA) due to one or more extreme count.
If you look however at the intersection using a higher p-value
threshold
(not recommended for analyses, just for sanity check), the
intersection is
not empty:
table(DESeq = old[common,"pval"] < .2,
DESeq2 = new[common,"pvalue"] < .2)
DESeq2
DESeq FALSE TRUE
FALSE 14166 2878
TRUE 29 57
It is expected that DESeq2 in general calls more DEG than DESeq, which
appeared overly conservative in our testing due to the "maximum" rule
of
estimating dispersion.
Mike
On Tue, Feb 18, 2014 at 5:59 PM, Michael Love
<michaelisaiahlove@gmail.com>wrote:
> In general, for looking at sets of genes, you should always use
adjusted
> p-value. If you want to be less conservative/get longer lists, you
can
> raise the threshold to 0.2 etc. This makes more sense than working
with
> unadjusted p-values. It is a bit surprising that the intersection of
the
> 222 and 1008 genes by p-value is 0, as in comparisons I have seen it
is
> usually mostly a subset of the DESeq2 calls. You might try working
with
> gene names to make sure you haven't accidentally reordered rows.
>
> length( intersection( rownames(DESeq2Res)[which(DESeq2Res$padj <
.33)],
> DESeqRes$id[which(DESeqRes$padj < .33)] ) )
>
> If you paste the code you are using for comparison and for DE
analysis
> that would help. Also knowing how many samples and conditions do
you have.
>
>
> On Tue, Feb 18, 2014 at 5:00 PM, <ssekar@tgen.org> wrote:
>
>> Hi Mike,
>>
>> Thank you for the quick reply. So when I filtered using the padj <
0.1
>> criterion, DESeq did not give me any results but DESeq2 showed 53
genes.
>> But when I used the pval < 0.05 criterion and filtered, I got 1008
genes in
>> DESeq2 and 222 genes in DESeq, with no overlap between the 2 lists.
Is it
>> because we should be looking at padj rather than pval?
>>
>> Thank you once again!
>>
>> -Shobana
>>
>> From: Michael Love <michaelisaiahlove@gmail.com>
>> Date: Tuesday, February 18, 2014 2:49 PM
>> To: "Shobana Sekar [guest]" <guest@bioconductor.org>
>> Cc: "bioconductor@r-project.org" <bioconductor@r-project.org>,
Shobana
>> Sekar <ssekar@tgen.org>
>> Subject: Re: DESeq2
>>
>> hi,
>>
>> There is not really enough information here to give a concrete
answer,
>> but yes, typically DESeq2 will contain most of the DESeq genes
found for a
>> given adjusted p-value. If this were not the case, I would first
check to
>> see if the rows of the results tables are not lining up for some
reason.
>> How many genes are found by either software?
>>
>> Mike
>>
>>
>> On Tue, Feb 18, 2014 at 4:36 PM, Shobana Sekar [guest] <
>> guest@bioconductor.org> wrote:
>>
>>>
>>> Hi,
>>>
>>> I've been using the DESeq as well as DESeq2 packages for my RNAseq
>>> analysis. For one particular study alone, I find some
discrepancies between
>>> DESeq and DESeq2 results. I've read in forums that the gene list
is longer
>>> when we use DESeq2, which is true in my case. But what I also
notice for
>>> this study is that there is no overlap between the gene list
obtained using
>>> DESeq and DESeq2. Could you please help in understanding this
difference?
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils datasets
methods
>>> base
>>>
>>> other attached packages:
>>> [1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
>>> GenomicRanges_1.14.4
>>> [5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
>>>
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>
>>
>
[[alternative HTML version deleted]]
Thank you so much for the detailed answer!
-Shobana
From: Michael Love
<michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>>
Date: Tuesday, February 25, 2014 5:01 PM
To: Shobana Sekar <ssekar@tgen.org<mailto:ssekar@tgen.org>>
Cc: "guest@bioconductor.org<mailto:guest@bioconductor.org>"
<guest@bioconductor.org<mailto:guest@bioconductor.org>>,
"bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>
Subject: Re: DESeq2
hi Shobana,
I'm continuing our discussion where we left off on the Bioconductor
list. Note that you can save to CSV files straight from R using
write.csv(). (some of your columns were exported from Excel as #NAME?)
I used the following code to line up your two results tables:
old <- read.delim(...,header=TRUE)
new <- read.delim(...,header=TRUE)
old$id <- as.character(old$id)
new$id <- as.character(new$id)
rownames(old) <- old$id
rownames(new) <- new$id
old <- old[!is.na<http: is.na="">(old$pval),]
new <- new[!is.na<http: is.na="">(new$pvalue),]
common <- intersect(old$id, new$id)
While there were ~200 genes for DESeq with p-value less than .05, all
but 1 of these are not available in the DESeq2 results:
> table(old$pval < .05)
FALSE TRUE
39785 222
> table(old[common,"pval"] < .05)
FALSE TRUE
17129 1
That these genes have NA p-values in the DESeq2 results table is due
to automatic outlier detection using Cook's distance in DESeq2. You
might see how the results change if you turn this filtering off by
using results(dds, cooksCutoff=FALSE).
In the development branch (DESeq2 version >= 1.3), we have implemented
a better solution to filtering count outliers when there are many
replicates (7 or more), so genes are not set aside from the analysis
(p-values set to NA) due to one or more extreme count.
If you look however at the intersection using a higher p-value
threshold (not recommended for analyses, just for sanity check), the
intersection is not empty:
table(DESeq = old[common,"pval"] < .2,
DESeq2 = new[common,"pvalue"] < .2)
DESeq2
DESeq FALSE TRUE
FALSE 14166 2878
TRUE 29 57
It is expected that DESeq2 in general calls more DEG than DESeq, which
appeared overly conservative in our testing due to the "maximum" rule
of estimating dispersion.
Mike
On Tue, Feb 18, 2014 at 5:59 PM, Michael Love
<michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>>
wrote:
In general, for looking at sets of genes, you should always use
adjusted p-value. If you want to be less conservative/get longer
lists, you can raise the threshold to 0.2 etc. This makes more sense
than working with unadjusted p-values. It is a bit surprising that the
intersection of the 222 and 1008 genes by p-value is 0, as in
comparisons I have seen it is usually mostly a subset of the DESeq2
calls. You might try working with gene names to make sure you haven't
accidentally reordered rows.
length( intersection( rownames(DESeq2Res)[which(DESeq2Res$padj <
.33)], DESeqRes$id[which(DESeqRes$padj < .33)] ) )
If you paste the code you are using for comparison and for DE analysis
that would help. Also knowing how many samples and conditions do you
have.
On Tue, Feb 18, 2014 at 5:00 PM,
<ssekar@tgen.org<mailto:ssekar@tgen.org>> wrote:
Hi Mike,
Thank you for the quick reply. So when I filtered using the padj < 0.1
criterion, DESeq did not give me any results but DESeq2 showed 53
genes. But when I used the pval < 0.05 criterion and filtered, I got
1008 genes in DESeq2 and 222 genes in DESeq, with no overlap between
the 2 lists. Is it because we should be looking at padj rather than
pval?
Thank you once again!
-Shobana
From: Michael Love
<michaelisaiahlove@gmail.com<mailto:michaelisaiahlove@gmail.com>>
Date: Tuesday, February 18, 2014 2:49 PM
To: "Shobana Sekar [guest]"
<guest@bioconductor.org<mailto:guest@bioconductor.org>>
Cc: "bioconductor@r-project.org<mailto:bioconductor@r-project.org>"
<bioconductor@r-project.org<mailto:bioconductor@r-project.org>>,
Shobana Sekar <ssekar@tgen.org<mailto:ssekar@tgen.org>>
Subject: Re: DESeq2
hi,
There is not really enough information here to give a concrete answer,
but yes, typically DESeq2 will contain most of the DESeq genes found
for a given adjusted p-value. If this were not the case, I would first
check to see if the rows of the results tables are not lining up for
some reason. How many genes are found by either software?
Mike
On Tue, Feb 18, 2014 at 4:36 PM, Shobana Sekar [guest]
<guest@bioconductor.org<mailto:guest@bioconductor.org>> wrote:
Hi,
I've been using the DESeq as well as DESeq2 packages for my RNAseq
analysis. For one particular study alone, I find some discrepancies
between DESeq and DESeq2 results. I've read in forums that the gene
list is longer when we use DESeq2, which is true in my case. But what
I also notice for this study is that there is no overlap between the
gene list obtained using DESeq and DESeq2. Could you please help in
understanding this difference?
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] DESeq2_1.2.8 RcppArmadillo_0.4.000 Rcpp_0.10.6
GenomicRanges_1.14.4
[5] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0
--
Sent via the guest posting facility at
bioconductor.org<http: bioconductor.org="">.
[[alternative HTML version deleted]]