Hi Guys,

While running DESeq, I have come up with some questions.
It looks like there is no way to handle genes with zero read counts in
DESeq, as the returned values are all NA in the following two cases:
1. no information at all in either group, e.g. 0 and 0
2. uniquely expressed genes, e.g. 0 and c (a constant), or c and 0

In DESeq, when the M-D plot is generated or differential expression
analysis is performed, it looks like these genes are all discarded from
the M-D plot and DESeq returns just NA values for such cases. Is that
correct?
If so, uniquely expressed genes could still be informative, couldn't
they? To my knowledge, DEGseq and edgeR handle such cases in a simple
way, so there are no NA values in their output even when the input
contains genes with zero read counts.

Could you please explain how to handle genes with zero read counts in
the DESeq package?

Thanks in advance,
--S

Hi Sunghee

On Wed, 28 Jul 2010 14:11:19 -0400, sunghee OH <sshshoh1105 at gmail.com>
wrote:
> It looks like there is no way to handle genes with zero read counts in
> DESeq, as the returned values are all NA in the following two cases:
> 1. no information at all in either group, e.g. 0 and 0
> 2. uniquely expressed genes, e.g. 0 and c (a constant), or c and 0
>
> In DESeq, when the M-D plot is generated or differential expression
> analysis is performed, it looks like these genes are all discarded from
> the M-D plot and DESeq returns just NA values for such cases. Is that
> correct?

No. In your case 2 (some but not all samples have zero counts), DESeq
can and does calculate a p value for differential expression. Only the
log fold change estimate is, necessarily, infinite, because you are
dividing by zero.
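To make the zero-count cases concrete, here is a minimal Python sketch (not DESeq code, which is written in R) of a log2 fold change computed from the mean normalized counts of two conditions A and B. It assumes the fold change is taken as B over A, and the function name `log2_fold_change` is hypothetical. Python raises an error on division by zero, so the zero cases are handled explicitly to mirror R's IEEE arithmetic, where c/0 gives Inf and 0/0 gives NaN.

```python
import math


def log2_fold_change(mean_a: float, mean_b: float) -> float:
    """Hypothetical helper: log2(mean_b / mean_a) with R-like zero handling."""
    if mean_a == 0 and mean_b == 0:
        # Case 1: 0/0 is undefined, so the result is NaN (R: NaN)
        return math.nan
    if mean_a == 0:
        # Case 2: expressed only in B, so c/0 -> +Inf
        return math.inf
    if mean_b == 0:
        # Case 2: expressed only in A, so log2(0) -> -Inf
        return -math.inf
    return math.log2(mean_b / mean_a)


print(log2_fold_change(0.0, 8.0))  # inf
print(log2_fold_change(8.0, 0.0))  # -inf
print(log2_fold_change(0.0, 0.0))  # nan
print(log2_fold_change(2.0, 8.0))  # 2.0
```

An infinite fold change only says that one condition has no reads. It does not by itself make the gene untestable.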
Only in case 1 (zero counts in _all_ samples involved in the comparison)
is the p value NA. This makes sense: if you do not observe anything from
a gene, you cannot say anything about it.
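The p value, in contrast, stays finite in case 2. DESeq's test conditions on the total count of both conditions and sums the probabilities of all splits of that total that are as likely as, or less likely than, the observed one, using the negative binomial distribution. The sketch below keeps that structure but swaps in a plain Poisson model, which reduces to a binomial test. It is an illustration, not DESeq's method. The function name and the size-factor parameters `size_a`/`size_b` are hypothetical.

```python
import math
from typing import Optional


def conditional_poisson_pvalue(
    k_a: int, k_b: int, size_a: float = 1.0, size_b: float = 1.0
) -> Optional[float]:
    """Hypothetical two-sided test of k_a vs k_b, conditioned on k_a + k_b.

    Assumes Poisson counts (no overdispersion), unlike DESeq's
    negative binomial model.
    """
    n = k_a + k_b
    if n == 0:
        # Case 1: no reads at all, nothing to test (the analogue of NA)
        return None
    # Under the null, k_a | n ~ Binomial(n, size_a / (size_a + size_b))
    p = size_a / (size_a + size_b)
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k_a]
    # Sum all splits at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


print(conditional_poisson_pvalue(0, 10))  # 0.001953125
print(conditional_poisson_pvalue(0, 0))   # None
```

Under these assumptions, a 0 vs 10 split gets a small p value (2/1024) despite the zero. Only the all-zero case has nothing to test.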

> If so, uniquely expressed genes could still be informative, couldn't
> they? To my knowledge, DEGseq and edgeR handle such cases in a simple
> way, so there are no NA values in their output even when the input
> contains genes with zero read counts.

To my knowledge, edgeR treats zero counts in the same way as DESeq. (It
used to skip rows with all zero counts but now leaves them in and puts
NA.)

> Could you please explain how to handle genes with zero read counts in
> the DESeq package?

If you really see NA even though only some counts are zero, you have
found a bug. Please send details in that case. (However, you are not the
first one to ask, and so far, people had simply not looked properly and
had confused the p value column with the log fold change column in the
results data frame.)
Cheers
Simon