Question

A question on DESeq



sunghee OH ▴ 50

@sunghee-oh-4019


Hi guys,

While running DESeq, I have run into some questions. It looks like there is no way to handle genes with zero read counts in DESeq, as the returned values are all NA for the two cases below:

1. no information at all for the two groups, e.g. 0 0
2. uniquely expressed genes, e.g. 0 c (constant) or c 0

In DESeq, when an M-D plot is generated or DE analysis is performed, it looks like those genes are all discarded from the M-D plot and DESeq returns just NA values for such cases. Is that correct? If so, uniquely expressed genes could be informative, couldn't they? To my knowledge, DEGseq and edgeR handle such cases in a simple way, so there are no NA values in the output even when the input contains genes with zero read counts.

Could you please explain how to handle genes with zero read counts in the DESeq package?

Thanks in advance,
--S

edgeR DEGseq DESeq • 1.2k views

updated 14.7 years ago by Simon Anders ★ 3.8k • written 14.7 years ago by sunghee OH ▴ 50

score 0 · Answer 1 · 2010-07-28

Hi Sunghee

On Wed, 28 Jul 2010 14:11:19 -0400, sunghee OH <sshshoh1105 at gmail.com> wrote:
> It looks like there is no way to handle genes with zero read counts in DESeq
> as the returned values are all NA for below two cases
> 1. totally no information for two groups as like 0 0
> 2. uniquely expressed genes as like 0 c(constant) or c 0
>
> In DESeq, when M-D plot is generated or de analysis is performed, it looks
> like those genes in M-D plot are all discarded and DESeq returns just NA
> values for such cases. is that correct?

No. In your case 2 (some but not all samples have zero counts), DESeq can and does calculate a p value for differential expression. Only the log fold change estimate is, necessarily, infinite, because you are dividing by zero.

Only in case 1 (zero counts in _all_ samples that are involved in the comparison) is the p value NA. This makes sense: if you do not observe anything from a gene, you cannot say anything about it.

> if yes, for genes with uniquely expressed genes, it could be informative.
> isn't it? to my knowledge, DEGseq and edgeR they are doing a simple way for
> such cases. so, there is no NA value in the output even there are genes with
> zero read counts as the input.

To my knowledge, edgeR treats zero counts in the same way as DESeq. (It used to skip rows with all zero counts but now leaves them in and puts NA.)

> Could you please explain how to handle genes with zero read counts in DESeq
> package?

If you really see NA even when only some counts are zero, you have found a bug. Please send details in this case. (However, you are not the first one to ask, and so far people had simply not looked closely and had confused the p value column with the log fold change column in the results data frame.)

Cheers
Simon
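
The fold-change arithmetic described above can be sketched in a few lines. This is only an illustration in plain Python, not DESeq's code: the function `log2_fold_change` and the toy mean counts are hypothetical, and the p value (which DESeq computes separately and does report in case 2) is not modelled here.

```python
import math


def log2_fold_change(mean_a, mean_b):
    """Return log2(mean_b / mean_a), handling zero means explicitly.

    Hypothetical helper for illustration; it is not part of DESeq.
    """
    if mean_a == 0 and mean_b == 0:
        # Case 1: nothing observed in either condition, so the ratio is undefined.
        return math.nan
    if mean_a == 0:
        # Case 2: expressed only in B, so we divide by zero.
        return math.inf
    if mean_b == 0:
        # Case 2, other direction: log2 of zero.
        return -math.inf
    return math.log2(mean_b / mean_a)


# Toy per-condition mean counts (made-up numbers, not real data).
toy_genes = {
    "gene_both_zero": (0.0, 0.0),
    "gene_only_in_B": (0.0, 12.0),
    "gene_only_in_A": (12.0, 0.0),
    "gene_in_both": (8.0, 32.0),
}

for name, (mean_a, mean_b) in toy_genes.items():
    print(name, log2_fold_change(mean_a, mean_b))
```

Only the first toy gene has nothing to report at all. The two uniquely expressed genes get an infinite log fold change, which is easy to mistake for a missing value if you look at the wrong column of the results.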