Hi,
I know that DEseq2 can do low counts filtering but I wanted to set my threshold based on CPM values rather than count values then do filtering. Is there a way in DEseq2?
Thanks for the help.
J.
Hi,
I know that DEseq2 can do low counts filtering but I wanted to set my threshold based on CPM values rather than count values then do filtering. Is there a way in DEseq2?
Thanks for the help.
J.
First you can turn off independent filtering like so:
results(dds, independentFiltering=FALSE)
Then you can choose your own threshold and adjust the p-values yourself:
filterGenes <- res$baseMean < x res$pvalue[ filterGenes ] <- NA res$padj <- p.adjust( res$pvalue, method="BH" )
The one caution is that, if the groups are very unbalanced, say 100 vs 10, then you need to be careful not to remove genes which have expression only in the smaller group. These could be truly DE, but could have a small baseMean = mean of normalized counts. The independent filtering procedure takes care of this in a data-driven way, by guaranteeing an increase in sensitivity.
Another filter is to say, there should be some minimal amount of counts in some minimal number of samples:
filterGenes <- rowSums( counts(dds, normalized=TRUE) > x ) < y
This says, filter out gene where there are less than y samples with normalized count more than x.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I think you probably meant:
Or flip the
> x
to a< x
...Oops thanks. I'll update my code.
Thank you Steve. I will use fpm instead. Can I say dds[rowSums(fpm(dds)>1)>=2] in DESeq2 is similar to rowSums(cpm(counts)>1)>=2 in EdgeR filtering?
Yes these should be similar. You can do:
to see the cross-table.
Hi Michael,
Thank you so much.
J.
hi John, again, a minor note about the support site: you can use the blue ADD REPLY and ADD COMMENTS buttons if you want to reply to an existing answer.
If you were answering someone else's post you would use the green Add Answer button.
It's a bit easier to follow the discussion if you use the comment/reply buttons because this lead to threaded discussions, where one can see what answer you are replying to (the answers will get reordered if someone clicks the thumbs up button and then it would be hard to follow the order of the discussion).
Thanks. I will do that :-)
J.
Thank you Michael. You are still using the read counts for filtering, right? I was looking for solution like rowSums(cpm(counts)>1)>=2. Does DESeq2 calculates cpm? I couldn't find in documentation. I can see that EdgeR does this.
Thank you for the help.
J.
It's not the raw read counts, but the normalized read counts that he's using. Note that he is calling:
This isn't normalzed to "per million", though. For that look at
?fpm