CPM filtering in deseq2
1
1
Entering edit mode
John ▴ 30
@john-9676
Last seen 6.5 years ago

Hi,

I know that DEseq2 can do low counts filtering but I wanted to set my threshold based on CPM values rather than count values then do filtering. Is there a way in DEseq2?

Thanks for the help.

J.

deseq2 • 7.7k views
ADD COMMENT
3
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

First you can turn off independent filtering like so:

results(dds, independentFiltering=FALSE)

Then you can choose your own threshold and adjust the p-values yourself:

filterGenes <- res$baseMean < x

res$pvalue[ filterGenes ] <- NA

res$padj <- p.adjust( res$pvalue, method="BH" )

The one caution is that, if the groups are very unbalanced, say 100 vs 10, then you need to be careful not to remove genes which have expression only in the smaller group. These could be truly DE, but could have a small baseMean = mean of normalized counts. The independent filtering procedure takes care of this in a data-driven way, by guaranteeing an increase in sensitivity.

Another filter is to say, there should be some minimal amount of counts in some minimal number of samples:

filterGenes <- rowSums( counts(dds, normalized=TRUE) > x ) < y

This says, filter out gene where there are less than y samples with normalized count more than x.

ADD COMMENT
1
Entering edit mode

I think you probably meant:

res$pvalue[ !filterGenes ] <- NA

Or flip the > x to a < x ...

ADD REPLY
0
Entering edit mode

Oops thanks. I'll update my code.

ADD REPLY
0
Entering edit mode

Thank you Steve. I will use fpm instead. Can I say dds[rowSums(fpm(dds)>1)>=2] in DESeq2 is similar to rowSums(cpm(counts)>1)>=2 in EdgeR filtering? 

ADD REPLY
1
Entering edit mode

Yes these should be similar. You can do:

table(filter1, filter2)

to see the cross-table.

ADD REPLY
0
Entering edit mode

Hi Michael,

Thank you so much.

J.

ADD REPLY
0
Entering edit mode

hi John, again, a minor note about the support site: you can use the blue ADD REPLY and ADD COMMENTS buttons if you want to reply to an existing answer.

If you were answering someone else's post you would use the green Add Answer button. 

It's a bit easier to follow the discussion if you use the comment/reply buttons because this lead to threaded discussions, where one can see what answer you are replying to (the answers will get reordered if someone clicks the thumbs up button and then it would be hard to follow the order of the discussion). 

ADD REPLY
0
Entering edit mode

Thanks. I will do that :-)

J.

ADD REPLY
0
Entering edit mode

Thank you Michael. You are still using the read counts for filtering, right? I was looking for solution like rowSums(cpm(counts)>1)>=2. Does DESeq2 calculates cpm? I couldn't find in documentation. I can see that EdgeR does this. 

Thank you for the help.

 J.

ADD REPLY
1
Entering edit mode

It's not the raw read counts, but the normalized read counts that he's using. Note that he is calling:

counts(dds, normalized=TRUE)

This isn't normalzed to "per million", though. For that look at ?fpm

ADD REPLY

Login before adding your answer.

Traffic: 656 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6