Filtering after DESeq
@3f9f9566
Germany

We have performed RNA-seq on many samples of individual flies (~20 per condition). After running DESeq and then checking various contrasts (with ihw=TRUE), I find quite a few genes that are detected as differentially expressed between 2 conditions. But when I plot their counts, I notice that for many of them this is because most individuals have 0 counts while 1 or 2 have high counts, and those few individuals end up being responsible for the significance of the Wald test.

I have read that pre-filtering was recommended, something along the lines of

keep <- rowSums(counts(dds) >= x) >= y

However, consider the example of one gene that I am attaching to this post. If I filtered this gene out because it has more than 0 counts in only 10 individuals, I would risk removing a gene that has more than 0 counts in 10 individuals of a single condition. That would be half of the individuals in that condition, which I would consider meaningful. There are a few such genes in my dataset too.

What could I do?

Thank you!

DESeq2 IHW

The problem is that your prefilter is not group-aware, so seeing these sorts of outliers is expected. This is why I always recommend filterByExpr from edgeR: it returns a logical vector of genes to keep, which you can apply to your DESeqDataSet. It does group-aware filtering, ensuring that retained genes have sufficiently large counts to support a groupwise comparison, which avoids the situation you see.
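
A minimal sketch of what this could look like, assuming your grouping variable is a colData column called `condition` (the toy dataset from makeExampleDESeqDataSet, the seed, and the contrast levels "B" and "A" are stand-ins for illustration, not your actual data):

```r
library(DESeq2)
library(edgeR)

# Toy data standing in for the real experiment: 1000 genes, 40 samples,
# split evenly between two levels ("A", "B") of `condition`.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 40)

# Group-aware filter on the raw count matrix.
# Returns a logical vector with one TRUE/FALSE per gene.
keep <- filterByExpr(counts(dds), group = dds$condition)

# Subset the DESeqDataSet before fitting, then proceed as usual.
dds_filtered <- dds[keep, ]
dds_filtered <- DESeq(dds_filtered)
res <- results(dds_filtered, contrast = c("condition", "B", "A"))
```

filterByExpr also has arguments such as min.count and min.prop that control how strict the filter is, so it is worth checking how many genes are kept (for example with table(keep)) and adjusting them if genes you consider meaningful are being dropped.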


I am going to try this right now, thanks!
