Question

Data filtering in DESeq2

sally.badawi • 0

@sallybadawi-16174

Hello again,

We know that you have not recommended filtering the data before running the DESeq function, claiming that it only affects how fast the function runs. Interestingly, when we applied a filtering strategy based on the percentage of samples with zero read counts, we found that the homogeneity of the data, the distribution, and the normalization all improved. The relationship between the number of genes removed by filtering and the number of DEGs obtained was not linear: the number of DEGs peaked after 71% filtering and then decreased. We think this strategy has at least removed the experimental error coming from low-count genes that sit at the threshold of detection in mRNA-seq. What do you suggest, and how can we explain these results? Is it really better to proceed without filtering the data?

Thank you

deseq2 data filtering • 836 views

updated 5.8 years ago by Michael Love • written 5.8 years ago by sally.badawi

score 0 · Answer 1 · 2019-02-15

It depends on the data, of course. I never claimed that filtering only affects speed; what I said is that the speed-up and the reduced memory size of the object make some pre-filtering useful for most datasets.

If you have found that pre-filtering provides better results on your dataset, that is of course fine to perform before running DESeq().
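
For readers who want to try the kind of filter described in the question, here is a minimal sketch in base R. It is only an illustration, not a recommended default: the helper name `filter_by_zero_fraction`, the `max_zero_frac` threshold of 0.5, and the toy count matrix are all assumptions made up for this example, and the right cutoff will depend on your data.

```r
# Hypothetical helper (not part of DESeq2): keep genes whose fraction of
# samples with a zero count is at most `max_zero_frac`.
# `counts` is assumed to be a genes x samples matrix of raw counts.
filter_by_zero_fraction <- function(counts, max_zero_frac = 0.5) {
  stopifnot(is.matrix(counts), max_zero_frac >= 0, max_zero_frac <= 1)
  zero_frac <- rowMeans(counts == 0)  # per-gene fraction of zero-count samples
  counts[zero_frac <= max_zero_frac, , drop = FALSE]
}

# Toy count matrix with made-up numbers, for illustration only.
counts <- matrix(
  c( 0,  0, 0, 5,   # geneA: zero in 3 of 4 samples
    10, 12, 0, 9,   # geneB: zero in 1 of 4 samples
     3,  4, 6, 2),  # geneC: no zeros
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# With a 0.5 cutoff, geneA (0.75 zeros) is dropped; geneB and geneC are kept.
filtered <- filter_by_zero_fraction(counts, max_zero_frac = 0.5)

# With an existing DESeqDataSet `dds`, the same idea could be applied as:
#   keep <- rowMeans(counts(dds) == 0) <= 0.5
#   dds  <- dds[keep, ]
#   dds  <- DESeq(dds)
```

Scanning several values of the cutoff, as the question describes, is then a matter of calling the helper in a loop and comparing the results for each threshold.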