BaseMean threshold
@c2e5df93

I have an RNA-seq dataset and I am using DESeq2 to find differentially expressed genes between two groups. I also want to remove genes with low counts by applying a baseMean threshold. I have already pre-filtered genes that have zero counts or only a single count across all samples, but I would also like to remove genes whose counts are low relative to the rest of the genes. Is there a commonly used baseMean threshold, or a way to work out what this threshold should be?

Thank you

basemean DESeq2 • 5.3k views
ATpoint ★ 4.5k
@atpoint-13662

I would not use the baseMean for filtering, as it is (at least to me) hard to interpret. You do not know why the baseMean is low: either there is no difference between groups and the gene is simply lowly expressed (and/or short), or it is moderately expressed in one group but off in the other. The baseMean could be the same in both scenarios. If you filter, I would do it on the counts. For example, you could require that all, or a fraction, of the samples in at least one group have 10 or more counts. That removes genes with low counts or zeros across all groups, while keeping genes whose low counts are confined to one group; the latter are good DE candidates and should not be removed. Alternatively, you can automate this, e.g. with the edgeR function filterByExpr.
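
To make the two scenarios concrete, here is a minimal base-R sketch. The counts, sample names, and the `min_count` variable are made up for illustration, and the plain row mean is only a stand-in for baseMean (DESeq2 computes baseMean from normalized counts).

```r
# Toy raw counts: 2 genes x 6 samples (3 per group). All values are invented.
counts <- rbind(
  gene_low    = c(5, 5, 5, 5, 5, 5),    # uniformly low in both groups
  gene_switch = c(10, 10, 10, 0, 0, 0)  # expressed in group A, off in group B
)
colnames(counts) <- paste0("s", 1:6)
group <- factor(rep(c("A", "B"), each = 3))

# Stand-in for baseMean: identical for both genes, so it cannot separate them.
row_mean <- rowMeans(counts)

# Per gene and per group, count the samples with at least min_count reads.
min_count <- 10  # illustrative threshold, as in the answer above
n_pass <- sapply(levels(group), function(g) {
  rowSums(counts[, group == g, drop = FALSE] >= min_count)
})

# Keep a gene if all samples of at least one group pass the threshold.
group_size <- as.vector(table(group))
keep <- apply(n_pass, 1, function(n) any(n >= group_size))

print(row_mean)
print(keep)
```

Both genes have the same mean, but only the group-aware count filter keeps `gene_switch`, the likely DE candidate, and drops `gene_low`.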


Yes, agreed: you can use filterByExpr, or I commonly just use something like:

keep <- rowSums(counts(dds) >= 10) >= x

where x is the minimum number of samples that should have a count of 10 or more. For example, you can use the size of the smallest group.
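
As a rough sketch of how that line behaves, the following applies the same rule to a plain count matrix instead of a DESeqDataSet. The matrix `counts_mat`, the gene names, and the group labels are invented for illustration.

```r
# Toy count matrix: 4 genes x 7 samples (3 control, 4 treated). Values invented.
counts_mat <- rbind(
  g1 = c(0, 1, 0, 2, 0, 1, 0),         # essentially off everywhere
  g2 = c(50, 60, 55, 40, 45, 52, 48),  # well expressed everywhere
  g3 = c(12, 15, 11, 0, 1, 0, 2),      # expressed in controls only
  g4 = c(10, 3, 2, 11, 1, 0, 4)        # sporadic counts, only 2 samples >= 10
)
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt"))

# x = smallest group size, as suggested above.
x <- min(table(group))

# Same rule as the one-liner: at least x samples with a count of 10 or more.
keep <- rowSums(counts_mat >= 10) >= x
filtered <- counts_mat[keep, , drop = FALSE]

print(rownames(filtered))
```

With a DESeqDataSet the equivalent subsetting step would be `dds <- dds[keep, ]`, done before calling `DESeq()`. Note that `g3` survives even though it is off in one group, because three samples (the size of the smallest group) reach the threshold.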


Would you argue that this could/should also be applied to a DEXSeq dataset, where you want to identify differentially expressed exons?


Yes, it is even noted in the vignette that prefiltering might make sense.


Apart from speed, will it make any difference if I apply this filtering after selecting differential exons based on e.g. padj < 0.01 and log2FC >= 2 or <= -2?


Filtering is not done for speed reasons; it is done both to increase the precision of the model parameter estimates and to reduce the multiple testing burden. It only makes sense to me if it is done before running the statistical testing.
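
The multiple testing point can be illustrated with a small base-R sketch using `p.adjust`. The p-values are invented, and holding the p-values of the retained features fixed is a simplifying assumption; in a real analysis they would shift somewhat once the model is refitted on the filtered data.

```r
# Invented raw p-values: 3 informative features plus uninformative low-count ones.
p_informative <- c(0.0005, 0.004, 0.012)
p_lowcount    <- rep(0.9, 17)  # hypothetical low-count features carrying no signal

# Scenario 1: no filtering before testing, so all 20 features are adjusted together.
padj_all <- p.adjust(c(p_informative, p_lowcount), method = "BH")[1:3]

# Scenario 2: filtering before testing removed 15 low-count features,
# so only 5 features enter the Benjamini-Hochberg adjustment.
padj_filtered <- p.adjust(c(p_informative, 0.9, 0.9), method = "BH")[1:3]

print(padj_all)
print(padj_filtered)

# Number of the informative features passing padj < 0.05 in each scenario.
print(c(unfiltered = sum(padj_all < 0.05), filtered = sum(padj_filtered < 0.05)))
```

Dropping features after the adjustment leaves the adjusted p-values of the remaining ones unchanged, which is why filtering only helps when it happens before the test.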
