Question

BaseMean threshold

0

msellabenton • 0

@c2e5df93

United Kingdom

I have an RNA-seq dataset and am using DESeq2 to find differentially expressed genes between two groups. I also want to remove genes with low counts by applying a baseMean threshold. I already pre-filtered to remove genes that have zero counts or only a single count across all samples, but I also want to remove genes whose counts are low relative to the rest of the genes. Is there a commonly used baseMean threshold, or a way to work out what this threshold should be?

Thank you

basemean DESeq2 • 6.3k views

ADD COMMENT • link updated 2.6 years ago by ATpoint ★ 4.8k • written 3.0 years ago by msellabenton • 0

score 0 · Answer 1 · 2022-04-27

0

ATpoint ★ 4.8k

@atpoint-13662

Germany

I would not use the baseMean for filtering because it is (at least to me) hard to interpret. You cannot tell why the baseMean is low: either there is no difference between groups and the gene is simply lowly expressed (and/or short), or the gene is moderately expressed in one group but off in the other. The baseMean could be the same in both scenarios. If you filter, I would do it on the counts. For example, you could require that all, or a fraction, of the samples in at least one group have 10 or more counts. That removes genes with low counts or zeros across all groups, while keeping genes whose low counts are confined to one group; the latter are good DE candidates and should not be removed. Alternatively, you can automate this, e.g. with the edgeR function filterByExpr.
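To make the two scenarios concrete, here is a minimal base R sketch on a made-up count matrix. The gene names, group labels, and the `min_count` and `min_frac` settings are illustrative assumptions, and the plain row mean of raw counts is only a stand-in for the baseMean (which DESeq2 computes from normalized counts).

```r
# Toy raw counts: 3 hypothetical genes x 6 samples (3 per group).
counts <- rbind(
  low_everywhere  = c(6, 6, 6, 6, 6, 6),
  on_in_A_only    = c(12, 12, 12, 0, 0, 0),
  high_everywhere = c(100, 120, 110, 90, 105, 95)
)
colnames(counts) <- paste0("s", 1:6)
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Stand-in for baseMean: the first two genes have the same row mean
# even though only the second one differs between groups.
row_means <- rowMeans(counts)

# Group-aware filter (assumed settings): keep a gene if, in at least one
# group, at least min_frac of the samples have min_count or more counts.
min_count <- 10
min_frac  <- 1

passing <- counts >= min_count
frac_by_group <- sapply(levels(group), function(g) {
  rowMeans(passing[, group == g, drop = FALSE])
})
keep <- apply(frac_by_group, 1, max) >= min_frac

filtered <- counts[keep, , drop = FALSE]
print(row_means)
print(rownames(filtered))
```

In this toy example a threshold on the row mean cannot separate the first two genes, whereas the group-aware count filter drops only the gene that is low in every sample.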

ADD COMMENT • link 3.0 years ago ATpoint ★ 4.8k

1

Yes, I agree. You can use filterByExpr, or I commonly just use something like:

keep <- rowSums(counts(dds) >= 10) >= x

where x is the minimum number of samples that should have a count of 10 or more. For example, you can use the size of the smallest group.
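As a rough illustration of the line above, the sketch below applies the same rule to a plain matrix that stands in for `counts(dds)`. The matrix, the gene names, and the group labels are invented for the example.

```r
# Toy stand-in for counts(dds): 4 hypothetical genes x 7 samples.
counts_mat <- rbind(
  geneA = c(0, 1, 0, 2, 0, 1, 0),
  geneB = c(15, 22, 11, 0, 1, 0, 2),
  geneC = c(10, 9, 30, 8, 7, 7, 5),
  geneD = c(50, 61, 47, 55, 70, 66, 59)
)
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt", "trt"))

# x = size of the smallest group (3 ctrl samples here).
x <- min(table(group))

# Keep genes with a count of 10 or more in at least x samples.
keep <- rowSums(counts_mat >= 10) >= x
counts_kept <- counts_mat[keep, , drop = FALSE]
print(rownames(counts_kept))
```

With a real DESeqDataSet, the same logical vector would typically be used to subset the object before running DESeq(), e.g. `dds <- dds[keep, ]`.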

ADD REPLY • link 3.0 years ago Michael Love 43k

0

Would you argue that this could/should also be applied to a DEXSeq dataset, where you want to identify differentially expressed exons?

ADD REPLY • link 2.6 years ago osieman52 • 0

0

Yes, it is even noted in the vignette that prefiltering might make sense.

ADD REPLY • link 2.6 years ago ATpoint ★ 4.8k

0

Apart from speed, will it make any difference if I apply this filtering after selecting differential exons based on, e.g., padj < 0.01 and log2FC >= 2 or <= -2?

ADD REPLY • link 2.6 years ago osieman52 • 0

0

Filtering is not done for speed reasons; it is done both to increase the precision of the model parameter estimation and to reduce the multiple testing burden. To me, it only makes sense if done before running the statistical testing.
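The multiple testing point can be sketched with base R's p.adjust. This is an idealized toy example with invented p-values. It assumes the filter removes only uninformative features and is applied without looking at the test results, which a real count filter only approximates.

```r
# Invented p-values: 3 features with signal, 97 low-count features without.
p_informative   <- c(0.001, 0.002, 0.003)
p_uninformative <- seq(0.2, 1, length.out = 97)

# Benjamini-Hochberg adjustment over all 100 tests (no pre-filtering).
padj_all <- p.adjust(c(p_informative, p_uninformative), method = "BH")

# Same adjustment when the 97 uninformative features were removed
# before testing, so only 3 tests enter the correction.
padj_filtered <- p.adjust(p_informative, method = "BH")

print(padj_all[1:3])
print(padj_filtered)
```

The raw p-values of the three informative features are identical in both runs; only the number of tests entering the correction changes, which is why filtering has to happen before the testing step to have this effect.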

ADD REPLY • link 2.6 years ago ATpoint ★ 4.8k