Pre-filtering genes: counts vs normalised counts in DESeq2
bekah ▴ 40
@bekah-12633

Hi,

I am working on RNA-seq data and want to pre-filter it. I was keeping genes whose read counts sum to more than 100, but then read that I could filter on the normalised read counts instead. Is this a better filter, given that it is based on data normalised across all samples? I am struggling to view the data after running dss <- estimateSizeFactors(dss), which I need to do in order to choose a suitable threshold.

Best wishes,

Rebekah

deseq2 pre-filtering • 6.0k views

Oh sorry, I think I found it:

counts(dss, normalized = TRUE)

Just in case anyone else was looking for this too.

@mikelove

Hi Bekah, 

Yes you can use the normalized counts for pre-filtering.
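
For later readers, here is a minimal base R sketch of what filtering on normalised counts could look like. It uses a made-up toy matrix and computes median-of-ratios size factors by hand, which is my understanding of the default approach behind estimateSizeFactors(); the names raw, norm and threshold, and the threshold value itself, are assumptions for illustration only. With a real DESeqDataSet you would call counts(dss, normalized = TRUE) rather than doing the arithmetic yourself.

```r
# Toy raw count matrix: 4 genes x 4 samples (made-up numbers for illustration).
raw <- matrix(
  c(100, 200, 100, 200,
      0,   0,   1,   0,
     50, 100,  50, 100,
     10,  20,  10,  20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)

# Median-of-ratios size factors, computed by hand in base R.
# Assumption: this mirrors what estimateSizeFactors() does by default.
log_geo_mean <- rowMeans(log(raw))   # -Inf for any gene containing a zero count
usable <- is.finite(log_geo_mean)    # such genes are skipped
size_factors <- apply(raw, 2, function(col) {
  exp(median(log(col[usable]) - log_geo_mean[usable]))
})

# Normalised counts: each sample's column divided by its size factor.
# DESeq2 equivalent: counts(dss, normalized = TRUE)
norm <- sweep(raw, 2, size_factors, "/")

# Inspect the normalised row sums to help choose a threshold.
summary(rowSums(norm))

# Pre-filter on the normalised row sums; 100 is an arbitrary example value.
threshold <- 100
keep <- rowSums(norm) >= threshold
filtered <- raw[keep, , drop = FALSE]
```

The filter is computed on the normalised values, but the rows kept are taken from the raw matrix, since DESeq() expects unnormalised counts as input.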


Hi Michael,

I have now read several posts on pre-filtering and have confused myself.
I understand that pre-filtering isn't necessary when using DESeq2 due to the filtering step that occurs in the DESeq function.

This is the script I use across 20 samples to remove genes with very low counts:

dss <- DESeqDataSetFromMatrix(countData = countsall, colData = samplesall, design = ~ condition)
colnames(dss) <- colnames(countsall)
dss <- dss[rowSums(counts(dss)) > 10, ]
dss <- DESeq(dss)

Is this a valid filter to be using? I have seen many posts where instead the filter is applied after running DESeq, but doesn't this mean that low counts are then still included?

Rebekah


hi,

That pre-filter is fine.

You could also do:

keep <- rowSums(counts(dds) >= x) >= y

where x and y are meaningful for your data, e.g. x may be a count around 5, and y may be the smallest group size. But our LFC shrinkage methods and the fitting going on inside DESeq() don't technically require filtering.
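
As a rough illustration of the x / y filter above, here is a sketch on a made-up toy matrix in base R. The matrix cts, the group factor and the value x = 5 are assumptions chosen for the example; for a real DESeqDataSet you would use counts(dds) in place of cts.

```r
# Toy raw counts: 4 genes x 6 samples, two groups of 3 (made-up numbers).
cts <- matrix(
  c( 0,  0,  0, 40, 55, 60,   # off in group A, on in group B
     1,  0,  2,  0,  1,  0,   # low everywhere
    12,  9, 15, 11, 14, 10,   # moderate everywhere
     0,  0,  0,  0,  0, 30),  # only a single sample has reads
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)
group <- factor(rep(c("A", "B"), each = 3))

x <- 5                   # minimum count per sample (example value)
y <- min(table(group))   # smallest group size, here 3

# Keep genes with at least x reads in at least y samples.
# For a DESeqDataSet, replace cts with counts(dds).
keep <- rowSums(cts >= x) >= y
cts_filtered <- cts[keep, , drop = FALSE]
```

Setting y to the smallest group size means a gene expressed in only one group can still pass, which a plain row-sum cutoff does not guarantee.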

I wouldn't manually filter after DESeq(). results() does an optimal filter for power when you call it, using either of two published methods (genefilter or IHW). The results() filtering can be turned off with independentFiltering=FALSE.


Cheers for clearing up my confusion!


Hi Michael,

If I remove genes whose row sum of read counts is below 50:

dst27 <- dst27[rowSums(counts(dst27)) >= 50, ]

I get a slightly higher number of DEGs with padj < 0.05 than when filtering with dst27 <- dst27[rowSums(counts(dst27)) > 10, ].

Is this still a valid filter, or am I undermining the assumptions on which DESeq2 runs by applying a row-sum filter of 50 before passing the data through the package?

Best wishes,

Rebekah


You can filter at whatever mean count you want; this doesn't disturb the statistical assumptions.

Remember, if you pre-filter too high, you could remove rows which look like: [0, 0, 0] vs [high, high, high].
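
To make that concrete, here is a small base R sketch with made-up counts. The helper kept_at() and the cutoffs 10 and 100 are hypothetical, chosen only to show the effect of raising a row-sum cutoff.

```r
# Two toy genes across 6 samples (3 per group); numbers are made up.
cts <- rbind(
  on_off = c(0, 0, 0, 30, 30, 30),      # [0, 0, 0] vs [high, high, high]
  flat   = c(20, 20, 20, 20, 20, 20)    # same level in both groups
)

# Hypothetical helper: names of the rows that survive a row-sum cutoff.
kept_at <- function(m, cutoff) rownames(m)[rowSums(m) >= cutoff]

kept_at(cts, 10)    # "on_off" "flat"
kept_at(cts, 100)   # "flat"
```

With the higher cutoff, the gene that differs between the groups is dropped while the unchanging gene is kept.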
