Question

Pre-filtering results influence on downstream analysis


andrebolerbarros


Hello everyone,

I am currently working on RNA-Seq data using DESeq2. As described in the manual, you can pre-filter low-count genes, e.g.:

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

However, the manual also says: "While it is not necessary to pre-filter low count genes before running the DESeq2 functions...". So, from what I gather, filtering with this threshold (10) or just removing genes with zero counts should yield the same result.

I tried both criteria and, although the summary output is the same, I get different p-values (both unadjusted and BH-adjusted).

dm <- DESeqDataSetFromMatrix(countData = tab, colData = design, design = ~ group)
dm <- dm[rowSums(counts(dm)) > 0, ]   # filter 1: drop genes with zero total count
dm <- DESeq(dm)

ashr_zero <- lfcShrink(dm, contrast = c("group", "trt", "untrt"), type = "ashr")

dm <- DESeqDataSetFromMatrix(countData = tab, colData = design, design = ~ group)
dm <- dm[rowSums(counts(dm)) > 10, ]  # filter 2: keep genes with total count above 10
dm <- DESeq(dm)

ashr_ten <- lfcShrink(dm, contrast = c("group", "trt", "untrt"), type = "ashr")

ashr_zero<-ashr_zero[rownames(ashr_zero) %in% rownames(ashr_ten),]


all(rownames(ashr_zero)==rownames(ashr_ten)) #to check if I'm comparing the same genes

[1] TRUE

check1 <- vector()

# compare each results column (baseMean, log2FoldChange, lfcSE, pvalue, padj)
for (i in 1:ncol(ashr_ten)) {
  check1[i] <- all(ashr_zero[, i] == ashr_ten[, i], na.rm = TRUE)
}

check1

[1]  TRUE FALSE FALSE FALSE FALSE

Looking at the summary, the independent filtering threshold is the same and the number of genes differs (which is expected, since the threshold of 10 removes more genes than the zero-count filter), but I really don't understand what is causing the difference in the other columns.

Thanks!

deseq2 rnaseq


Answer (2018-11-19)


Michael Love


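It should not yield an identical result. The low count genes will have some influence on the parameters of the dispersion function. Because each gene's dispersion estimate is shrunk toward that fitted trend, a different trend changes the final dispersions, and therefore the p-values, even for genes that pass both filters.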
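A minimal sketch of this effect, assuming DESeq2 is installed and using its simulated makeExampleDESeqDataSet() data rather than the poster's counts: fit the parametric dispersion trend after each of the two filters and compare its two coefficients (asymptDisp and extraPois). The helper fit_trend() is hypothetical, written only for this illustration.

```r
# Sketch: does the pre-filter change the fitted dispersion trend?
# Assumes DESeq2 is installed; uses simulated example data, not real counts.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Hypothetical helper: estimate size factors and dispersions, then return
# the coefficients of the parametric trend disp(mu) = asymptDisp + extraPois / mu
fit_trend <- function(dds) {
  dds <- estimateSizeFactors(dds)
  dds <- estimateDispersions(dds, fitType = "parametric", quiet = TRUE)
  attr(dispersionFunction(dds), "coefficients")
}

coef_zero <- fit_trend(dds[rowSums(counts(dds)) > 0, ])   # drop all-zero genes
coef_ten  <- fit_trend(dds[rowSums(counts(dds)) >= 10, ]) # manual's threshold

print(rbind(zero = coef_zero, ten = coef_ten))
```

The two rows will generally not match. Since the final (MAP) dispersions are shrunk toward this trend, a genuinely different trend is enough to shift the Wald p-values of the genes that are kept by both filters.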


That's what I suspected, thanks! In that case, which pre-filtering criterion should I use?

Reply from andrebolerbarros


It doesn’t really matter, except that once you pick one filtering rule you should note it down and stick with it for computational reproducibility.

Reply from Michael Love