Hi, I am currently performing differential expression analysis using DESeq2.
I want to filter out lowly expressed genes, although I read in another post here that this may not be necessary because independent filtering in results() (the independentFiltering argument) already does something similar. However, I am comparing different approaches for differential expression analysis, so I need to apply the same criteria to all of them.
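For context on the independent filtering mentioned above: as I understand it, results() uses the mean of normalized counts (baseMean) as the filter statistic, tries a range of cutoffs, and keeps the one that gives the most adjusted p-values below alpha (0.1 by default). Genes below the cutoff get padj = NA. The base-R sketch below only illustrates that idea on simulated p-values. The function name and the quantile grid are my own assumptions, and DESeq2 itself smooths the rejection counts with a fitted curve instead of taking the raw maximum.

```r
# Simplified illustration of the idea behind DESeq2's independent filtering.
# NOT DESeq2's implementation: it picks the raw maximum number of rejections.
independent_filter_sketch <- function(base_mean, pvalue, alpha = 0.1,
                                      probs = seq(0, 0.95, by = 0.05)) {
  # candidate cutoffs on the mean of normalized counts
  cutoffs <- quantile(base_mean, probs = probs, names = FALSE)
  # number of BH-adjusted p-values below alpha among genes passing each cutoff
  n_rejected <- vapply(cutoffs, function(thr) {
    keep <- base_mean >= thr
    sum(p.adjust(pvalue[keep], method = "BH") < alpha)
  }, integer(1))
  best <- which.max(n_rejected)
  keep <- base_mean >= cutoffs[best]
  padj <- rep(NA_real_, length(pvalue))   # filtered-out genes get NA
  padj[keep] <- p.adjust(pvalue[keep], method = "BH")
  list(cutoff = cutoffs[best], n_rejected = n_rejected, padj = padj)
}

# Made-up example: 900 low-mean null genes and 100 high-mean genes,
# half of which carry a signal (small p-values)
set.seed(1)
base_mean <- c(rexp(900, rate = 1 / 5), rexp(100, rate = 1 / 500))
pvalue <- c(runif(900), rbeta(50, 0.1, 1), runif(50))
res <- independent_filter_sketch(base_mean, pvalue)
res$cutoff                         # chosen baseMean cutoff
sum(res$padj < 0.1, na.rm = TRUE)  # genes called significant after filtering
```

The point is that this filter is chosen after testing, based on the mean count, so it is not the same thing as a fixed count threshold applied before DESeq().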
What I want to know is the difference between filtering after each of these calls:
```r
DESeqDataSetFromMatrix()
# and
DESeq()
```
I have seen some people filter before calling the DESeq() function:
```r
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition)
# keep genes with a raw count of at least 10 in at least 10 samples
keep <- rowSums(counts(dds) >= 10) >= 10
dds <- dds[keep, ]
dds <- DESeq(dds)
normalizedCounts <- counts(dds, normalized = TRUE)
```
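To make the two thresholds in rowSums(counts(dds) >= 10) >= 10 concrete, here is a base-R sketch on a made-up 4 x 6 count matrix (3 samples per condition is an assumption for illustration). The first 10 is a read count; the second 10 is a number of samples, so with only 6 samples that requirement removes every gene. Using the size of the smallest group as the sample requirement, which as far as I know is what recent versions of the DESeq2 vignette suggest, lets a gene expressed in only one condition (geneB) survive.

```r
# Made-up raw counts: 4 genes x 6 samples (s1-s3 = condition A, s4-s6 = condition B)
counts_mat <- matrix(c(
    0,   2,   1,   0,   3,   1,   # geneA: low everywhere
   50,  60,  55,   0,   1,   2,   # geneB: expressed only in condition A
   12,   9,  15,  11,   8,  20,   # geneC: around the threshold
  500, 480, 510, 530, 490, 505),  # geneD: high everywhere
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6)))

min_count <- 10            # first threshold: a read count
smallest_group_size <- 3   # second threshold: a number of samples (assumed 3 per group)

# TRUE for genes with at least min_count reads in >= smallest_group_size samples
keep <- rowSums(counts_mat >= min_count) >= smallest_group_size
keep
# geneA geneB geneC geneD
# FALSE  TRUE  TRUE  TRUE

# Requiring 10 samples, as in the code above, cannot be met with only 6 samples
keep_10_samples <- rowSums(counts_mat >= min_count) >= 10
any(keep_10_samples)
# [1] FALSE
```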
Whilst the developer ran DESeq() first and then performed the filtering:
```r
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition)
dds <- DESeq(dds)
# size factors (DESeq() above has already estimated them)
dds <- estimateSizeFactors(dds)
# Apply the filtering criteria: normalized count >= 10 in at least 10 samples
idx <- rowSums(counts(dds, normalized = TRUE) >= 10) >= 10
dds <- dds[idx, ]
dds <- DESeq(dds)
```
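The second version filters on normalized counts, which is why size factors have to exist before the filter is applied. Below is a base-R sketch of the median-of-ratios estimator that, as I understand it, estimateSizeFactors() uses by default; the function name is hypothetical and the sketch ignores options such as controlGenes. In the made-up counts, s2 was sequenced exactly twice as deeply as s1, so the two filters only disagree for a gene near the threshold (geneC).

```r
# Made-up counts: s2 is s1 sequenced twice as deeply, except geneD (zero in s1)
counts_mat <- cbind(s1 = c(10, 100,  8, 0, 40),
                    s2 = c(20, 200, 16, 6, 80))
rownames(counts_mat) <- paste0("gene", LETTERS[1:5])

# Median-of-ratios size factors (sketch): compare each sample with the per-gene
# geometric mean, using only genes with non-zero counts in every sample
median_of_ratios_sf <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for genes containing a zero
  apply(counts, 2, function(cnts) {
    use <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[use]))
  })
}

sf <- median_of_ratios_sf(counts_mat)
sf[["s2"]] / sf[["s1"]]   # about 2, the simulated depth ratio

# Normalized counts: divide each column by its size factor
norm_mat <- sweep(counts_mat, 2, sf, "/")

# Same rule on raw vs normalized counts: >= 10 in both samples
keep_raw  <- rowSums(counts_mat >= 10) >= 2
keep_norm <- rowSums(norm_mat >= 10) >= 2
# geneC (8 vs 16 reads) fails the raw filter but passes the normalized one
rbind(keep_raw, keep_norm)
```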
So I just want to understand which approach is the right one and why :)
Thanks
Thank you for that @atpoint.
I had seen the steps above in the vignette, but got confused when I saw a thread where the developer performed prefiltering after estimateSizeFactors(), i.e. on normalized counts (thread: "deseq2 filter the low counts").
One question: I didn't quite understand your last sentence. Why should I filter on raw counts? Wouldn't that ignore differences in library size and sequencing depth? When I ran DE with edgeR, I pre-filtered on CPM values. I added my edgeR pre-filtering code below.
Your help with this is highly appreciated
Thanks
My advice is to always follow the manual unless you have the expert knowledge to justify doing something else. The linked thread is 8 years old, and the developers' recommendations change over time. The edgeR manual doesn't recommend filtering on CPMs; it uses filterByExpr(). It is up to you whether to follow the best practices in the manuals or do something custom. Please see the manuals of both edgeR and DESeq2; they contain code suggestions for prefiltering.
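On the raw counts vs CPM question, a raw-count threshold can be translated into a CPM threshold. My understanding is that edgeR's filterByExpr() does roughly this, using a CPM cutoff of min.count / median(lib.size) * 1e6 (min.count = 10 by default) together with further rules not reproduced here. The base-R sketch below uses made-up counts (the row "other" lumps all remaining genes together so the library sizes come out at 1M, 2M and 2M reads) and plain library-size CPMs without normalization factors, so it is not a substitute for edgeR::cpm() or filterByExpr().

```r
# Made-up counts; "other" stands in for all remaining genes
counts_mat <- rbind(geneX = c(10, 12, 20),
                    geneY = c( 8, 12, 12),
                    other = c(999982, 1999976, 1999968))
colnames(counts_mat) <- paste0("s", 1:3)

lib_size <- colSums(counts_mat)                          # 1e6, 2e6, 2e6
cpm_mat  <- sweep(counts_mat, 2, lib_size, "/") * 1e6    # plain CPM, no TMM factors

min_count  <- 10
cpm_cutoff <- min_count / median(lib_size) * 1e6         # 5 CPM here
n_samples  <- 3                                          # assumed: require all samples

keep_raw <- rowSums(counts_mat >= min_count) >= n_samples
keep_cpm <- rowSums(cpm_mat >= cpm_cutoff) >= n_samples
cpm_mat["geneY", ]   # about 8, 6, 6: its 8 reads sit in the smallest library
```

Here geneY fails the raw-count rule (only 8 reads in s1) but passes the CPM rule, because s1 is the smallest library.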