Question

Difference between DESeqDataSetFromMatrix() function and DESeq() function


Abir.khazaal ▴ 10

@3e9efee3

Last seen 20 months ago

Australia

Hi, I am currently performing differential expression analysis using DESeq2.

I want to filter out lowly expressed genes, although I read in another post here that this may not be necessary because the independent filtering (the independentFiltering argument of results()) largely does that already. However, I am comparing different approaches to differential expression analysis and need to apply the same filtering criteria across them.

What I want to know is, what is the difference between



DESeqDataSetFromMatrix()
# and
DESeq()

I have seen some people filter before calling the DESeq() function:


dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition) 

keep <- rowSums(counts(dds) >= 10) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
normalizedCounts <- counts(dds, normalized=TRUE)

The developer, in contrast, called the DESeq() function first and then filtered:


dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition) 
dds <- DESeq(dds)
dds <- estimateSizeFactors(dds)

# Apply the filtering criteria
idx <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 10
dds <- dds[idx,]

dds <- DESeq(dds)

So I just want to understand which approach is the right one and why :)

Thanks

DESeq DESeq2 • 3.6k views

ADD COMMENT • link updated 21 months ago by ATpoint ★ 4.8k • written 21 months ago by Abir.khazaal ▴ 10

score 0 · Answer 1 · 2023-08-03


ATpoint ★ 4.8k

@atpoint-13662

Last seen 17 hours ago

Germany

Please follow the manual.

1. Make the dataset with DESeqDataSetFromMatrix().
2. Apply prefiltering, either as in https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering or with something like edgeR::filterByExpr, to get what the vignette calls keep.
3. Filter the dataset via dds <- dds[keep,].
4. Run DESeq().
5. Get results via results() or lfcShrink(); see the vignette, which covers all of that.

estimateSizeFactors() is part of DESeq(), so skip that. Filter on raw, not normalized, counts; see the vignette.
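
A minimal sketch of that order of operations. It assumes simulated counts from DESeq2's makeExampleDESeqDataSet() in place of your countData and metaData, and the group size of 6 is an assumption tied to that simulated design (two conditions, six samples each), not a general recommendation:

```r
library(DESeq2)

# Simulated stand-in data (assumption): 1000 genes, 12 samples, two conditions
set.seed(1)
example   <- makeExampleDESeqDataSet(n = 1000, m = 12)
countData <- counts(example)
metaData  <- data.frame(condition = example$condition,
                        row.names = colnames(example))

# Step 1: build the dataset; no model fitting happens here
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData   = metaData,
                              design    = ~ condition)

# Size factors have not been estimated yet at this point
sf_before <- sizeFactors(dds)

# Steps 2-3: prefilter on raw counts, in the style of the vignette
smallestGroupSize <- 6
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds  <- dds[keep, ]

# Step 4: DESeq() estimates size factors and dispersions, then fits the model
dds <- DESeq(dds)

# Step 5: extract results (independent filtering is applied here by default)
res <- results(dds)
```

There is no separate estimateSizeFactors() call in this sketch: sf_before is empty, while sizeFactors(dds) is filled in once DESeq() has run.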

ADD COMMENT • link 21 months ago ATpoint ★ 4.8k


Thank you for that @atpoint.

I have seen the steps above in the vignette, but I got confused when I saw a thread in which the developer performed prefiltering using estimateSizeFactors(). The thread is "deseq2 filter the low counts".

One question: I didn't quite understand your last sentence. Why should I filter on raw data? Wouldn't that ignore differences in library size and sequencing depth? When I performed DE analysis with edgeR, I pre-filtered on CPM values. I have added my edgeR pre-filtering code below.

Your help with this is highly appreciated

Thanks


library(edgeR)

# Prepare raw counts as a DGEList object
dge <- DGEList(counts = countData)

# Obtain CPM values using cpm
cpm_values <- cpm(dge)

# Keep genes with more than 10 CPM in at least 10 samples
keep <- rowSums(cpm_values > 10) >= 10

# Subset DGEList object to keep only selected genes
dge <- dge[keep, , keep.lib.sizes=FALSE] 

# create a design matrix
design <- model.matrix(~0 +AGE, data=metaData) 

# Estimate common and tagwise dispersions
dge <- estimateDisp(dge, design)

#fit linear model .. etc.

ADD REPLY • link 21 months ago Abir.khazaal ▴ 10


My advice is to always follow the manual unless you have the expert knowledge to do something else. The linked thread is 8 years old, and recommendations by developers change over time. The edgeR manual doesn't recommend filtering on CPMs; it uses filterByExpr. It is up to you whether to follow the best practices in the manuals or do something custom. Please see the manuals of both edgeR and DESeq2; they contain code suggestions for prefiltering.
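
For reference, a minimal sketch of filterByExpr-based prefiltering in edgeR. The simulated counts, the two-group layout and the names counts and group are assumptions for illustration; they stand in for your countData and metaData:

```r
library(edgeR)

# Simulated stand-in data (assumption): 500 genes x 12 samples, two groups of 6.
# Rows 1-250 are simulated as low-count genes, rows 251-500 as well-expressed genes.
set.seed(1)
counts <- matrix(rnbinom(500 * 12, mu = rep(c(0.5, 50), each = 250), size = 2),
                 nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("sample", 1:12)))
group <- factor(rep(c("A", "B"), each = 6))

dge    <- DGEList(counts = counts, group = group)
design <- model.matrix(~ group)

# filterByExpr() decides which genes have enough counts to keep,
# taking the design and the library sizes into account
keep <- filterByExpr(dge, design)

# Subset and recompute library sizes from the remaining genes
dge <- dge[keep, , keep.lib.sizes = FALSE]
```

The rest of the edgeR analysis (normalization, estimateDisp() and model fitting) would then continue on the filtered dge object.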

ADD REPLY • link 21 months ago ATpoint ★ 4.8k