Question

[ChIP-seq] DESeq2 on counts matrix from union set of called peaks

Luca • 0

To preface this: I have used DiffBind and csaw for most of my experimental work and love these programs. This is a general question, and I have seen versions of it asked before on this forum, but I would like to ask some additional questions, namely about the normalization procedures.

(see: DEseq2 for differential binding analysis ChIP seq and DESeq2 for ChIP-seq differential peaks)

To summarize, I wanted to try a different approach rather than using csaw or DiffBind, just for my own knowledge and abilities. To avoid posting lines of code I'll simply summarize, but I can edit this in the future. I created one merged BAM per condition from all replicates using samtools and called peaks on each merged BAM with MACS3. I then generated a union set of peaks across the two conditions using bedtools sort and merge, converted it into a SAF file, and ran featureCounts on the individual replicate BAMs with that SAF file. This produces a count matrix of the reads within these peaks per replicate.
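For anyone unfamiliar with the BED-to-SAF step, here is a minimal base-R sketch of that conversion. It is not the exact code I ran: the coordinates are made up, `bed_to_saf` is a hypothetical helper name, and using "." as the strand placeholder for unstranded peaks is an assumption on my part.

```r
# Toy union peak set in BED convention (0-based start, end-exclusive).
# Coordinates are invented for illustration only.
bed <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(999L, 4999L, 199L),
  end   = c(1500L, 5600L, 800L),
  stringsAsFactors = FALSE
)

# SAF uses the columns GeneID, Chr, Start, End, Strand with 1-based starts,
# so the BED start is shifted by one and the end is kept as-is.
bed_to_saf <- function(bed) {
  data.frame(
    GeneID = paste0(bed$chr, ":", bed$start + 1L, "-", bed$end),
    Chr    = bed$chr,
    Start  = bed$start + 1L,
    End    = bed$end,
    Strand = ".",  # assumed placeholder; peaks carry no strand
    stringsAsFactors = FALSE
  )
}

saf <- bed_to_saf(bed)
```

The resulting data frame could then be written out tab-separated (for example with `write.table(saf, sep = "\t", quote = FALSE, row.names = FALSE)`) and passed to featureCounts as the annotation.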

I preprocess this count matrix to remove peaks in blacklisted regions, as well as peaks with low counts in the majority of the replicates, as follows:

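library(GenomicRanges)  # provides GRanges() and overlapsAny()

# Drop peaks that overlap blacklisted regions
Olaps <- overlapsAny(GRanges(CoordinatesOnly), blacklist)
FilteredCounts <- RawCounts[!Olaps, ]

# Keep peaks with more than 5 reads in more than 2 samples
FilteredCounts <- FilteredCounts[rowSums(FilteredCounts > 5) > 2, ]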

I then import this count matrix and proceed as normal with DESeq2, that is, using the default normalization:

DDS <- estimateSizeFactors(DDS)
DDS <- DESeq(DDS, test = "Wald")
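To make explicit what "default normalization" means here: as far as I understand it, `estimateSizeFactors` uses the median-of-ratios method by default. Below is a rough base-R sketch of that calculation on a toy matrix. The counts are invented and `median_of_ratios` is a hypothetical helper name, not a DESeq2 function.

```r
# Toy count matrix: 4 peaks (rows) x 4 samples (columns); values are invented.
counts <- matrix(
  c(100, 200, 110, 210,
     50, 100,  45, 105,
     30,  60,  33,  58,
     80, 160,  90, 150),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), c("ctrl1", "ctrl2", "dis1", "dis2"))
)

# Median-of-ratios: for each sample, take the median over peaks of
# (count / geometric mean of that peak across samples).
median_of_ratios <- function(counts) {
  log_counts   <- log(counts)
  log_geo_mean <- rowMeans(log_counts)
  keep <- is.finite(log_geo_mean)  # drops peaks with a zero in any sample
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[keep]))
  })
}

size_factors <- median_of_ratios(counts)

# Divide each sample (column) by its size factor
normalized <- sweep(counts, 2, size_factors, "/")
```

Because each size factor is a median across peaks, the method implicitly assumes that most peaks do not change between conditions, which is exactly the assumption I am unsure about below.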

Running a Wald test in DESeq2 on this yields results almost identical to those I get with csaw and DiffBind: the general trend across sites is the same, and the significant sites are almost all the same as well.

My question is simple: is this a valid approach to this problem, or should more time be spent on normalization? The csaw paper, the DiffBind paper, and the csaw book all place a lot of emphasis on normalization strategies. For this ChIP we are not dealing with a WT vs KO comparison, so global shifts may not be a factor; rather, we are looking at a disease and how this particular protein's binding changes with it. How would one deal with global shifts in an approach like this? Should I trust these results, or should I be more cautious?

Thanks in advance!

csaw ChIPseq DESeq2 DiffBind

ADD COMMENT • link updated 5 hours ago by ATpoint ★ 4.7k • written 19 hours ago by Luca • 0

score 0 · Answer 1 · 2025-03-12

Yes, perfectly valid. Analyzing counts from high-throughput experiments with DESeq2 (or edgeR, or limma) is standard. After all, what DiffBind mainly brings to the table is automation, along with filtering, counting, and visualization routines. The actual testing is done by either DESeq2 or edgeR. You can run your analysis without any wrapper package; this is not unusual.