Hello,
I came across a paper recently that divided their analysis between the TSS-proximal region and some other distal sites (I believe they used DiffBind with called peaks, and just filtered their peaks based on the peak annotations). I was wondering if a similar approach is possible using Csaw and generally sliding window methods.
Here is what I've tried, I was wondering if this is a sound approach, it seems to work.
#Get TSS proximal region of transcripts
promoters_mm10 <- promoters(TxDb.Mmusculus.UCSC.mm10.knownGene, upstream = 1000, downstream = 1000)
#Get length of each chr/contig for respective genome (mm10 in this case)
genome_mm10 <- seqlengths(BSgenome.Mmusculus.UCSC.mm10)
genome_ranges <- GRanges(seqnames = names(genome_mm10), ranges = IRanges(start = 1, end = genome_mm10))
#Subset the regions that do not fall within the promoter +-1kb
promoters_mm10_reduce <- reduce(promoters_mm10)
nonTSS <- setdiff(genome_gr, promoters_mm10_reduce, ignore.strand = T)
#Add this to the discard argument in readParam, and restrict to only the chr's
restriction_TSS_pe <- readParam(
pe = "both", minq = 20, dedup = T, discard = nonTSS,
restrict = paste0("chr", c(1:19, "X", "Y")), max.frag = 400
)
And then as per the book, apply a windowCounts and regionCounts as follows, with a local filter for low FC windows
WindowCounts_Males_TSSOnly <- windowCounts(
Updated_Male_4Months$bamReads, ext = Frag.LengthsM_IP.max,
width = 150, param = restriction_TSS_pe, BPPARAM = SnowParam(workers = 20)
)
TSSLocal <- resize(rowRanges(WindowCounts_Males_TSSOnly), 1000, fix = "center")
Window_Males_TSSOnly <- regionCounts(
Updated_Male_4Months$bamReads, regions = TSSLocal, ext = Frag.LengthsM_IP.max,
param = restriction_TSS_pe, BPPARAM = SnowParam(workers = 20)
)
filter.stat.TSS <- filterWindowsLocal(WindowCounts_Males_TSSOnly, Window_Males_TSSOnly)
WindowCounts_Males_TSSOnly <- WindowCounts_Males_TSSOnly[ filter.stat.TSS$filter > log2(3), ]
In this case I applied a composition bias normalization since I'm expecting a systematic change in binding, but generally does this approach make sense? Would it make sense if I took the complement set of the TSS regions (i.e.: all regions but the TSS)? And lastly, since we're interested in what's going on at enhancer sites, would it make sense to use peaks called from a separate H3K27ac ChIP in a readParam() setup like this?