Use csaw to build matrix for heatmap?
1
0
Entering edit mode
jmKeith • 0
@jmkeith-11960
Last seen 5.6 years ago

Hello,

Can anyone help me understand how to use csaw to 1) filter my BAM file reads to retain only windows within MACS called peaks, 2) then count the number of reads in 100bp bins within a 5kbp region around each peak?  

I have my BAM reads file, my BED peaks file and then an additional BED file which has 5kbp regions around each peak.  

I've tried to filter my reads using the peaks as follows-

counts <- windowCounts(p300.bam, width=10, ext=frag.len, param=param)
> locality <- regionCounts(p300.bam, resize(rowRanges(counts), fix="center", 100))
Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on
  sequences chr11, chr18, chr5, and chrX. Note that only
  ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths()
  and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim
  these ranges. See ?`trim,GenomicRanges-method` for more
  information.
2: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 4 out-of-bound ranges located on
  sequences chr11, chr18, chr5, and chrX. Note that only
  ranges located on a non-circular sequence whose length is
  not NA can be considered out-of-bound (use seqlengths()
  and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim
  these ranges. See ?`trim,GenomicRanges-method` for more
  information.
> stats <- filterWindows(counts, locality, type="local")
Error in .checkLibSizes(data, background) : 
  data and background totals should be identical

Many thanks for any help,

Julia

csaw chipseq edger • 1.8k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

There's a couple of problems here. The first and immediate problem is that you've set param in your call to windowCounts but not in regionCounts. This leads to the error that you've observed upon calling filterWindows because you're not counting the same reads from each BAM file.

The bigger issue is that your code is not matching up to what you say you want to do. Currently, your code is selecting windows (in counts) that are enriched compared to the local background (in locality). This mimics a similar step in MACS where locally enriched sliding windows are identified. However, csaw does not do any peak calling. If you want to call peak intervals... well, use the peak intervals that you get out of MACS. (There's also another problem in that the interval of your local background is far too small - you should be using 1-5 kbp regions around each window to get a stable estimate of the neighbouring enrichment.)

So, let's revisit what you actually want to do. To deal with task 1, you should load the peaks from your first BED file with rtracklayer. You can then use findOverlaps to identify the windows in counts that lie within the called peak intervals. Task 2 might be more work depending on how correctly you want to do it. I would start from the summit of each peak and do something like:

expanded.peak <- resize(peak.summit, width=1e4, fix="center")
all.bins <- tile(expanded.peak, width=100)
all.bins <- unlist(all.bins)

... and use all.bins in regionCounts to get the count for each bin. You could then do something like:

heat.mat <- matrix(assay(bin.counts), nrow=length(expanded.peak), byrow=TRUE)

... to get a matrix for heatmap plotting (assuming bin.counts is the object you get out of regionCounts).

ADD COMMENT
0
Entering edit mode

Thank you very much for your help.  I understand now how to build the matrices, but I'm still a bit lost on how to filter for reads that fall within peaks.  Would you mind going through that part one more time?

Very much appreciated,

Julia

ADD REPLY
1
Entering edit mode

Filter for reads, or filter for windows? If you want to identify individual reads that fall into peaks, then you need to construct a GRanges object from the read intervals (e.g., using extractReads) and identify overlaps with the peak intervals using findOverlaps. If you want to identify windows that fall into peaks, then you can identify overlaps between the window intervals (from rowRanges) and the peak intervals with findOverlaps. Read the documentation for the findOverlaps function if you're still not sure - it's a very useful utility so it's worth understanding.

ADD REPLY

Login before adding your answer.

Traffic: 847 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6