I have used DiffBind to come up with some differentially bound/occupied sites in chromatin accessibility data and for some downstream purposes I wanted to take those lists and export as BED files with their standard scores - for example to use in a PWMtool and other areas. I was wondering what the most straightforward way was to accomplish this. Thank you!
I'm not sure what you mean by "standard scores." Do you mean a score generated by the original peak caller [eg -10 log10(pvalue)]? Or a statistic relating to the differential binding itself (ie FDR, fold change, mean concentration, etc)?
If you want to retrieve scores generated by the peak caller, there are a couple of complications. One issue is that the consensus peaks do not map directly to called peaks. The peak intervals may be expanded during the peak merging process, and not all peaks are called for all samples (so there is no score from the peak caller). Another issue is that DiffBind normalizes the original peak scores to a 0..1 range, so the actual original scores are no longer available.
Let me know if you are trying to retrieve the peak caller scores, as I could give you some sample code. The basic approach is to read the original peak calls with their scores into a GRanges object, and then overlap each one with the GRanges object returned by dba.report(). You can then use the original scores from peaks that overlap the significantly differential bound regions.
Appreciate the quick response. So I wanted to use this tool:http://ccg.vital-it.ch/pwmtools/pwmeval.php which asks for a score which I assumed to be the original peak caller score you mention however if there is a proxy for it within the DiffBind output I guess that would work also. I will gladly try the sample code though if thats my best path to an answer. Thank you.
This tool uses the score to rank sites. If you are using differentially bound sites from DiffBind, a score from DiffBind would be the best bet. For example:
That works great thank you! One final question, if I only wanted to export a BED with one one set of differential peaks lets say in the minus direction above -2 fold or in the plus above 2 fold how would I go about getting independent BED files for that? Thanks again.
Had a follow-up question on this workflow for exporting to a BED file. How can I go about including strand information of the differentially called peaks in the exported BED file? Thank you!
As far as standard ChIP-seq data goes, the protocol is not strand-specific -- there is no way to know where on the pulled-down DNA, or which strand, the protein of interest was binding. Peaks form peak callers generally do not have a strand specified, the entire genomic interval (both strands) comprises the peaks. Sometimes a motif analysis can tell you which strand was probably bound but that is beyond what DiffBind does.
ChIP-seq peaks are inherently not stranded, so there is no meaningful strand information to include. All you can tell is that there is binding at this location. There are (unless the protocol has been radically changed) top-strand reads upstream of the actual binding site, and bottom-strand reads downstream, just reflecting that we're reading more or less at random from either end of the fragment that was immunoprecipitated.
Hi Rory,
Appreciate the quick response. So I wanted to use this tool:http://ccg.vital-it.ch/pwmtools/pwmeval.php which asks for a score which I assumed to be the original peak caller score you mention however if there is a proxy for it within the DiffBind output I guess that would work also. I will gladly try the sample code though if thats my best path to an answer. Thank you.
Rob.
This tool uses the score to rank sites. If you are using differentially bound sites from
DiffBind
, a score fromDiffBind
would be the best bet. For example:Hi Rory,
That works great thank you! One final question, if I only wanted to export a BED with one one set of differential peaks lets say in the minus direction above -2 fold or in the plus above 2 fold how would I go about getting independent BED files for that? Thanks again.
Rob.
Thank you very much!
Hi Rory,
Had a follow-up question on this workflow for exporting to a BED file. How can I go about including strand information of the differentially called peaks in the exported BED file? Thank you!
Rob.
As far as standard ChIP-seq data goes, the protocol is not strand-specific -- there is no way to know where on the pulled-down DNA, or which strand, the protein of interest was binding. Peaks form peak callers generally do not have a strand specified, the entire genomic interval (both strands) comprises the peaks. Sometimes a motif analysis can tell you which strand was probably bound but that is beyond what DiffBind does.
-Rory
Hi,
ChIP-seq peaks are inherently not stranded, so there is no meaningful strand information to include. All you can tell is that there is binding at this location. There are (unless the protocol has been radically changed) top-strand reads upstream of the actual binding site, and bottom-strand reads downstream, just reflecting that we're reading more or less at random from either end of the fragment that was immunoprecipitated.
Edited to add: yeah, what Rory said... ;)