Inconsistent peak width with DiffBind "summit" function
1
0
Entering edit mode
meisan406 • 0
@meisan406-7061
Last seen 7.1 years ago
United States

This is a question on the "summit" function in DiffBind.

I would like to generate a set of consensus peaks with width of 201bp, centered around the peak summit. The codes I used were:

dba.object <- dba( sampleSheet = metadata, peakCaller="bed", scoreCol=5, bRemoveM=TRUE, bLowerScoreBetter=TRUE, config=data.frame(bUsePval=TRUE), bCorPlot=FALSE)

count.data <- dba.count( dba.object, minOverlap=1, score=DBA_SCORE_READS, bCorPlot=FALSE, summits=100,
bUseSummarizeOverlaps=F )

 

 

To retrieve the count matrix, along with the peak coordinates, I used:

count.matrix <- dba.peakset(count.data, bRetrieve=T, DataType=DBA_DATA_FRAME)

 

However, I realized that there were a very small set of peaks larger than 201bp in the final consensus peak set (approximately 5% of the consensus peak set). I am unable to figure out why this would be so and would appreciate any assistance on this. 

My saved dba.object can be retrieved from here: https://www.dropbox.com/s/eh3qpanm0lgdn6f/dba_object_debug.rds?dl=0

The DiffBind version I used was 2.2.9

Thank you,

Mei San

 

sessionInfo()

R version 3.3.2 (2016-10-31)

Platform: x86_64-centos-linux-gnu (64-bit)

Running under: CentOS Linux 7 (Core)


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

[9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base     


other attached packages:

[1] DiffBind_2.2.9             SummarizedExperiment_1.4.0

[3] Biobase_2.34.0             GenomicRanges_1.26.2      

[5] GenomeInfoDb_1.10.2        IRanges_2.8.0             

[7] S4Vectors_0.14.4           BiocGenerics_0.22.0       


loaded via a namespace (and not attached):

[1] Rcpp_0.12.11             locfit_1.5-9.1           lattice_0.20-34         

[4] GO.db_3.4.0              Rsamtools_1.26.1         Biostrings_2.42.1       

[7] gtools_3.5.0             assertthat_0.2.0         digest_0.6.12           

[10] R6_2.2.2                 plyr_1.8.4               BatchJobs_1.6           

[13] backports_1.0.5          ShortRead_1.32.1         RSQLite_1.0.0           

[16] ggplot2_2.2.1            gplots_3.0.1             zlibbioc_1.20.0         

[19] rlang_0.1.1              GenomicFeatures_1.26.2   lazyeval_0.2.0          

[22] annotate_1.52.1          gdata_2.18.0             Matrix_1.2-7.1          

[25] checkmate_1.8.2          systemPipeR_1.8.1        GOstats_2.40.0          

[28] splines_3.3.2            BiocParallel_1.8.1       stringr_1.2.0           

[31] pheatmap_1.0.8           RCurl_1.95-4.8           biomaRt_2.30.0          

[34] munsell_0.4.3            sendmailR_1.2-1          rtracklayer_1.34.1      

[37] pkgconfig_2.0.1          base64enc_0.1-3          BBmisc_1.11             

[40] fail_1.3                 tibble_1.3.3             edgeR_3.16.5            

[43] XML_3.98-1.5             AnnotationForge_1.16.0   dplyr_0.7.1             

[46] GenomicAlignments_1.10.0 bitops_1.0-6             grid_3.3.2              

[49] RBGL_1.50.0              xtable_1.8-2             GSEABase_1.36.0         

[52] gtable_0.2.0             DBI_0.5-1                magrittr_1.5            

[55] scales_0.4.1             graph_1.52.0             KernSmooth_2.23-15      

[58] amap_0.8-14              stringi_1.1.5            XVector_0.14.0          

[61] hwriter_1.3.2            genefilter_1.56.0        bindrcpp_0.2            

[64] limma_3.30.8             latticeExtra_0.6-28      brew_1.0-6              

[67] rjson_0.2.15             RColorBrewer_1.1-2       tools_3.3.2             

[70] glue_1.1.1               Category_2.40.0          survival_2.39-5         

[73] AnnotationDbi_1.36.0     colorspace_1.3-2         caTools_1.17.1          

[76] bindr_0.1               
diffbind • 1.5k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 10 weeks ago
Cambridge, UK

If two or more peaks overlap by at least one base after being re-centered around the summit, they will be merged into a single wider peak, consistent with how DiffBind generally handles overlapping peaks. That is why a subset of your peaks are wider than expected.

This issue has come up before, and I have always viewed it as a "feature". However upon further reflection I'm thinking it is actually a "bug", and intend to fix it in the next release. The new behavior will be to to check for merged peaks after re-centering (wider than expected), and re-center again over the highest summit, so the peak intervals are all uniform size.

-Rory

ADD COMMENT
0
Entering edit mode

Hi Rory,

Thank you for the clarification. I think the fix you are planning would fit in better with the goal of the "summit" function. I look forward to the new release.

Thanks,

Mei San

ADD REPLY
0
Entering edit mode

Sometime, when we have some spare time (haha!) we should consider using PeakSplitter or similar to split distinct but nearby peaks, rather than discarding all but one.  It (probably) wouldn't be that hard to write a wrapper R package around PeakSplitter's C++ core... I'll see what Mali (the author) thinks of the idea.

 Cheers, - Gord

ADD REPLY

Login before adding your answer.

Traffic: 572 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6