Question

DiffBind - overlap between different peak callers

0

Entering edit mode

Paolo Kunderfranco ▴ 350

@paolo-kunderfranco-5158

Last seen 7.8 years ago

Dear All, I am using package DiffBind, Due to the fact that I miss real biological or technical replicates, I am assesing the overlapping rates between different peak callers. Here is how I set up my database to read in: 3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions > rm(list=ls()) > setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") > test=dba(sampleSheet="database.csv") > test 24 Samples, 19380 sites in matrix (40809 total): ID Tissue Factor Condition Replicate Intervals 1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834 2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450 3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990 4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188 5 CMp_H3K27me3_s CMN H3K27 CMp_H3K27me3 2 3388 6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946 7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243 8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476 9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632 10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917 11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646 12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985 13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110 14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484 15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938 16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569 17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808 18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747 19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318 20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210 21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988 22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164 23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429 24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063 > test=dba.count(test) > test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) #minMemebers was se to 2 due to the fact I have 2 replicates for each condition #....78 Contrasts: > test = dba.analyze(test) #for each condition generate a report >test.DB1=dba.report(test,contrast=1) I would like now to assess overlapping rates between replicates, and generate a venn diagram. but it seems like replicates are merged together and I cannot treat them as single samples. I noticed this behaviour also when i try to generate both a PCA plot and a heatmap plot, replicates are merged, so what's the utility to display both in the PCA or heatmap plot? would it be simple to display the original replicates to assess how do they cluster? >pdf('PCA_plot_DBA_CONDITION.pdf') >dba.plotPCA(test,attributes=DBA_CONDITION, vColors=c("grey0","grey23" ,"grey47","antiquewhite4","antiquewhite3","antiquewhite1","dodgerblue4 ","dodgerblue","darkslategray1","brown4","brown","coral")) >dev.off() >pdf('HeatMap.pdf') >corvals=dba.plotHeatmap(test) >dev.off() >pdf('HeatMap_RPKM_FOLD.pdf') >corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD) >dev.off() > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9 [4] IRanges_1.14.4 BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 gplots_2.11.0 [5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 stats4_2.15.1 [9] tools_2.15.1 zlibbioc_1.2.0 Any sugestion is appreciated, cheers, paolo

CMA DiffBind CMA DiffBind • 1.8k views

ADD COMMENT • link 12.7 years ago Paolo Kunderfranco ▴ 350

score 0 · Answer 1 · 2012-08-09

0

Entering edit mode

Paolo Kunderfranco ▴ 350

@paolo-kunderfranco-5158

Last seen 7.8 years ago

Dear Rory, Thanks again for Your input, I went through and finally obtained consensus peakset identified by both peak callers for all the samples: rm(list=ls()) library("DiffBind") setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") test=dba(sampleSheet="database.csv") #clustering based on occupancy/peak calling pdf('clustering based on occupancy_peak calling.pdf') dba.plotHeatmap(test) dba.plotPCA(test) dev.off() par(mfrow=c(1,3)) pdf('VENN_H3K27me3.pdf') dba.plotVenn(test, test$masks$ES & test$masks$H3K27,label1='mES_m',label2='mES_s') dba.plotVenn(test, test$masks$CMN & test$masks$H3K27,label1='CMp_m',label2='CMp_s') dba.plotVenn(test, test$masks$CMA & test$masks$H3K27,label1='CMa_m',label2='CMa_s') dev.off() pdf('VENN_H3K4me3.pdf') dba.plotVenn(test, test$masks$ES & test$masks$H3K4,label1='mES_m',label2='mES_s') dba.plotVenn(test, test$masks$CMN & test$masks$H3K4,label1='CMp_m',label2='CMp_s') dba.plotVenn(test, test$masks$CMA & test$masks$H3K4,label1='CMa_m',label2='CMa_s') dev.off() pdf('VENN_H3K9me3.pdf') dba.plotVenn(test, test$masks$ES & test$masks$H3K9,label1='mES_m',label2='mES_s') dba.plotVenn(test, test$masks$CMN & test$masks$H3K9,label1='CMp_m',label2='CMp_s') dba.plotVenn(test, test$masks$CMA & test$masks$H3K9,label1='CMa_m',label2='CMa_s') dev.off() pdf('VENN_H3K79me2.pdf') dba.plotVenn(test, test$masks$ES & test$masks$H3K79,label1='mES_m',label2='mES_s') dba.plotVenn(test, test$masks$CMN & test$masks$H3K79,label1='CMp_m',label2='CMp_s') dba.plotVenn(test, test$masks$CMA & test$masks$H3K79,label1='CMa_m',label2='CMa_s') dev.off() test = dba.peakset(test, test$masks$ES & test$masks$H3K27, minOverlap=2, sampID="mES_H3K27me3") test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, minOverlap=2, sampID="CMp_H3K27me3") test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, minOverlap=2, sampID="CMa_H3K27me3") pdf('Overlap_H3K27me3.pdf') dba.plotVenn(test, test$masks$H3K27 & test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') dev.off() test = dba.peakset(test, test$masks$ES & test$masks$H3K9, minOverlap=2, sampID="mES_H3K9me3") test = dba.peakset(test, test$masks$CMN & test$masks$H3K9, minOverlap=2, sampID="CMp_H3K9me3") test = dba.peakset(test, test$masks$CMA & test$masks$H3K9, minOverlap=2, sampID="CMa_H3K9me3") pdf('Overlap_H3K9me3.pdf') dba.plotVenn(test, test$masks$H3K9 & test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') dev.off() test = dba.peakset(test, test$masks$ES & test$masks$H3K79, minOverlap=2, sampID="mES_H3K79me2") test = dba.peakset(test, test$masks$CMN & test$masks$H3K79, minOverlap=2, sampID="CMp_H3K79me2") test = dba.peakset(test, test$masks$CMA & test$masks$H3K79, minOverlap=2, sampID="CMa_H3K79me2") pdf('Overlap_H3K79me2.pdf') dba.plotVenn(test, test$masks$H3K79 & test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') dev.off() test = dba.peakset(test, test$masks$ES & test$masks$H3K4, minOverlap=2, sampID="mES_H3K4me3") test = dba.peakset(test, test$masks$CMN & test$masks$H3K4, minOverlap=2, sampID="CMp_H3K4me3") test = dba.peakset(test, test$masks$CMA & test$masks$H3K4, minOverlap=2, sampID="CMa_H3K4me3") pdf('Overlap_H3K4me3.pdf') dba.plotVenn(test, test$masks$H3K4 & test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') dev.off() consdata = dba(test, mask = test$masks$Consensus) > consdata 12 Samples, 9754 sites in matrix (16143 total): ID Tissue Factor Condition Replicate Intervals 1 mES_H3K27me3 ES H3K27 mES_H3K27me3 1-2 1333 2 CMp_H3K27me3 CMN H3K27 CMp_H3K27me3 1-2 911 3 CMa_H3K27me3 CMA H3K27 CMa_H3K27me3 1-2 316 4 mES_H3K9me3 ES H3K9 mES_H3K9me3 1-2 89 5 CMp_H3K9me3 CMN H3K9 CMp_H3K9me3 1-2 86 6 CMa_H3K9me3 CMA H3K9 CMa_H3K9me3 1-2 323 7 mES_H3K79me2 ES H3K79 mES_H3K79me2 1-2 8590 8 CMp_H3K79me2 CMN H3K79 CMp_H3K79me2 1-2 4641 9 CMa_H3K79me2 CMA H3K79 CMa_H3K79me2 1-2 1084 10 mES_H3K4me3 ES H3K4 mES_H3K4me3 1-2 10018 11 CMp_H3K4me3 CMN H3K4 CMp_H3K4me3 1-2 8748 12 CMa_H3K4me3 CMA H3K4 CMa_H3K4me3 1-2 2998 consdata = dba.count(consdata, minOverlap=1) After I call dba.count I am prompted out R: terminate called after throwing an instance of 'std::out_of_range' what(): basic_string::compare Abort Where am I wrong? Cheers, Paolo 2012/8/8 Rory Stark <rory.stark at="" cancer.org.uk="">: > Hello Paolo- > > To examine the overlaps between peak callers, you need to be working with > the peak (occupancy) data BEFORE you settle on a global peakset used for > counting and subsequent analysis. In your example below, you have this in > "test" after calling "dba" but before calling "dba.count". > > So after > >> test=dba(sampleSheet="database.csv") > > you can check clustering based on occupancy/peak calling: > >> dba,plotHeatmap(test) >> dba.plotPCA(test) > > > You can also plot Venn diagrams of each of the peak caller overlaps -- eg > for the H3K27me3 mark, you could see the three sample overlaps as follows: > >> par(mfrow=c(1,3)) >> dba.plotVenn(test, test$masks$ES & test$masks$H3K27) >> dba.plotVenn(test, test$masks$CMN & test$masks$H3K27) > >> dba.plotVenn(test, test$masks$CMA & test$masks$H3K27) > > > Note that specifying minOverlap=2 to dba.count doesn't limit the overlap > to replicate peak callers. Rather, it creates a single set of peaks that > are identified at least twice amongst all the peaksets: either by both > peak callers for a single sample, or by the same peak caller in different > samples, by different peak callers in different samples. If you want to > use only peaks called by both peak callers for each sample,you can add a > separate consensus peakset for eac condition, eg: > >> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >>minOvrlap=2, ID="mES_H3K27me3") >> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >>minOverlap=2, ID="CMp_H3K27e3") > >> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >>minOverlp=2, ID="CMa_H3K27me3") > > > repeated 9 more times for a total of 12 pairs. You could if you like see > the peak overlaps of the three H3K27me3 samples at this point: > >> dba.plotVenn(test, testmasks$H3K27 & test$masks$Consensus) > > Once you have added all 12 consensus peaksets, create a new DBA object > containing only the consensus peaksets: > >> consdata = ba(test, mask = test$masks$Consensus) > > Then, if you want to use all the peaks identified by both peak callers for > all the samples: > >> consdata = dba.count(consdata, minOverlap=1) > > And continue with the differential analysis as you had done. > > Cheers- > Rory > >> >>Dear All, >>I am using package DiffBind, >>Due to the fact that I miss real biological or technical replicates, I >>am assesing the overlapping rates between different peak callers. >>Here is how I set up my database to read in: >> >>3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions >> >>> rm(list=ls()) >>> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >>> test=dba(sampleSheet="database.csv") >>> test >>24 Samples, 19380 sites in matrix (40809 total): >> ID Tissue Factor Condition Replicate Intervals >>1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834 >>2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450 >>3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990 >>4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188 >>5 CMp_H3K27me3_s CN H3K27 CMp_H3K27me3 2 3388 >>6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946 >>7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243 >>8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476 >>9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632 >>10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917 >>11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646 >>12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985 >>13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110 >>14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484 >>15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938 >>16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569 >>17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808 >>18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747 >>19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318 >>20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210 >>21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988 >>22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164 >>23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429 >>24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063 >> >> >>> test=dba.count(test) >> >>> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) >>#minMemebers was se to 2 due to >>the fact I have 2 replicates for each condition >>#....78 Contrasts: >> >>> test = dba.analyze(test) >> >>#for each condition generate a report >>>test.DB1=dba.report(test,contrast=1) >> >> >>I would like now to assess overlapping rates between replicates, and >>generate a venn diagram. >>but it seems like replicates are merged together and I cannot treat >>them as single samples. >>I noticed this behaviour also when i try to generate both a PCA plot >>and a heatmap plot, replicates are merged, so >>what's the utility to display both in the PCA or heatmap plot? >>would it be simple to display the original replicates to assess how do >>they cluster? >> >>>pdf('PCA_plot_DBA_CONDITION.pdf') >>>dba.plotPCA(test,attributes=DBA_CONDITION, >>>vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhite3" ,"anti >>>quewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4","br own"," >>>coral")) >>>dev.off() >> >>>pdf('HeatMap.pdf') >>>corvals=dba.plotHeatmap(test) >>>dev.off() >> >>>pdf('HeatMap_RPKM_FOLD.pdf') >>>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD) >>>dev.off() >> >>> sessionInfo() >>R version 2.15.1 (2012-06-22) >>Platform: x86_64-unknown-linux-gnu (64-bit) >> >>locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >>[11] C_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >>attached base packages: >>[1] parallel stats graphics grDevices utils datasets methods >>[8] base >> >>other attached packages: >>[1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9 >>[4] IRanges_1.14.4 BiocGenerics_0.2.0 >> >>loaded via a namespace (and not attached): >> [1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 >>gplots_2.11.0 >> [5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 >>stats4_2.15.1 >> [9] tools_2.15.1 zlibbioc_1.2.0 >> >> >> >>Any sugestion is appreciated, >>cheers, >>paolo >> >>

ADD COMMENT • link 12.7 years ago Paolo Kunderfranco ▴ 350

0

Entering edit mode

Hi, Any idea on how to solve this problem? I am stuck just at the end of the project, Cheers, Paolo > After I call dba.count I am prompted out R: > > terminate called after throwing an instance of 'std::out_of_range' > what(): basic_string::compare > Abort > 2012/8/9 Paolo Kunderfranco <paolo.kunderfranco at="" gmail.com="">: > Dear Rory, > Thanks again for Your input, I went through and finally obtained > consensus peakset identified by both peak callers for > all the samples: > > > > rm(list=ls()) > library("DiffBind") > setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") > > test=dba(sampleSheet="database.csv") > #clustering based on occupancy/peak calling > pdf('clustering based on occupancy_peak calling.pdf') > dba.plotHeatmap(test) > dba.plotPCA(test) > dev.off() > > par(mfrow=c(1,3)) > > pdf('VENN_H3K27me3.pdf') > dba.plotVenn(test, test$masks$ES & > test$masks$H3K27,label1='mES_m',label2='mES_s') > dba.plotVenn(test, test$masks$CMN & > test$masks$H3K27,label1='CMp_m',label2='CMp_s') > dba.plotVenn(test, test$masks$CMA & > test$masks$H3K27,label1='CMa_m',label2='CMa_s') > dev.off() > pdf('VENN_H3K4me3.pdf') > dba.plotVenn(test, test$masks$ES & > test$masks$H3K4,label1='mES_m',label2='mES_s') > dba.plotVenn(test, test$masks$CMN & > test$masks$H3K4,label1='CMp_m',label2='CMp_s') > dba.plotVenn(test, test$masks$CMA & > test$masks$H3K4,label1='CMa_m',label2='CMa_s') > dev.off() > pdf('VENN_H3K9me3.pdf') > dba.plotVenn(test, test$masks$ES & > test$masks$H3K9,label1='mES_m',label2='mES_s') > dba.plotVenn(test, test$masks$CMN & > test$masks$H3K9,label1='CMp_m',label2='CMp_s') > dba.plotVenn(test, test$masks$CMA & > test$masks$H3K9,label1='CMa_m',label2='CMa_s') > dev.off() > pdf('VENN_H3K79me2.pdf') > dba.plotVenn(test, test$masks$ES & > test$masks$H3K79,label1='mES_m',label2='mES_s') > dba.plotVenn(test, test$masks$CMN & > test$masks$H3K79,label1='CMp_m',label2='CMp_s') > dba.plotVenn(test, test$masks$CMA & > test$masks$H3K79,label1='CMa_m',label2='CMa_s') > dev.off() > test = dba.peakset(test, test$masks$ES & test$masks$H3K27, > minOverlap=2, sampID="mES_H3K27me3") > test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, > minOverlap=2, sampID="CMp_H3K27me3") > test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, > minOverlap=2, sampID="CMa_H3K27me3") > pdf('Overlap_H3K27me3.pdf') > dba.plotVenn(test, test$masks$H3K27 & > test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') > dev.off() > test = dba.peakset(test, test$masks$ES & test$masks$H3K9, > minOverlap=2, sampID="mES_H3K9me3") > test = dba.peakset(test, test$masks$CMN & test$masks$H3K9, > minOverlap=2, sampID="CMp_H3K9me3") > test = dba.peakset(test, test$masks$CMA & test$masks$H3K9, > minOverlap=2, sampID="CMa_H3K9me3") > pdf('Overlap_H3K9me3.pdf') > dba.plotVenn(test, test$masks$H3K9 & > test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') > dev.off() > test = dba.peakset(test, test$masks$ES & test$masks$H3K79, > minOverlap=2, sampID="mES_H3K79me2") > test = dba.peakset(test, test$masks$CMN & test$masks$H3K79, > minOverlap=2, sampID="CMp_H3K79me2") > test = dba.peakset(test, test$masks$CMA & test$masks$H3K79, > minOverlap=2, sampID="CMa_H3K79me2") > pdf('Overlap_H3K79me2.pdf') > dba.plotVenn(test, test$masks$H3K79 & > test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') > dev.off() > test = dba.peakset(test, test$masks$ES & test$masks$H3K4, > minOverlap=2, sampID="mES_H3K4me3") > test = dba.peakset(test, test$masks$CMN & test$masks$H3K4, > minOverlap=2, sampID="CMp_H3K4me3") > test = dba.peakset(test, test$masks$CMA & test$masks$H3K4, > minOverlap=2, sampID="CMa_H3K4me3") > pdf('Overlap_H3K4me3.pdf') > dba.plotVenn(test, test$masks$H3K4 & > test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') > dev.off() > > consdata = dba(test, mask = test$masks$Consensus) >> consdata > 12 Samples, 9754 sites in matrix (16143 total): > ID Tissue Factor Condition Replicate Intervals > 1 mES_H3K27me3 ES H3K27 mES_H3K27me3 1-2 1333 > 2 CMp_H3K27me3 CMN H3K27 CMp_H3K27me3 1-2 911 > 3 CMa_H3K27me3 CMA H3K27 CMa_H3K27me3 1-2 316 > 4 mES_H3K9me3 ES H3K9 mES_H3K9me3 1-2 89 > 5 CMp_H3K9me3 CMN H3K9 CMp_H3K9me3 1-2 86 > 6 CMa_H3K9me3 CMA H3K9 CMa_H3K9me3 1-2 323 > 7 mES_H3K79me2 ES H3K79 mES_H3K79me2 1-2 8590 > 8 CMp_H3K79me2 CMN H3K79 CMp_H3K79me2 1-2 4641 > 9 CMa_H3K79me2 CMA H3K79 CMa_H3K79me2 1-2 1084 > 10 mES_H3K4me3 ES H3K4 mES_H3K4me3 1-2 10018 > 11 CMp_H3K4me3 CMN H3K4 CMp_H3K4me3 1-2 8748 > 12 CMa_H3K4me3 CMA H3K4 CMa_H3K4me3 1-2 2998 > > consdata = dba.count(consdata, minOverlap=1) > > After I call dba.count I am prompted out R: > > terminate called after throwing an instance of 'std::out_of_range' > what(): basic_string::compare > Abort > > Where am I wrong? > Cheers, > Paolo > > > > > > > 2012/8/8 Rory Stark <rory.stark at="" cancer.org.uk="">: >> Hello Paolo- >> >> To examine the overlaps between peak callers, you need to be working with >> the peak (occupancy) data BEFORE you settle on a global peakset used for >> counting and subsequent analysis. In your example below, you have this in >> "test" after calling "dba" but before calling "dba.count". >> >> So after >> >>> test=dba(sampleSheet="database.csv") >> >> you can check clustering based on occupancy/peak calling: >> >>> dba,plotHeatmap(test) >>> dba.plotPCA(test) >> >> >> You can also plot Venn diagrams of each of the peak caller overlaps -- eg >> for the H3K27me3 mark, you could see the three sample overlaps as follows: >> >>> par(mfrow=c(1,3)) >>> dba.plotVenn(test, test$masks$ES & test$masks$H3K27) >>> dba.plotVenn(test, test$masks$CMN & test$masks$H3K27) >> >>> dba.plotVenn(test, test$masks$CMA & test$masks$H3K27) >> >> >> Note that specifying minOverlap=2 to dba.count doesn't limit the overlap >> to replicate peak callers. Rather, it creates a single set of peaks that >> are identified at least twice amongst all the peaksets: either by both >> peak callers for a single sample, or by the same peak caller in different >> samples, by different peak callers in different samples. If you want to >> use only peaks called by both peak callers for each sample,you can add a >> separate consensus peakset for eac condition, eg: >> >>> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >>>minOvrlap=2, ID="mES_H3K27me3") >>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >>>minOverlap=2, ID="CMp_H3K27e3") >> >>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >>>minOverlp=2, ID="CMa_H3K27me3") >> >> >> repeated 9 more times for a total of 12 pairs. You could if you like see >> the peak overlaps of the three H3K27me3 samples at this point: >> >>> dba.plotVenn(test, testmasks$H3K27 & test$masks$Consensus) >> >> Once you have added all 12 consensus peaksets, create a new DBA object >> containing only the consensus peaksets: >> >>> consdata = ba(test, mask = test$masks$Consensus) >> >> Then, if you want to use all the peaks identified by both peak callers for >> all the samples: >> >>> consdata = dba.count(consdata, minOverlap=1) >> >> And continue with the differential analysis as you had done. >> >> Cheers- >> Rory >> >>> >>>Dear All, >>>I am using package DiffBind, >>>Due to the fact that I miss real biological or technical replicates, I >>>am assesing the overlapping rates between different peak callers. >>>Here is how I set up my database to read in: >>> >>>3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions >>> >>>> rm(list=ls()) >>>> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >>>> test=dba(sampleSheet="database.csv") >>>> test >>>24 Samples, 19380 sites in matrix (40809 total): >>> ID Tissue Factor Condition Replicate Intervals >>>1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834 >>>2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450 >>>3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990 >>>4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188 >>>5 CMp_H3K27me3_s CN H3K27 CMp_H3K27me3 2 3388 >>>6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946 >>>7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243 >>>8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476 >>>9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632 >>>10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917 >>>11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646 >>>12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985 >>>13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110 >>>14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484 >>>15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938 >>>16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569 >>>17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808 >>>18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747 >>>19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318 >>>20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210 >>>21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988 >>>22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164 >>>23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429 >>>24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063 >>> >>> >>>> test=dba.count(test) >>> >>>> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) >>>#minMemebers was se to 2 due to >>>the fact I have 2 replicates for each condition >>>#....78 Contrasts: >>> >>>> test = dba.analyze(test) >>> >>>#for each condition generate a report >>>>test.DB1=dba.report(test,contrast=1) >>> >>> >>>I would like now to assess overlapping rates between replicates, and >>>generate a venn diagram. >>>but it seems like replicates are merged together and I cannot treat >>>them as single samples. >>>I noticed this behaviour also when i try to generate both a PCA plot >>>and a heatmap plot, replicates are merged, so >>>what's the utility to display both in the PCA or heatmap plot? >>>would it be simple to display the original replicates to assess how do >>>they cluster? >>> >>>>pdf('PCA_plot_DBA_CONDITION.pdf') >>>>dba.plotPCA(test,attributes=DBA_CONDITION, >>>>vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhite3 ","anti >>>>quewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4","b rown"," >>>>coral")) >>>>dev.off() >>> >>>>pdf('HeatMap.pdf') >>>>corvals=dba.plotHeatmap(test) >>>>dev.off() >>> >>>>pdf('HeatMap_RPKM_FOLD.pdf') >>>>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD) >>>>dev.off() >>> >>>> sessionInfo() >>>R version 2.15.1 (2012-06-22) >>>Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>>locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>[11] C_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>>attached base packages: >>>[1] parallel stats graphics grDevices utils datasets methods >>>[8] base >>> >>>other attached packages: >>>[1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9 >>>[4] IRanges_1.14.4 BiocGenerics_0.2.0 >>> >>>loaded via a namespace (and not attached): >>> [1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 >>>gplots_2.11.0 >>> [5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 >>>stats4_2.15.1 >>> [9] tools_2.15.1 zlibbioc_1.2.0 >>> >>> >>> >>>Any sugestion is appreciated, >>>cheers, >>>paolo >>> >>>

ADD REPLY • link 12.7 years ago Paolo Kunderfranco ▴ 350

0

Entering edit mode

Hi, Paolo, If it's crashing in dba.count, it either can't find the bed files of the reads, or doesn't like their format. Most likely the latter. One unfortunate restriction the release version has is that it expects bed files to have either exactly 3 or 6 (or more) columns. If your bed files have 4 or 5 columns, it'll crash trying to retrieve the (absent) strand column. This bug is fixed in my latest development code (not yet available), but not in the release version. Check to see if your bed files have 4 or 5 columns. Otherwise, maybe there's some corruption in the data files. Does it crash quickly, or does it take a while and then crash? If it's the 4/5 column problem, let me know and I'll send you a patch. Otherwise, I can whip up a little script that verifies the format of the bed files, and send it to you. Cheers, - Gord On 2012-08-14 15:51, "Paolo Kunderfranco" <paolo.kunderfranco at="" gmail.com=""> wrote: >Hi, >Any idea on how to solve this problem? I am stuck just at the end of >the project, >Cheers, >Paolo > > >> After I call dba.count I am prompted out R: >> >> terminate called after throwing an instance of 'std::out_of_range' >> what(): basic_string::compare >> Abort >> > > >2012/8/9 Paolo Kunderfranco <paolo.kunderfranco at="" gmail.com="">: >> Dear Rory, >> Thanks again for Your input, I went through and finally obtained >> consensus peakset identified by both peak callers for >> all the samples: >> >> >> >> rm(list=ls()) >> library("DiffBind") >> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >> >> test=dba(sampleSheet="database.csv") >> #clustering based on occupancy/peak calling >> pdf('clustering based on occupancy_peak calling.pdf') >> dba.plotHeatmap(test) >> dba.plotPCA(test) >> dev.off() >> >> par(mfrow=c(1,3)) >> >> pdf('VENN_H3K27me3.pdf') >> dba.plotVenn(test, test$masks$ES & >> test$masks$H3K27,label1='mES_m',label2='mES_s') >> dba.plotVenn(test, test$masks$CMN & >> test$masks$H3K27,label1='CMp_m',label2='CMp_s') >> dba.plotVenn(test, test$masks$CMA & >> test$masks$H3K27,label1='CMa_m',label2='CMa_s') >> dev.off() >> pdf('VENN_H3K4me3.pdf') >> dba.plotVenn(test, test$masks$ES & >> test$masks$H3K4,label1='mES_m',label2='mES_s') >> dba.plotVenn(test, test$masks$CMN & >> test$masks$H3K4,label1='CMp_m',label2='CMp_s') >> dba.plotVenn(test, test$masks$CMA & >> test$masks$H3K4,label1='CMa_m',label2='CMa_s') >> dev.off() >> pdf('VENN_H3K9me3.pdf') >> dba.plotVenn(test, test$masks$ES & >> test$masks$H3K9,label1='mES_m',label2='mES_s') >> dba.plotVenn(test, test$masks$CMN & >> test$masks$H3K9,label1='CMp_m',label2='CMp_s') >> dba.plotVenn(test, test$masks$CMA & >> test$masks$H3K9,label1='CMa_m',label2='CMa_s') >> dev.off() >> pdf('VENN_H3K79me2.pdf') >> dba.plotVenn(test, test$masks$ES & >> test$masks$H3K79,label1='mES_m',label2='mES_s') >> dba.plotVenn(test, test$masks$CMN & >> test$masks$H3K79,label1='CMp_m',label2='CMp_s') >> dba.plotVenn(test, test$masks$CMA & >> test$masks$H3K79,label1='CMa_m',label2='CMa_s') >> dev.off() >> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >> minOverlap=2, sampID="mES_H3K27me3") >> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >> minOverlap=2, sampID="CMp_H3K27me3") >> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >> minOverlap=2, sampID="CMa_H3K27me3") >> pdf('Overlap_H3K27me3.pdf') >> dba.plotVenn(test, test$masks$H3K27 & >> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >> dev.off() >> test = dba.peakset(test, test$masks$ES & test$masks$H3K9, >> minOverlap=2, sampID="mES_H3K9me3") >> test = dba.peakset(test, test$masks$CMN & test$masks$H3K9, >> minOverlap=2, sampID="CMp_H3K9me3") >> test = dba.peakset(test, test$masks$CMA & test$masks$H3K9, >> minOverlap=2, sampID="CMa_H3K9me3") >> pdf('Overlap_H3K9me3.pdf') >> dba.plotVenn(test, test$masks$H3K9 & >> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >> dev.off() >> test = dba.peakset(test, test$masks$ES & test$masks$H3K79, >> minOverlap=2, sampID="mES_H3K79me2") >> test = dba.peakset(test, test$masks$CMN & test$masks$H3K79, >> minOverlap=2, sampID="CMp_H3K79me2") >> test = dba.peakset(test, test$masks$CMA & test$masks$H3K79, >> minOverlap=2, sampID="CMa_H3K79me2") >> pdf('Overlap_H3K79me2.pdf') >> dba.plotVenn(test, test$masks$H3K79 & >> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >> dev.off() >> test = dba.peakset(test, test$masks$ES & test$masks$H3K4, >> minOverlap=2, sampID="mES_H3K4me3") >> test = dba.peakset(test, test$masks$CMN & test$masks$H3K4, >> minOverlap=2, sampID="CMp_H3K4me3") >> test = dba.peakset(test, test$masks$CMA & test$masks$H3K4, >> minOverlap=2, sampID="CMa_H3K4me3") >> pdf('Overlap_H3K4me3.pdf') >> dba.plotVenn(test, test$masks$H3K4 & >> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >> dev.off() >> >> consdata = dba(test, mask = test$masks$Consensus) >>> consdata >> 12 Samples, 9754 sites in matrix (16143 total): >> ID Tissue Factor Condition Replicate Intervals >> 1 mES_H3K27me3 ES H3K27 mES_H3K27me3 1-2 1333 >> 2 CMp_H3K27me3 CMN H3K27 CMp_H3K27me3 1-2 911 >> 3 CMa_H3K27me3 CMA H3K27 CMa_H3K27me3 1-2 316 >> 4 mES_H3K9me3 ES H3K9 mES_H3K9me3 1-2 89 >> 5 CMp_H3K9me3 CMN H3K9 CMp_H3K9me3 1-2 86 >> 6 CMa_H3K9me3 CMA H3K9 CMa_H3K9me3 1-2 323 >> 7 mES_H3K79me2 ES H3K79 mES_H3K79me2 1-2 8590 >> 8 CMp_H3K79me2 CMN H3K79 CMp_H3K79me2 1-2 4641 >> 9 CMa_H3K79me2 CMA H3K79 CMa_H3K79me2 1-2 1084 >> 10 mES_H3K4me3 ES H3K4 mES_H3K4me3 1-2 10018 >> 11 CMp_H3K4me3 CMN H3K4 CMp_H3K4me3 1-2 8748 >> 12 CMa_H3K4me3 CMA H3K4 CMa_H3K4me3 1-2 2998 >> >> consdata = dba.count(consdata, minOverlap=1) >> >> After I call dba.count I am prompted out R: >> >> terminate called after throwing an instance of 'std::out_of_range' >> what(): basic_string::compare >> Abort >> >> Where am I wrong? >> Cheers, >> Paolo >> >> >> >> >> >> >> 2012/8/8 Rory Stark <rory.stark at="" cancer.org.uk="">: >>> Hello Paolo- >>> >>> To examine the overlaps between peak callers, you need to be working >>>with >>> the peak (occupancy) data BEFORE you settle on a global peakset used >>>for >>> counting and subsequent analysis. In your example below, you have this >>>in >>> "test" after calling "dba" but before calling "dba.count". >>> >>> So after >>> >>>> test=dba(sampleSheet="database.csv") >>> >>> you can check clustering based on occupancy/peak calling: >>> >>>> dba,plotHeatmap(test) >>>> dba.plotPCA(test) >>> >>> >>> You can also plot Venn diagrams of each of the peak caller overlaps -- >>>eg >>> for the H3K27me3 mark, you could see the three sample overlaps as >>>follows: >>> >>>> par(mfrow=c(1,3)) >>>> dba.plotVenn(test, test$masks$ES & test$masks$H3K27) >>>> dba.plotVenn(test, test$masks$CMN & test$masks$H3K27) >>> >>>> dba.plotVenn(test, test$masks$CMA & test$masks$H3K27) >>> >>> >>> Note that specifying minOverlap=2 to dba.count doesn't limit the >>>overlap >>> to replicate peak callers. Rather, it creates a single set of peaks >>>that >>> are identified at least twice amongst all the peaksets: either by both >>> peak callers for a single sample, or by the same peak caller in >>>different >>> samples, by different peak callers in different samples. If you want >>>to >>> use only peaks called by both peak callers for each sample,you can add >>>a >>> separate consensus peakset for eac condition, eg: >>> >>>> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >>>>minOvrlap=2, ID="mES_H3K27me3") >>>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >>>>minOverlap=2, ID="CMp_H3K27e3") >>> >>>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >>>>minOverlp=2, ID="CMa_H3K27me3") >>> >>> >>> repeated 9 more times for a total of 12 pairs. You could if you like >>>see >>> the peak overlaps of the three H3K27me3 samples at this point: >>> >>>> dba.plotVenn(test, testmasks$H3K27 & test$masks$Consensus) >>> >>> Once you have added all 12 consensus peaksets, create a new DBA object >>> containing only the consensus peaksets: >>> >>>> consdata = ba(test, mask = test$masks$Consensus) >>> >>> Then, if you want to use all the peaks identified by both peak callers >>>for >>> all the samples: >>> >>>> consdata = dba.count(consdata, minOverlap=1) >>> >>> And continue with the differential analysis as you had done. >>> >>> Cheers- >>> Rory >>> >>>> >>>>Dear All, >>>>I am using package DiffBind, >>>>Due to the fact that I miss real biological or technical replicates, I >>>>am assesing the overlapping rates between different peak callers. >>>>Here is how I set up my database to read in: >>>> >>>>3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions >>>> >>>>> rm(list=ls()) >>>>> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >>>>> test=dba(sampleSheet="database.csv") >>>>> test >>>>24 Samples, 19380 sites in matrix (40809 total): >>>> ID Tissue Factor Condition Replicate Intervals >>>>1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834 >>>>2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450 >>>>3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990 >>>>4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188 >>>>5 CMp_H3K27me3_s CN H3K27 CMp_H3K27me3 2 3388 >>>>6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946 >>>>7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243 >>>>8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476 >>>>9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632 >>>>10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917 >>>>11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646 >>>>12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985 >>>>13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110 >>>>14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484 >>>>15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938 >>>>16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569 >>>>17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808 >>>>18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747 >>>>19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318 >>>>20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210 >>>>21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988 >>>>22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164 >>>>23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429 >>>>24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063 >>>> >>>> >>>>> test=dba.count(test) >>>> >>>>> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) >>>>#minMemebers was se to 2 due to >>>>the fact I have 2 replicates for each condition >>>>#....78 Contrasts: >>>> >>>>> test = dba.analyze(test) >>>> >>>>#for each condition generate a report >>>>>test.DB1=dba.report(test,contrast=1) >>>> >>>> >>>>I would like now to assess overlapping rates between replicates, and >>>>generate a venn diagram. >>>>but it seems like replicates are merged together and I cannot treat >>>>them as single samples. >>>>I noticed this behaviour also when i try to generate both a PCA plot >>>>and a heatmap plot, replicates are merged, so >>>>what's the utility to display both in the PCA or heatmap plot? >>>>would it be simple to display the original replicates to assess how do >>>>they cluster? >>>> >>>>>pdf('PCA_plot_DBA_CONDITION.pdf') >>>>>dba.plotPCA(test,attributes=DBA_CONDITION, >>>>>vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhite 3","a >>>>>nti >>>>>quewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4"," brown >>>>>"," >>>>>coral")) >>>>>dev.off() >>>> >>>>>pdf('HeatMap.pdf') >>>>>corvals=dba.plotHeatmap(test) >>>>>dev.off() >>>> >>>>>pdf('HeatMap_RPKM_FOLD.pdf') >>>>>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD) >>>>>dev.off() >>>> >>>>> sessionInfo() >>>>R version 2.15.1 (2012-06-22) >>>>Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>>locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>[11] C_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>>attached base packages: >>>>[1] parallel stats graphics grDevices utils datasets methods >>>>[8] base >>>> >>>>other attached packages: >>>>[1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9 >>>>[4] IRanges_1.14.4 BiocGenerics_0.2.0 >>>> >>>>loaded via a namespace (and not attached): >>>> [1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 >>>>gplots_2.11.0 >>>> [5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 >>>>stats4_2.15.1 >>>> [9] tools_2.15.1 zlibbioc_1.2.0 >>>> >>>> >>>> >>>>Any sugestion is appreciated, >>>>cheers, >>>>paolo >>>> >>>> NOTICE AND DISCLAIMER This e-mail (including any attachments) is intended for ...{{dropped:17}}

ADD REPLY • link 12.7 years ago Gordon Brown ▴ 70

0

Entering edit mode

Hi, Alignment bed file and peak calling files are only 3 columns, chr, start and end. I don't think is a problem of my bed files, because the problem do not arise while I am using all my dataset for the analysis. >test=dba(sampleSheet="database.csv") >test >test=dba.count(test) >test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) >test=dba.analyze(test) the code works fine, but when I select only the overlapping peaks between my replicates: >test = dba.peakset(test, test$masks$CMN & test$masks$H3K4, >minOverlap=2, sampID="CMp_H3K4me3") >test = dba.peakset(test, test$masks$CMA & test$masks$H3K4, >minOverlap=2, sampID="CMa_H3K4me3") ......for all my samples.. and create a new dataset of my overlapping peaks >consdata = dba(test, mask = test$masks$Consensus) >consdata 12 Samples, 9754 sites in matrix (16143 total): >>> ID Tissue Factor Condition Replicate Intervals >>> 1 mES_H3K27me3 ES H3K27 mES_H3K27me3 1-2 1333 >>> 2 CMp_H3K27me3 CMN H3K27 CMp_H3K27me3 1-2 911 >>> 3 CMa_H3K27me3 CMA H3K27 CMa_H3K27me3 1-2 316 >>> 4 mES_H3K9me3 ES H3K9 mES_H3K9me3 1-2 89 >>> 5 CMp_H3K9me3 CMN H3K9 CMp_H3K9me3 1-2 86 >>> 6 CMa_H3K9me3 CMA H3K9 CMa_H3K9me3 1-2 323 >>> 7 mES_H3K79me2 ES H3K79 mES_H3K79me2 1-2 8590 >>> 8 CMp_H3K79me2 CMN H3K79 CMp_H3K79me2 1-2 4641 >>> 9 CMa_H3K79me2 CMA H3K79 CMa_H3K79me2 1-2 1084 >>> 10 mES_H3K4me3 ES H3K4 mES_H3K4me3 1-2 10018 >>> 11 CMp_H3K4me3 CMN H3K4 CMp_H3K4me3 1-2 8748 >>> 12 CMa_H3K4me3 CMA H3K4 CMa_H3K4me3 1-2 2998 >consdata = dba.count(consdata, minOverlap=1) I am prompted out R immediately to my working directory > terminate called after throwing an instance of 'std::out_of_range' > what(): basic_string::compare > Abort Paolo 2012/8/14 Gordon Brown <gordon.brown at="" cancer.org.uk="">: > Hi, Paolo, > > If it's crashing in dba.count, it either can't find the bed files of the > reads, or doesn't like their format. Most likely the latter. One > unfortunate restriction the release version has is that it expects bed > files to have either exactly 3 or 6 (or more) columns. If your bed files > have 4 or 5 columns, it'll crash trying to retrieve the (absent) strand > column. This bug is fixed in my latest development code (not yet > available), but not in the release version. Check to see if your bed > files have 4 or 5 columns. Otherwise, maybe there's some corruption in > the data files. Does it crash quickly, or does it take a while and then > crash? > > If it's the 4/5 column problem, let me know and I'll send you a patch. > Otherwise, I can whip up a little script that verifies the format of the > bed files, and send it to you. > > Cheers, > > - Gord > > > On 2012-08-14 15:51, "Paolo Kunderfranco" <paolo.kunderfranco at="" gmail.com=""> > wrote: > >>Hi, >>Any idea on how to solve this problem? I am stuck just at the end of >>the project, >>Cheers, >>Paolo >> >> >>> After I call dba.count I am prompted out R: >>> >>> terminate called after throwing an instance of 'std::out_of_range' >>> what(): basic_string::compare >>> Abort >>> >> >> >>2012/8/9 Paolo Kunderfranco <paolo.kunderfranco at="" gmail.com="">: >>> Dear Rory, >>> Thanks again for Your input, I went through and finally obtained >>> consensus peakset identified by both peak callers for >>> all the samples: >>> >>> >>> >>> rm(list=ls()) >>> library("DiffBind") >>> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >>> >>> test=dba(sampleSheet="database.csv") >>> #clustering based on occupancy/peak calling >>> pdf('clustering based on occupancy_peak calling.pdf') >>> dba.plotHeatmap(test) >>> dba.plotPCA(test) >>> dev.off() >>> >>> par(mfrow=c(1,3)) >>> >>> pdf('VENN_H3K27me3.pdf') >>> dba.plotVenn(test, test$masks$ES & >>> test$masks$H3K27,label1='mES_m',label2='mES_s') >>> dba.plotVenn(test, test$masks$CMN & >>> test$masks$H3K27,label1='CMp_m',label2='CMp_s') >>> dba.plotVenn(test, test$masks$CMA & >>> test$masks$H3K27,label1='CMa_m',label2='CMa_s') >>> dev.off() >>> pdf('VENN_H3K4me3.pdf') >>> dba.plotVenn(test, test$masks$ES & >>> test$masks$H3K4,label1='mES_m',label2='mES_s') >>> dba.plotVenn(test, test$masks$CMN & >>> test$masks$H3K4,label1='CMp_m',label2='CMp_s') >>> dba.plotVenn(test, test$masks$CMA & >>> test$masks$H3K4,label1='CMa_m',label2='CMa_s') >>> dev.off() >>> pdf('VENN_H3K9me3.pdf') >>> dba.plotVenn(test, test$masks$ES & >>> test$masks$H3K9,label1='mES_m',label2='mES_s') >>> dba.plotVenn(test, test$masks$CMN & >>> test$masks$H3K9,label1='CMp_m',label2='CMp_s') >>> dba.plotVenn(test, test$masks$CMA & >>> test$masks$H3K9,label1='CMa_m',label2='CMa_s') >>> dev.off() >>> pdf('VENN_H3K79me2.pdf') >>> dba.plotVenn(test, test$masks$ES & >>> test$masks$H3K79,label1='mES_m',label2='mES_s') >>> dba.plotVenn(test, test$masks$CMN & >>> test$masks$H3K79,label1='CMp_m',label2='CMp_s') >>> dba.plotVenn(test, test$masks$CMA & >>> test$masks$H3K79,label1='CMa_m',label2='CMa_s') >>> dev.off() >>> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >>> minOverlap=2, sampID="mES_H3K27me3") >>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >>> minOverlap=2, sampID="CMp_H3K27me3") >>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >>> minOverlap=2, sampID="CMa_H3K27me3") >>> pdf('Overlap_H3K27me3.pdf') >>> dba.plotVenn(test, test$masks$H3K27 & >>> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >>> dev.off() >>> test = dba.peakset(test, test$masks$ES & test$masks$H3K9, >>> minOverlap=2, sampID="mES_H3K9me3") >>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K9, >>> minOverlap=2, sampID="CMp_H3K9me3") >>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K9, >>> minOverlap=2, sampID="CMa_H3K9me3") >>> pdf('Overlap_H3K9me3.pdf') >>> dba.plotVenn(test, test$masks$H3K9 & >>> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >>> dev.off() >>> test = dba.peakset(test, test$masks$ES & test$masks$H3K79, >>> minOverlap=2, sampID="mES_H3K79me2") >>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K79, >>> minOverlap=2, sampID="CMp_H3K79me2") >>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K79, >>> minOverlap=2, sampID="CMa_H3K79me2") >>> pdf('Overlap_H3K79me2.pdf') >>> dba.plotVenn(test, test$masks$H3K79 & >>> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >>> dev.off() >>> test = dba.peakset(test, test$masks$ES & test$masks$H3K4, >>> minOverlap=2, sampID="mES_H3K4me3") >>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K4, >>> minOverlap=2, sampID="CMp_H3K4me3") >>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K4, >>> minOverlap=2, sampID="CMa_H3K4me3") >>> pdf('Overlap_H3K4me3.pdf') >>> dba.plotVenn(test, test$masks$H3K4 & >>> test$masks$Consensus,label1='mES',label2='CMp',label3='CMa') >>> dev.off() >>> >>> consdata = dba(test, mask = test$masks$Consensus) >>>> consdata >>> 12 Samples, 9754 sites in matrix (16143 total): >>> ID Tissue Factor Condition Replicate Intervals >>> 1 mES_H3K27me3 ES H3K27 mES_H3K27me3 1-2 1333 >>> 2 CMp_H3K27me3 CMN H3K27 CMp_H3K27me3 1-2 911 >>> 3 CMa_H3K27me3 CMA H3K27 CMa_H3K27me3 1-2 316 >>> 4 mES_H3K9me3 ES H3K9 mES_H3K9me3 1-2 89 >>> 5 CMp_H3K9me3 CMN H3K9 CMp_H3K9me3 1-2 86 >>> 6 CMa_H3K9me3 CMA H3K9 CMa_H3K9me3 1-2 323 >>> 7 mES_H3K79me2 ES H3K79 mES_H3K79me2 1-2 8590 >>> 8 CMp_H3K79me2 CMN H3K79 CMp_H3K79me2 1-2 4641 >>> 9 CMa_H3K79me2 CMA H3K79 CMa_H3K79me2 1-2 1084 >>> 10 mES_H3K4me3 ES H3K4 mES_H3K4me3 1-2 10018 >>> 11 CMp_H3K4me3 CMN H3K4 CMp_H3K4me3 1-2 8748 >>> 12 CMa_H3K4me3 CMA H3K4 CMa_H3K4me3 1-2 2998 >>> >>> consdata = dba.count(consdata, minOverlap=1) >>> >>> After I call dba.count I am prompted out R: >>> >>> terminate called after throwing an instance of 'std::out_of_range' >>> what(): basic_string::compare >>> Abort >>> >>> Where am I wrong? >>> Cheers, >>> Paolo >>> >>> >>> >>> >>> >>> >>> 2012/8/8 Rory Stark <rory.stark at="" cancer.org.uk="">: >>>> Hello Paolo- >>>> >>>> To examine the overlaps between peak callers, you need to be working >>>>with >>>> the peak (occupancy) data BEFORE you settle on a global peakset used >>>>for >>>> counting and subsequent analysis. In your example below, you have this >>>>in >>>> "test" after calling "dba" but before calling "dba.count". >>>> >>>> So after >>>> >>>>> test=dba(sampleSheet="database.csv") >>>> >>>> you can check clustering based on occupancy/peak calling: >>>> >>>>> dba,plotHeatmap(test) >>>>> dba.plotPCA(test) >>>> >>>> >>>> You can also plot Venn diagrams of each of the peak caller overlaps -- >>>>eg >>>> for the H3K27me3 mark, you could see the three sample overlaps as >>>>follows: >>>> >>>>> par(mfrow=c(1,3)) >>>>> dba.plotVenn(test, test$masks$ES & test$masks$H3K27) >>>>> dba.plotVenn(test, test$masks$CMN & test$masks$H3K27) >>>> >>>>> dba.plotVenn(test, test$masks$CMA & test$masks$H3K27) >>>> >>>> >>>> Note that specifying minOverlap=2 to dba.count doesn't limit the >>>>overlap >>>> to replicate peak callers. Rather, it creates a single set of peaks >>>>that >>>> are identified at least twice amongst all the peaksets: either by both >>>> peak callers for a single sample, or by the same peak caller in >>>>different >>>> samples, by different peak callers in different samples. If you want >>>>to >>>> use only peaks called by both peak callers for each sample,you can add >>>>a >>>> separate consensus peakset for eac condition, eg: >>>> >>>>> test = dba.peakset(test, test$masks$ES & test$masks$H3K27, >>>>>minOvrlap=2, ID="mES_H3K27me3") >>>>> test = dba.peakset(test, test$masks$CMN & test$masks$H3K27, >>>>>minOverlap=2, ID="CMp_H3K27e3") >>>> >>>>> test = dba.peakset(test, test$masks$CMA & test$masks$H3K27, >>>>>minOverlp=2, ID="CMa_H3K27me3") >>>> >>>> >>>> repeated 9 more times for a total of 12 pairs. You could if you like >>>>see >>>> the peak overlaps of the three H3K27me3 samples at this point: >>>> >>>>> dba.plotVenn(test, testmasks$H3K27 & test$masks$Consensus) >>>> >>>> Once you have added all 12 consensus peaksets, create a new DBA object >>>> containing only the consensus peaksets: >>>> >>>>> consdata = ba(test, mask = test$masks$Consensus) >>>> >>>> Then, if you want to use all the peaks identified by both peak callers >>>>for >>>> all the samples: >>>> >>>>> consdata = dba.count(consdata, minOverlap=1) >>>> >>>> And continue with the differential analysis as you had done. >>>> >>>> Cheers- >>>> Rory >>>> >>>>> >>>>>Dear All, >>>>>I am using package DiffBind, >>>>>Due to the fact that I miss real biological or technical replicates, I >>>>>am assesing the overlapping rates between different peak callers. >>>>>Here is how I set up my database to read in: >>>>> >>>>>3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions >>>>> >>>>>> rm(list=ls()) >>>>>> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/") >>>>>> test=dba(sampleSheet="database.csv") >>>>>> test >>>>>24 Samples, 19380 sites in matrix (40809 total): >>>>> ID Tissue Factor Condition Replicate Intervals >>>>>1 mES_H3K27me3_m ES H3K27 mES_H3K27me3 1 1834 >>>>>2 CMp_H3K27me3_m CMN H3K27 CMp_H3K27me3 1 2450 >>>>>3 CMa_H3K27me3_m CMA H3K27 CMa_H3K27me3 1 990 >>>>>4 mES_H3K27me3_s ES H3K27 mES_H3K27me3 2 2188 >>>>>5 CMp_H3K27me3_s CN H3K27 CMp_H3K27me3 2 3388 >>>>>6 CMa_H3K27me3_s CMA H3K27 CMa_H3K27me3 2 5946 >>>>>7 mES_H3K4me3_m ES H3K4 mES_H3K4me3 1 10243 >>>>>8 CMp_H3K4me3_m CMN H3K4 CMp_H3K4me3 1 12476 >>>>>9 CMa_H3K4me3_m CMA H3K4 CMa_H3K4me3 1 5632 >>>>>10 mES_H3K4m3_s ES H3K4 mES_H3K4me3 2 14917 >>>>>11 CMp_H3K4me3_s CMN H3K4 CMp_H3K4me3 2 10646 >>>>>12 CMa_H3K4me3_s CMA H3K4 CMa_H3K4me3 2 5985 >>>>>13 mES_H3K9me3_m ES H3K9 mES_H3K9me3 1 110 >>>>>14 CMp_H3K9me3_m CMN H3K9 CMp_H3K9me3 1 484 >>>>>15 CMa_H3K9me3_m CMA H3K9 CMa_H3K9me3 1 938 >>>>>16 mES_H3K9me3_s ES H3K9 mES_H3K9me3 2 569 >>>>>17 CMp_H3K9me3_s CMN H3K9 CMp_H3K9me3 2 808 >>>>>18 CMa_H3K9me3_s CMA H3K9 CMa_H3K9me3 2 3747 >>>>>19 mES_H3K79me2_m ES H3K79 mES_H3K79me2 1 21318 >>>>>20 CMp_H3K79me2_m CMN H3K79 CMp_H3K79me2 1 13210 >>>>>21 CMa_H3K79me2_m CMA H3K79 CMa_H3K79me2 1 2988 >>>>>22 mES_H3K79me2_s ES H3K79 mES_H3K79me2 2 22164 >>>>>23 CMp_H3K79me2_s CMN H3K79 CMp_H3K79me2 2 13429 >>>>>24 CMa_H3K79me2_s CMA H3K79 CMa_H3K79me2 2 6063 >>>>> >>>>> >>>>>> test=dba.count(test) >>>>> >>>>>> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2) >>>>>#minMemebers was se to 2 due to >>>>>the fact I have 2 replicates for each condition >>>>>#....78 Contrasts: >>>>> >>>>>> test = dba.analyze(test) >>>>> >>>>>#for each condition generate a report >>>>>>test.DB1=dba.report(test,contrast=1) >>>>> >>>>> >>>>>I would like now to assess overlapping rates between replicates, and >>>>>generate a venn diagram. >>>>>but it seems like replicates are merged together and I cannot treat >>>>>them as single samples. >>>>>I noticed this behaviour also when i try to generate both a PCA plot >>>>>and a heatmap plot, replicates are merged, so >>>>>what's the utility to display both in the PCA or heatmap plot? >>>>>would it be simple to display the original replicates to assess how do >>>>>they cluster? >>>>> >>>>>>pdf('PCA_plot_DBA_CONDITION.pdf') >>>>>>dba.plotPCA(test,attributes=DBA_CONDITION, >>>>>>vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhit e3","a >>>>>>nti >>>>>>quewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4", "brown >>>>>>"," >>>>>>coral")) >>>>>>dev.off() >>>>> >>>>>>pdf('HeatMap.pdf') >>>>>>corvals=dba.plotHeatmap(test) >>>>>>dev.off() >>>>> >>>>>>pdf('HeatMap_RPKM_FOLD.pdf') >>>>>>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD) >>>>>>dev.off() >>>>> >>>>>> sessionInfo() >>>>>R version 2.15.1 (2012-06-22) >>>>>Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> >>>>>locale: >>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>>> [7] LC_PAPER=C LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>[11] C_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>>attached base packages: >>>>>[1] parallel stats graphics grDevices utils datasets methods >>>>>[8] base >>>>> >>>>>other attached packages: >>>>>[1] DiffBind_1.2.0 Biobase_2.16.0 GenomicRanges_1.8.9 >>>>>[4] IRanges_1.14.4 BiocGenerics_0.2.0 >>>>> >>>>>loaded via a namespace (and not attached): >>>>> [1] amap_0.8-7 edgeR_2.6.10 gdata_2.11.0 >>>>>gplots_2.11.0 >>>>> [5] gtools_2.7.0 limma_3.12.1 RColorBrewer_1.0-5 >>>>>stats4_2.15.1 >>>>> [9] tools_2.15.1 zlibbioc_1.2.0 >>>>> >>>>> >>>>> >>>>>Any sugestion is appreciated, >>>>>cheers, >>>>>paolo >>>>> >>>>> > > > NOTICE AND DISCLAIMER > This e-mail (including any attachments) is intended for the above- named person(s). If you are not the intended recipient, notify the sender immediately, delete this email from your system and do not disclose or use for any purpose. > > We may monitor all incoming and outgoing emails in line with current legislation. We have taken steps to ensure that this email and attachments are free from any virus, but it remains your responsibility to ensure that viruses do not adversely affect you. > Cancer Research UK > Registered charity in England and Wales (1089464), Scotland (SC041666) and the Isle of Man (1103) > A company limited by guarantee. Registered company in England and Wales (4325234) and the Isle of Man (5713F). > Registered Office Address: Angel Building, 407 St John Street, London EC1V 4AD.

ADD REPLY • link 12.7 years ago Paolo Kunderfranco ▴ 350