DiffBind - error in dba.count

Hi again,

I am trying DiffBind and have loaded my data, which looks like this:

    H3K4m3
    4 Samples, 13203 sites in matrix (13792 total):
          ID Tissue  Factor  Condition Peak.caller Replicate Intervals
    1    wt1   Hela H3K4me3   control1         raw         1     14111
    2    wt2   Hela H3K4me3   control2         raw         2     13771
    3 treat1   Hela H3K4me3 condition1         raw         1     14865
    4 treat2   Hela H3K4me3 condition2         raw         2     13393

But I ran into problems trying to calculate the affinity scores with dba.count:

    > H3K4m3 = dba.count(H3K4m3)
    Error in cond$counts : $ operator is invalid for atomic vectors
    In addition: Warning message:
    In mclapply(arglist, fn, ..., mc.preschedule = FALSE) :
      6 function calls resulted in an error

The peaks are in BED files (chr, start, end, score) and the reads are in SAM format.

Can anyone help me with this?

Cheers,
António

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] DiffBind_1.0.9 Biobase_2.14.0

    loaded via a namespace (and not attached):
    [1] IRanges_1.12.6     RColorBrewer_1.0-5 amap_0.8-7         edgeR_2.4.6
    [5] gdata_2.11.0       gplots_2.11.0      gtools_2.7.0       limma_3.10.3
    [9] zlibbioc_1.0.1

On 13 September 2012 18:06, António Miguel de Jesus Domingues <amjdomingues@gmail.com> wrote:

> Hi all,
>
> I am trying to use DiffBind to compare peaks called in control vs. condition. I have 2 replicates for each, and I've also called peaks using 2 different peak callers (namely, MACS and QuEST). I've also prepared a sample sheet that looks like this:
>
>     SampleID Tissue Factor Condition Replicate Peak.caller bamReads bamControl Peaks
>     control  Hela   TF     wt        1         MACS        path     path       path
>     control  Hela   TF     wt        1         QuEST       path     path       path
>     control2 Hela   TF     wt        2         MACS        path     path       path
>     control2 Hela   TF     wt        2         QuEST       path     path       path
>     (and the same for the conditions)
>
> My plan was to load all the data and then use DiffBind to select a set of common peaks for the peak callers before proceeding with the analysis. However, when I load the data (data = dba(sampleSheet="samplesheet.csv")), the peaks for each caller are not recognized as a different variable. How can I do that, and is this a silly approach?
>
> I could also derive a set of common peaks independently, but it would be neat to do it all with the same package. That seems to be possible, but I could not find how to do it in the documentation.
>
> Thanks,
> António
>
> --
> António Miguel de Jesus Domingues, PhD
> Neugebauer group
> Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
> Pfotenhauerstrasse 108
> 01307 Dresden
> Germany
>
> e-mail: domingue@mpi-cbg.de
> tel. +49 351 210 2481
> The Unbearable Lightness of Molecular Biology

Question written 12.6 years ago by António Miguel de Jesus Domingues; last updated 12.6 years ago by Gordon Brown

Answer 1 · 2012-09-14 · Gordon Brown

Hi, António,

DiffBind doesn't read SAM format, only BAM and (gzipped or uncompressed) BED. Try converting your SAM files to BAM.

Cheers,

- Gord

P.S. "chr, start, end, score" isn't technically a legal BED file. You'd need "chr, start, end, NAME, score" for it to be a BED file. DiffBind may read them anyway, though (I didn't write that part...).
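The conversion Gord suggests can also be done from within R. The sketch below assumes the Bioconductor package Rsamtools is installed and uses its asBam() function; the helpers bam_prefix() and sam_to_bam() are hypothetical names, not part of DiffBind, and this thread does not confirm which Rsamtools release matching R 2.14 provides asBam().

```r
# Sketch: convert SAM read files to BAM so DiffBind can count reads from them.
# Assumes Rsamtools (Bioconductor) is installed; bam_prefix() and sam_to_bam()
# are hypothetical helpers, not part of DiffBind.

# asBam() appends ".bam" to its destination, so drop a trailing ".sam" first.
bam_prefix <- function(sam_file) {
  sub("\\.sam$", "", sam_file, ignore.case = TRUE)
}

sam_to_bam <- function(sam_files, overwrite = FALSE) {
  if (!requireNamespace("Rsamtools", quietly = TRUE)) {
    stop("the Rsamtools package is needed to convert SAM to BAM")
  }
  # asBam() returns the path of the BAM file it writes; with the default
  # indexDestination = TRUE it also creates a .bai index next to it.
  vapply(sam_files, function(f) {
    Rsamtools::asBam(f, destination = bam_prefix(f), overwrite = overwrite)
  }, character(1), USE.NAMES = FALSE)
}

# Example (paths are placeholders):
# bam_files <- sam_to_bam(c("reads/wt1.sam", "reads/treat1.sam"))
```

The returned BAM paths are what would go into the bamReads (and bamControl) columns of the sample sheet before rerunning dba() and dba.count().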

Answer written 12.6 years ago by Gordon Brown

Hi Gordon,

I thought that might be the problem, but the vignette (20 March 2012) states that SAM could be used: "Additionally, file containing mapped sequencing reads (BAM/SAM/BED) can be associated with each peakset". Not a problem, though; converting to BAM is easy enough.

> P.S. "chr, start, end, score" isn't technically a legal bed file. You'd need "chr, start, end, NAME, score" for it to be a bed file. Though DiffBind may read them anyway (I didn't write that part...).

Yes, of course; I completely forgot about that.

Cheers,
António
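If the peak files really have four columns (chr, start, end, score), a name column can be inserted so that the score ends up in the fifth column, where BED expects it. This is a minimal base-R sketch that assumes tab-separated files with no header or track line; add_bed_name() and the peak1, peak2, ... naming scheme are hypothetical.

```r
# Sketch: rewrite a 4-column peak file (chr, start, end, score) as a
# 5-column BED file (chr, start, end, name, score).
# add_bed_name() and the generated names are hypothetical; assumes a
# tab-separated file without header or track lines.
add_bed_name <- function(infile, outfile, prefix = "peak") {
  peaks <- read.table(infile, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)
  if (ncol(peaks) != 4) {
    stop("expected 4 columns: chr, start, end, score")
  }
  bed <- data.frame(
    chrom = peaks[[1]],
    start = peaks[[2]],
    end   = peaks[[3]],
    name  = paste0(prefix, seq_len(nrow(peaks))),  # peak1, peak2, ...
    score = peaks[[4]],
    stringsAsFactors = FALSE
  )
  write.table(bed, outfile, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

# Example (paths are placeholders):
# add_bed_name("peaks/wt1_macs.txt", "peaks/wt1_macs.bed")
```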

Reply written 12.6 years ago by António Miguel de Jesus Domingues

Hi,

I originally intended to include SAM support, but never got around to it. One of these days I will (or maybe I'll just take it out of the documentation!).

Cheers,

- Gord
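Because SAM input was documented but not implemented in this DiffBind version, it can help to check a sample sheet before calling dba() and dba.count(). This base-R sketch assumes a CSV sample sheet whose read files are listed in bamReads and bamControl columns, as in the sample sheet in the question; non_bam_bed_reads() is a hypothetical helper, and it only inspects file extensions, not file contents.

```r
# Sketch: list read files in a DiffBind sample sheet that are neither BAM
# nor (gzipped) BED, the formats Gord says DiffBind reads.
# Assumes a CSV sheet with bamReads/bamControl columns; the function name is
# hypothetical and only file extensions are checked.
non_bam_bed_reads <- function(sheet_file,
                              cols = c("bamReads", "bamControl")) {
  sheet <- read.csv(sheet_file, stringsAsFactors = FALSE)
  cols <- intersect(cols, names(sheet))           # skip absent columns
  paths <- unlist(sheet[cols], use.names = FALSE)
  paths <- paths[!is.na(paths) & nzchar(paths)]   # drop empty entries
  ok <- grepl("\\.(bam|bed|bed\\.gz)$", paths, ignore.case = TRUE)
  unique(paths[!ok])
}

# Example:
# bad <- non_bam_bed_reads("samplesheet.csv")
# if (length(bad) > 0) stop("convert to BAM first: ", paste(bad, collapse = ", "))
```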

Reply written 12.6 years ago by Gordon Brown