DiffBind error: Bad Magic number
1
0
Entering edit mode
@tassasaldi-8871
Last seen 9.3 years ago
United States

 Hello everyone,

I am having problems running diffbind count function. I keep getting the following error (running in R on Mac):

> H3K9me3 = dba.count(H3K9me3, minOverlap=2)
Error: Error processing one or more read files. Check warnings().
In addition: Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
all scheduled cores encountered errors in user code
2: file '~/Desktop/H3K9me3_bam/Wt1_H2K9me3_chip_sorted.bam.bam' does not appear to be a BAM file (bad magic number) 

I checked the numerical code at the end of my bam file and it appears to be correct:

00029de0 7f 02 32 f0 24 b6 00 ed 00 00 1f 8b 08 04 00 00 |..2.$...........|
00029df0 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 |......BC........|
00029e00 00 00 00 00 00 00 |......|
00029e06

I have sorted and resorted with sam tools with no change. The bam file works fine for loading into IGV and such, so it is not unreadable. Any ideas?   Thanks!

sessionInfo() 

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] DiffBind_1.14.6         RSQLite_1.0.0          
 [3] DBI_0.3.1               locfit_1.5-9.1         
 [5] GenomicAlignments_1.4.1 Rsamtools_1.20.4       
 [7] Biostrings_2.36.4       XVector_0.8.0          
 [9] limma_3.24.15           GenomicRanges_1.20.8   
[11] GenomeInfoDb_1.4.3      IRanges_2.2.7          
[13] S4Vectors_0.6.6         BiocGenerics_0.14.0    
[15] BiocInstaller_1.18.4   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1            lattice_0.20-33       
 [3] GO.db_3.1.2            gtools_3.5.0          
 [5] digest_0.6.8           plyr_1.8.3            
 [7] futile.options_1.0.0   BatchJobs_1.6         
 [9] ShortRead_1.26.0       ggplot2_1.0.1         
[11] gplots_2.17.0          zlibbioc_1.14.0       
[13] annotate_1.46.1        gdata_2.17.0          
[15] Matrix_1.2-2           checkmate_1.6.2       
[17] systemPipeR_1.2.23     proto_0.3-10          
[19] GOstats_2.34.0         splines_3.2.2         
[21] BiocParallel_1.2.21    stringr_1.0.0         
[23] pheatmap_1.0.7         munsell_0.4.2         
[25] sendmailR_1.2-1        base64enc_0.1-3       
[27] BBmisc_1.9             fail_1.2              
[29] edgeR_3.10.2           XML_3.98-1.3          
[31] AnnotationForge_1.10.1 MASS_7.3-44           
[33] bitops_1.0-6           grid_3.2.2            
[35] RBGL_1.44.0            xtable_1.7-4          
[37] GSEABase_1.30.2        gtable_0.1.2          
[39] magrittr_1.5           scales_0.3.0          
[41] graph_1.46.0           KernSmooth_2.23-15    
[43] amap_0.8-14            stringi_0.5-5         
[45] hwriter_1.3.2          reshape2_1.4.1        
[47] genefilter_1.50.0      latticeExtra_0.6-26   
[49] futile.logger_1.4.1    brew_1.0-6            
[51] rjson_0.2.15           lambda.r_1.1.7        
[53] RColorBrewer_1.1-2     tools_3.2.2           
[55] Biobase_2.28.0         Category_2.34.2       
[57] survival_2.38-3        AnnotationDbi_1.30.1  
[59] colorspace_1.2-6       caTools_1.17.1      
diffbind • 1.9k views
ADD COMMENT
0
Entering edit mode

Hi,

The BAM format requires that the *first* 4 bytes of a legal BAM file (after uncompressing) be 'BAM\1'.  All BAMs I've used have it.  You can check yours like so:

> gunzip -c myfile.bam | head -1

The first 3 characters should be "BAM" (the next will probably display as an "r", but it's really ASCII 1).  If you see something else, it's not a legal BAM file.  It's possible that samtools doesn't check for the magic number, though.

Let me know what you get when you try.

 - Gord

ADD REPLY
0
Entering edit mode

Does the simpler task of reading the bam file succeed, e.g.,

library(Rsamtools)
seqinfo(BamFile("~/Desktop/H3K9me3_bam/Wt1_H2K9me3_chip_sorted.bam.bam"))

(is that the actual name of the bam file, with '.bam.bam'? Does it help to expand the home directory '~/'?) or

library(GenomicAlignments)
bf = BamFile("~/Desktop/H3K9me3_bam/Wt1_H2K9me3_chip_sorted.bam.bam", 
    yieldSize=1000)
readGAlignments(bf)
ADD REPLY
0
Entering edit mode

Good point, I can almost guarantee that tilde expansion isn't supported by my code.  So try with an absolute path.  The little bit of code that checks for the magic number could be giving the wrong error message, if it can't read the file at all.  (I'll check and update the code if so.)

ADD REPLY
0
Entering edit mode

So yes, it turns out that gzopen (which my code uses to check for the magic number) doesn't do tilde expansion, and my code incorrectly produces the wrong error.  So absolute paths should fix the problem.  I'll fix this bug for the next release.  Sorry for the inconvenience.

ADD REPLY
0
Entering edit mode

Over on SEQanswers, where this thread originated, I suggested trying setting bUseSummarizeOverlaps=TRUE, which uses the standard GenomicAlignments functions to do the counting (BamFileList, ScanBamParam, and summarizeOverlaps) . It worked fine using those functions, so our code must be handling it differently.

-R

ADD REPLY
0
Entering edit mode
Gord Brown ▴ 670
@gord-brown-5664
Last seen 4.0 years ago
United Kingdom

The answer is in the comments above, but for the sake of completeness...

The problem is that my code does not correctly handle the tilde in the file path, but produces the wrong error message instead of just telling you it can't find the file.  A bug fix will be added for the next release.  The workaround is to replace the tilde with an absolute path (or a path relative to the current directory of the R session).

ADD COMMENT

Login before adding your answer.

Traffic: 791 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6