VariantTools problem causing R crash
1
0
Entering edit mode
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 4 weeks ago
Italy

Dear all,

I stumbled over a problem calling tallyVariants on a BAM file from an ion torrent (which was most likely aligned using some proprietary tool) that causes R to crash after the message:

Two different genomic chars   and G at position 197880852

I believe this message comes from the gsnap/gmap code in gmapR, but am not sure. So, in order to better understand the cause of the problem I would like to pose some questions:

1) Does VariantTools require that BAM files were generated by gmap?

2) Is it required to use exactly the same gmap index for alignment and tallyVariants (this I obviously don't have, since I got the BAM file, but built my own gmap index using hg19 BSGenome package).

Thanks in advance for any insights.

cheers, jo

varianttools gmapr • 2.1k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

The answer is "no" to both of those questions. It would help to see your session info and if possible the fragment of the BAM file around that position (not sure which chr it is, but only chr1-3 are long enough to include it). Thanks!
 

ADD COMMENT
0
Entering edit mode

Getting the stuff from the BAM file might be tricky, in the meantime I will also try that on a different machine (eventually related to the OSX?).

my session info:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin15.0.0/x86_64 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] BiocParallel_1.4.0         Gviz_1.15.3               
 [3] EnsDb.Hsapiens.v81_0.99.12 ensembldb_1.3.7           
 [5] GenomicFeatures_1.22.5     AnnotationDbi_1.32.0      
 [7] RColorBrewer_1.1-2         ascii_2.1                 
 [9] gmapR_1.13.3               VariantTools_1.12.0       
[11] VariantAnnotation_1.16.3   Rsamtools_1.22.0          
[13] Biostrings_2.38.2          XVector_0.10.0            
[15] SummarizedExperiment_1.0.1 GenomicRanges_1.22.1      
[17] GenomeInfoDb_1.6.1         IRanges_2.4.4             
[19] S4Vectors_0.8.3            Biobase_2.30.0            
[21] BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2                  biovizBase_1.18.0           
 [3] lattice_0.20-33              digest_0.6.8                
 [5] mime_0.4                     R6_2.1.1                    
 [7] plyr_1.8.3                   acepack_1.3-3.3             
 [9] futile.options_1.0.0         RSQLite_1.0.0               
[11] httr_1.0.0                   ggplot2_1.0.1               
[13] BiocInstaller_1.20.1         zlibbioc_1.16.0             
[15] rpart_4.1-10                 Matrix_1.2-3                
[17] proto_0.3-10                 splines_3.2.2               
[19] foreign_0.8-66               AnnotationHub_2.2.2         
[21] stringr_1.0.0                RCurl_1.95-4.7              
[23] biomaRt_2.26.1               munsell_0.4.2               
[25] shiny_0.12.2                 httpuv_1.3.3                
[27] rtracklayer_1.30.1           htmltools_0.2.6             
[29] nnet_7.3-11                  gridExtra_2.0.0             
[31] interactiveDisplayBase_1.8.0 Hmisc_3.17-0                
[33] matrixStats_0.15.0           XML_3.98-1.3                
[35] GenomicAlignments_1.6.1      MASS_7.3-45                 
[37] bitops_1.0-6                 xtable_1.8-0                
[39] gtable_0.1.2                 DBI_0.3.1                   
[41] magrittr_1.5                 scales_0.3.0                
[43] stringi_1.0-1                reshape2_1.4.1              
[45] latticeExtra_0.6-26          futile.logger_1.4.1         
[47] Formula_1.2-1                lambda.r_1.1.7              
[49] tools_3.2.2                  dichromat_2.0-0             
[51] BSgenome_1.38.0              survival_2.38-3             
[53] colorspace_1.2-6             cluster_2.0.3               

 

ADD REPLY
0
Entering edit mode

What do you mean by "related to the OSX"? VariantTools is not supported on OSX, only Linux.

ADD REPLY
0
Entering edit mode

Ah, didn't know that. I'm always compiling my R on OS X and thus also installing the source packages, not the pre-compiled ones. VariantTools compiled nicely on my system. I'll check if I get the same error on linux.

 

ADD REPLY
0
Entering edit mode

I guess it's unsupported in the sense that Bioc does not distribute a binary. But it builds and runs just fine on my Mac and on others, apparently.

ADD REPLY
0
Entering edit mode

Hi Michael,

Just to clarify, we don't support VariantTools on OS X because it depends on gmapR, which we don't support either because you asked us to mark it as unsupported on OS X (see commit 77246). Should we change that?

Thanks,

H.

ADD REPLY
0
Entering edit mode

I guess we could try again to get things building on your side. I looked back at the thread. It was the old autoconf via svn issue, which we solved with "--disable-maintainer-mode".

ADD REPLY
0
Entering edit mode

Done (in devel only, commit 111229). Note that you have full control on this via the .BBSoptions file in gmapR top-level directory.

H.

ADD REPLY
0
Entering edit mode

Thanks. I know I could change it, but it might take some of Dan's time, so wanted to coordinate.

ADD REPLY
0
Entering edit mode

I must be missing something. Is there something that needs to be done on the Mac build nodes?

ADD REPLY
0
Entering edit mode

I said "might". Hopefully not ;)

ADD REPLY
0
Entering edit mode

Could you at least look at the reads at that region to see if there is anything suspicious, like in the cigar string? Also, I realize gmapR is temporarily broken in devel, but if you could try grabbing the latest version from svn, that would help.

ADD REPLY
0
Entering edit mode

Hi Michael,

that seems to be the problematic read from the BAM file:

V3QJ7:01079:08231    0    chr1    197880592    94    278M    *    0    0    AAAGACCACCCCCATCCCCTAGCCGGGTCCCGGGAGGAAGAGATGCCCCGGCTCTCTCGTTGACCCGGCCTCGAACGCCGACCTCGAGCGCAGAATACTCCCGAGTGCTGGGTTCCTGATACCCAGCCCGGGAGCGAGGGGAGTGAAGGAGGGACCGGGACAGGAAGGCGGGGACGCGGTGGGTGAACAATCAATGGCTTAGCCTAGTTTCCCCGCAGGAGCTACCGCCCATTCACCGTCTATGCAAGCACCAGCTGGCGGCAGCACCTTTTAGGTAC    9:4:<;7=====.;8877,7---266)--91<<1A<3=,//8888:@@5>8:::<:;:8808;<<1;7>387:<7///7/..597888:9<<::5885<>>5<<=::::::387;7;:98;;>7>;;;5;=7<:;;:;;;2:::;:484:99/8:5994:;:;7:7;6888818:><A8>>=77785;:4::;6::4:<7<<<8=<<<;*888*8;;<187;::<5877*78377:6999:::::;7;;;<7<<<:;6870799954-477-78379ZP:B:f,0.00709133,0.0034905,7.86591e-06    ZA:i:278    ZG:i:445    ZB:i:30    ZC:B:i,445,444,1,0    ZM:B:s,270,-8,284,6,-48,262,28,252,200,26,-14,38,246,482,-10,0,482,-6,494,12,418,236,194,232,248,10,-12,692,24,14,270,-24,-48,286,472,6,-46,258,1232,-14,-52,42,8,44,300,16,20,4,208,952,24,-28,200,74,50,242,-72,144,-34,-6,446,6,690,34,122,-22,676,732,-50,232,56,518,12,-50,0,40,474,358,-38,300,-14,14,246,290,194,6,294,48,34,24,-38,-26,978,-16,470,250,206,4,218,8,236,24,272,-18,208,276,22,216,56,34,22,36,402,-16,240,182,-14,780,480,62,-6,0,26,0,498,-16,72,-12,176,20,222,244,34,472,298,192,2,474,70,218,152,-20,478,8,236,246,192,254,-8,20,210,0,-28,22,-48,-32,206,28,274,264,-18,254,2,246,-56,440,8,8,220,94,22,32,340,100,256,74,238,696,242,226,2,20,256,2,220,198,22,-16,304,16,-8,30,226,8,32,790,464,-2,486,20,232,-10,18,274,202,4,-20,66,234,48,-24,184,-22,716,18,248,-46,252,2,26,690,38,698,76,-26,204,44,332,-80,92,234,290,-18,52,-40,56,256,1002,-12,244,-2,8,226,24,216,18,320,500,8,444,8,202,60,80,796,6,-140,234,458,772,-6,236,224,130,0,-8,-4,18,254,502,58,466,-82,12,556,48,-12,244,872,184,-4,130,-8,64,254,0,330,212,10,84,28,500,226,10,34,682,216,-4,-20,200,432,12,178,488,198,252,92,450,216,14,508,66,68,-18,2,-38,212,-2,-48,36,434,264,-8,248,46,42,502,30,294,12,24,10,244,236,-10,38,662,974,236,46,18,266,62,234,60,410,-4,292,102,0,244,288,170,238,418,304,102,-16,766,28,2,28,18,58,284,40,144,-4,390,264,94,186,-28,498,194,64,228,38,-20,30,218,78,112,102,226,186,8,60,254,-16,106,186,10,214,-38,-10,430,212,186,198,4,438,32,242,-16,56,226,52,68,-4,30,32,288,16,62,134,230,92,60,432,26,26,224,364,34,226,34,102,294,172,312,224,102,362,24,-12,902,10,100,286,56,456,160,190,248,206,-4,14,340,38,252,106,42,212,468,278,-4,60,-46,-34    ZF:i:27    RG:Z:V3QJ7.IonXpress_012    MD:Z:2T0C262T11    NM:i:3    AS:i:266    XM:i:278    XA:Z:map4-1    XS:i:-2147483647    PG:Z:tmap.1

there is no other read even close to that one, so it must be this. I'm now trying the svn version gmapR.

with the latest gmapR from svn I get the following message:

> tallyVariants(bamSomma[1], talVarPar, BPPARAM=bpp)
Error: C stack usage  7970908 is too close to the limit

I'll try to recompile R on my Mac and run the code on a linux machine.

So, it did work on the linux machine, thus I guess it has something to do with my R/OSX. I'll check some more stuff.

Below is the session info from the linux machine:

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] VariantTools_1.12.0        VariantAnnotation_1.16.3  
 [3] Rsamtools_1.22.0           Biostrings_2.38.2         
 [5] XVector_0.10.0             SummarizedExperiment_1.0.1
 [7] Biobase_2.30.0             gmapR_1.12.0              
 [9] GenomicRanges_1.22.1       GenomeInfoDb_1.6.1        
[11] IRanges_2.4.4              S4Vectors_0.8.3           
[13] BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.0    GenomicAlignments_1.6.1 zlibbioc_1.16.0        
 [4] BiocParallel_1.4.0      lattice_0.20-33         BSgenome_1.38.0        
 [7] tools_3.2.2             grid_3.2.2              DBI_0.3.1              
[10] lambda.r_1.1.7          futile.logger_1.4.1     Matrix_1.2-3           
[13] rtracklayer_1.30.1      futile.options_1.0.0    bitops_1.0-6           
[16] RCurl_1.95-4.7          biomaRt_2.26.1          RSQLite_1.0.0          
[19] GenomicFeatures_1.22.5  XML_3.98-1.3  
ADD REPLY
1
Entering edit mode

I tried using your BAM with a freshly built genome index from the hg19 BSgenome package. It seemed to work, except for the mismatch in chrM length between the UCSC hg19 and your BAM's header. Note that this was on Linux, but with the latest devel of gmapR.

ADD REPLY
0
Entering edit mode

thanks for testing Michael. In Linux with the stable version it did also work for me. I'm still waiting to test the devel version on Mac, but there is some problem with VariantAnnotation that prevents me from installing it. I guest there must be something odd with my R installation...

Yes, the difference in chrM length is also strange, but that's the file I got from my collab partners.

ADD REPLY
0
Entering edit mode

The linux machine is using the stable release version of gmapR/bam_tally. We've made many changes to the devel gmapR. It would be great to get a hold of a BAM that causes this problem. Or at the very least the code constructing talVarPar.

ADD REPLY
0
Entering edit mode

UPDATE:

I've now compiled and installed R-3.2.3 and installed Bioc 3.2 on that fresh install (that used XCode 7.2, more specifically, clang from Apple's LLVM version 7.0.2; gfortran comes from homebrew in gcc version 5.4.0). With that install tallyVariants did run smoothly.

ADD REPLY

Login before adding your answer.

Traffic: 951 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6