Question

GenomicAlignments - Speeding up readGAlignmentPairs()

0

Entering edit mode

Fong Chun Chan ▴ 320

@fong-chun-chan-5706

Last seen 10.6 years ago

Hi, I've been using the GenomicFeatures R package to read in some RNA-seq paired-end read data. While the readGAlignments() function reads in the bam file within a minute, I've noticed that the readGAlignmentPairs() function is extremely slow. This is even after restricting the space to just a single chromosome along with setting the flags isDuplicate = FALSE and isNotPrimaryRead = FALSE (the latter being suggested in the reference): "An easy way to avoid this degradation is to load only primary alignments by setting the isNotPrimaryRead ï¬ag to FALSE in ScanBamParam()" Based on my understanding, it seems that the ï¬ndMateAlignment() function has to be run first. Is there any other way to speed it up? Perhaps I am doing something wrong here? Code and sessionInfo() below. Thanks! ---- library("GenomicFeatures") library('GenomicAlignments') library('Rsamtools') si <- seqinfo(BamFile(arguments$bamFile)) gr <- GRanges(arguments$chr, IRanges(100, seqlengths(si)[[arguments$chr]]-100)) allExons <- exons(txdb, columns = c('gene_id', 'exon_id', 'exon_name')) scf <- scanBamFlag( isDuplicate = FALSE, isNotPrimaryRead = FALSE ) # remove duplicate reads, remove non-primary reads (i.e. multi-aligned reads) reads <- readGAlignmentPairs( arguments$bamFile, param = ScanBamParam( which = gr, flag = scf ) ); # grab reads in specific region > sessionInfo() R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] argparse_1.0.1 proto_0.3-10 GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.1 Biostrings_2.32.0 XVector_0.4.0 GenomicFeatures_1.16.2 AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.9 BiocGenerics_0.10.0 vimcom.plus_0.9-93 [16] setwidth_1.0-3 colorout_1.0-2 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.7 BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 checkmate_1.0 codetools_0.2-8 DBI_0.2-7 digest_0.6.4 fail_1.2 findpython_1.0.1 foreach_1.4.2 getopt_1.20.0 iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 RCurl_1.95-4.1 [19] rjson_0.2.14 RSQLite_0.11.4 rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 tools_3.1.0 XML_3.98-1.1 zlibbioc_1.10.0 > [[alternative HTML version deleted]]

GenomicFeatures GenomicFeatures • 1.4k views

ADD COMMENT • link 10.8 years ago Fong Chun Chan ▴ 320