Affect ShortRead readAligned behavior with ScanBamParam's reverseComplement parameter
2
0
Entering edit mode
jdavison ▴ 10
@jdavison-7081
Last seen 8.8 years ago
United States

Hi, I thought one can affect how ShortRead's readAligned function handles reverse strand BAM reads: to reverse complement them, or not, with the ScanBamParam reverseComplement.  I'm not able to do that, can you set me straight?  Thank you, see the code below.


library(ShortRead)

b = readAligned('my.bam', type='BAM')
sread(b)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

# reverseComplement: A logical(1) vectors. BAM files store reads mapping
#          to the minus strand as though they are on the plus strand.
#          Rsamtools obeys this convention by default
#          (‘reverseComplement=FALSE’), but when this value is set to
#          TRUE returns the sequence and quality scores of reads mapped
#          to the minus strand in the reverse complement (sequence) and
#          reverse (quality) of the read as stored in the BAM file. This
#          might be useful if wishing to recover read and quality scores
#          as represented in fastq files, but is NOT appropriate for
#          variant calling or other alignment-based operations.

param.T = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
     what = ShortRead:::.readAligned_bamWhat())

b.SBP.TRUE = readAligned('my.bam', type='BAM', param=param.T)

sread(b.SBP.TRUE)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

param.F = ScanBamParam(simpleCigar = TRUE, reverseComplement = FALSE,
     what = ShortRead:::.readAligned_bamWhat())

b.SBP.FALSE = readAligned('my.bam', type='BAM', param=param.F)
Warning message:
UserArgumentMismatch
  using 'TRUE' for 'bamReverseComplement(param)' 

sread(b.SBP.FALSE)
[1]   151 CATTGCTTCCGTTTGTGCTCGATAAAAATAAGAAT...GTCTGGTTCATCATCTGTAAGAATGGCTTCCAGAG

> version
               _                                                 
platform       x86_64-unknown-linux-gnu                          
arch           x86_64                                            
os             linux-gnu                                         
system         x86_64, linux-gnu                                 
status         Under development (unstable)                      
major          3                                                 
minor          2.0                                               
year           2015                                              
month          01                                                
day            11                                                
svn rev        67421                                             
language       R                                                 
version.string R Under development (unstable) (2015-01-11 r67421)
nickname       Unsuffered Consequences

                          
> sessionInfo()
R Under development (unstable) (2015-01-11 r67421)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] ShortRead_1.25.8         GenomicAlignments_1.3.23 Rsamtools_1.19.26       
 [4] GenomicRanges_1.19.33    GenomeInfoDb_1.3.12      Biostrings_2.35.7       
 [7] XVector_0.7.3            IRanges_2.1.35           S4Vectors_0.5.16        
[10] BiocParallel_1.1.11      BiocGenerics_0.13.4      BiocInstaller_1.17.3    

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2     BatchJobs_1.5       BBmisc_1.8         
 [4] Biobase_2.27.1      bitops_1.0-6        brew_1.0-6         
 [7] checkmate_1.5.1     codetools_0.2-9     DBI_0.3.1          
[10] digest_0.6.8        fail_1.2            foreach_1.4.2      
[13] grid_3.2.0          hwriter_1.3.2       iterators_1.0.7    
[16] lattice_0.20-29     latticeExtra_0.6-26 RColorBrewer_1.1-2 
[19] RSQLite_1.0.0       sendmailR_1.2-1     stringr_0.6.2      
[22] tools_3.2.0         zlibbioc_1.13.0    

 

shortread readAligned reverseComplement ScanBamParam • 1.9k views
ADD COMMENT
0
Entering edit mode

Oops, I forgot to note

b = readAligned('my.bam', type='BAM')

strand(b)

[1] -
Levels: + - *

 

Also, I can get what I want with e.g.

bsr = sread(b)

ndx = strand(b)=='-' & !is.na(strand(b))

bsr[ndx] = reverseComplement(bsr[ndx])

 

 

ADD REPLY
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

Use GenomicAlignments::readGAlignments() instead; readAligned is really only for legacy formats.

> fl = system.file(package="Rsamtools", "extdata", "ex1.bam")
> readGAlignments(fl, param=ScanBamParam(what="seq"))[9]
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end
         <Rle>  <Rle> <character> <integer> <integer> <integer>
  [1]     seq1      -         35M        35        18        52
          width     njunc |                                 seq
      <integer> <integer> |                      <DNAStringSet>
  [1]        35         0 | AAATGTGTGGTTTAACTCGTCCATGGCCCAGCATT
  -------
  seqinfo: 2 sequences from an unspecified genome
> readGAlignments(fl, param=ScanBamParam(what="seq", reverseComplement=TRUE))[9]
GAlignments object with 1 alignment and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end
         <Rle>  <Rle> <character> <integer> <integer> <integer>
  [1]     seq1      -         35M        35        18        52
          width     njunc |                                 seq
      <integer> <integer> |                      <DNAStringSet>
  [1]        35         0 | AATGCTGGGCCATGGACGAGTTAAACCACACATTT
  -------
  seqinfo: 2 sequences from an unspecified genome
ADD COMMENT
0
Entering edit mode
jdavison ▴ 10
@jdavison-7081
Last seen 8.8 years ago
United States

Sweet, thanks.  And I see you have videos, excellent!

http://bioconductor.org/packages/release/bioc/html/GenomicAlignments.html

 

ADD COMMENT

Login before adding your answer.

Traffic: 641 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6