Reading alignments above a mapq threshold from BAM files
2
1
Entering edit mode
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 29 days ago
Italy

dear All!

I was wondering whether there is a possibility to read only alignments from a BAM file passing a certain mapq threshold. I searched for it but didn't find anything obvious. Currently I'm using the filterBam function to save only the high quality alignments to another BAM file, but that is quite time and disk-space consuming.

If there is no easy way, might that be something that could be added to the ScanBamParam? Something like a mapping quality filter?

cheers, jo

 

rsamtools bam • 2.5k views
ADD COMMENT
2
Entering edit mode
@herve-pages-1542
Last seen 10 hours ago
Seattle, WA, United States

Hi Johannes,

There might be some technical reasons for keeping the filter and param arguments separated but I'll leave this to the Rsamtools experts. In the meantime here is a modified version of scanBam() that supports the filter argument (like filterBam() does):

scanBam2 <- function(file, index=file, ...,
                           param=ScanBamParam(what=scanBamWhat()),
                           filter=FilterRules())
{
    res <- scanBam(file, index=index, ..., param=param)
    if (length(filter) == 0L) 
        return(res)
    lapply(res, function(x) {
        df <- DataFrame(x)
        df <- df[eval(filter, df), ]
        as.list(df)
    })
}

Used together with an appropriate yieldSize, it should perform a little better than the filterBam solution (at least it's no more disk-space consuming :-) ).

The same addition could be made to the readGAlignment*() functions.

Hope this helps,

H.

 

ADD COMMENT
0
Entering edit mode

Thanks Herve!

ADD REPLY
2
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

I added mapqFilter to ScanBamParam in the devel branch, Rsamtools version 1.21.12. A more consistent approach to filtering on input is required; these are relatively easy to implement but a bit tedious with the current design.

ADD COMMENT
0
Entering edit mode

The current development version of Rsamtools is 1.21.11; did you mean 1.21.12 in your post?

ADD REPLY
0
Entering edit mode

yep, thanks!

ADD REPLY
0
Entering edit mode

Thanks Martin!

ADD REPLY
0
Entering edit mode

Sorry! It was my mistake... I used params instead of param... with param it works.

cheers, jo

 

Dear Martin!

I get now however a strange error message when I try the mapqFilter in the ScanBamParam submitted to the summarizeOverlaps method. The code I used was:

sbp <- ScanBamParam(mapqFilter=mapq.cut)
bfiles <- BamFileList(BamFiles, index=paste0(BamFiles, ".bai"),
                      obeyQname=TRUE,
                      asMates=TRUE, yieldSize=1e+6)
GeneCounts <- summarizeOverlaps(features=EnsGenes , reads=bfiles[[1]],
                                mode="IntersectionStrict",
                                ignore.strand=TRUE, inter.feature=TRUE, 
                                singleEnd=FALSE, fragments=TRUE, params=sbp)

I get then the following error message:

Error in SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features,  :
  error in evaluating the argument 'assays' in selecting a method for function 'SummarizedExperiment': Error in validObject(.Object) :
  invalid class "SimpleList" object: invalid object for slot "listData" in class "SimpleList": got class "matrix", should be or extend class "list"

When I run the same code without the params everything is OK. I'll try to dig into the source code, but maybe you have a quick hint what might be wrong here (I could guess that eventually the mapq is not read?).

Thanks and best regards,

jo

 

my sessionInfo:

sessionInfo()
R Under development (unstable) (2015-06-17 r68530)
Platform: x86_64-apple-darwin14.4.0/x86_64 (64-bit)
Running under: OS X 10.10.4 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2         ascii_2.1                 
 [3] Rsubread_1.19.2            EnsDb.Hsapiens.v75_0.99.12
 [5] ensembldb_1.1.5            GenomicFeatures_1.21.13   
 [7] AnnotationDbi_1.31.17      SeqUtils_0.0.3            
 [9] ShortRead_1.27.4           GenomicAlignments_1.5.11  
[11] SummarizedExperiment_0.3.1 Rsamtools_1.21.12         
[13] GenomicRanges_1.21.16      GenomeInfoDb_1.5.8        
[15] Biostrings_2.37.2          XVector_0.9.1             
[17] IRanges_2.3.13             S4Vectors_0.7.9           
[19] BiocParallel_1.3.29        Biobase_2.29.1            
[21] BiocGenerics_0.15.3       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6                  compiler_3.3.0              
 [3] BiocInstaller_1.19.6         futile.logger_1.4.1         
 [5] AnnotationHub_2.1.28         bitops_1.0-6                
 [7] futile.options_1.0.0         tools_3.3.0                 
 [9] zlibbioc_1.15.0              digest_0.6.8                
[11] biomaRt_2.25.1               RSQLite_1.0.0               
[13] lattice_0.20-31              shiny_0.12.1                
[15] DBI_0.3.1                    rtracklayer_1.29.11         
[17] hwriter_1.3.2                httr_1.0.0                  
[19] stringr_1.0.0                grid_3.3.0                  
[21] R6_2.0.1                     XML_3.98-1.3                
[23] latticeExtra_0.6-26          GenomePlotR_0.1.5           
[25] lambda.r_1.1.7               magrittr_1.5                
[27] htmltools_0.2.6              xtable_1.7-4                
[29] mime_0.3                     interactiveDisplayBase_1.7.0
[31] httpuv_1.3.2                 stringi_0.5-5               
[33] RCurl_1.95-4.7      

 

ADD REPLY
0
Entering edit mode

I wasn't able to reproduce this. SummarizedExperiment and S4Vectors might be out of date (BiocInstaller::biocValid() for hints). traceback() might help.

ADD REPLY
0
Entering edit mode

Sorry, that was all my mistake. I eventually edited the question too late, but I did use params=sbp instead of param=sbp in the function call following the summarizeOverlaps vignette of the GenomicAlignments package, section 5.2 stating "The params argument...". Calling the function with param=sbp everything works nicely.

Thanks
 

ADD REPLY

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6