Excluding reads with a certain tag using tagFilter from Rsamtools
1
0
Entering edit mode
@lizingsimmons-6935
Last seen 4.0 years ago
Germany

Hi folks,

I want to use countBam from Rsamtools to count the number of lines in a bam file matching certain criteria - mapping quality and not having the "XA" tag. On the command line, I would do this as follows:

samtools view -q 3 my_file_sort.bam | grep -v "XA:" | wc -l

However, I don't think there is a way to exclude tags using Rsamtools, since the help for ScanBamParam says:

tagFilter A named list of atomic vectors. The name of each list element is the tag name (two-letter code), and the corresponding atomic vector is the set of acceptable values for the tag. Only reads with specified tags are included. NULLs, NAs, and empty strings are not allowed in the atomic vectors.

Of course I can count the reads on the command line using samtools, or count all reads passing the MAPQ filter and then all reads passing the MAPQ filter with the "XA" tag, and then calculate the difference... but is there a better way to do this in R?

rsamtools • 1.2k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 3 months ago
United States

One possibility is to use scanBam to input the tag field, perhaps with other constraints on the reads, and then count the number of missing values, e.g., after running example(scanBam) (to get fl)

> tableis.na(scanBam(fl, param=ScanBamParam(tag="NM"))[[1]]$tag$NM))

FALSE  TRUE
 3271    36

tells us there are 36 reads with tag NM not defined.

If you'd like to post a feature request to support doing this directly via countBam, consider opening an issue on the github repository https://github.com/Bioconductor/Rsamtools. Hope that helps!

ADD COMMENT

Login before adding your answer.

Traffic: 629 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6