How to use Rsamtools to parse extra fields from a STAR alignment
Jiping Wang ▴ 90
@jiping-wang-6687
Last seen 2.3 years ago
United States

Hi, I am trying to use Rsamtools to parse a BAM file produced by the STAR aligner. STAR outputs a few columns beyond the standard SAM fields, but Rsamtools seems to return only the standard ones. For example:

 > library(Rsamtools)
 > bam_file = "~/yeast_77_78/yeast_77_78.bam"
 > bf = BamFile(bam_file, asMates = TRUE, qnameSuffixStart = ".")
 > # region to read; inferred from the names() output below
 > which = GRanges("chrI", IRanges(1, 230218))
 > param = ScanBamParam(flag=scanBamFlag(isPaired=TRUE),
                            what=scanBamWhat(), which=which)
 > bam <- scanBam(bam_file, param=param)

 > names(bam$`chrI:1-230218`)
 [1] "qname"  "flag"   "rname"  "strand" "pos"    "qwidth" "mapq"   "cigar"  "mrnm"   "mpos"  
[11] "isize"  "seq"    "qual"  

sessionInfo( )

However, the STAR alignment includes extra columns that are useful (an example is shown below). NH:i:3 gives the number of hits, HI:i:2 gives the index of the current hit, and so on. Rsamtools does not return these columns by default. Is there any way to include them by specifying ScanBamParam? I could export the BAM to a SAM file, but that takes disk space; it would be ideal if such columns could be specified or included directly when calling scanBam. Thanks for the help.

NH:i:3  HI:i:2  AS:i:46 nM:i:0
Rsamtools • 1.2k views
@martin-morgan-1513
Last seen 4 months ago
United States

The 'extra columns' are called 'tags' in the SAM specification. Following the documentation in ?ScanBamParam, add tag = "NH" to the ScanBamParam() call; pass a character vector such as tag = c("NH", "HI") to request several tags. See also the package GenomicAlignments (e.g., readGAlignments()) for a more convenient representation of the BAM file.
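
As a minimal sketch of the suggestion above (not a definitive implementation): the helper name read_star_tags() and the choice of `what` fields are hypothetical, and the default tag names and file path are taken from the question. It assumes the BAM file exists at that path.

```r
library(Rsamtools)

# Hypothetical helper: read a few standard fields plus the requested SAM tags.
# The default tags are the four shown in the question (NH, HI, AS, nM).
read_star_tags <- function(bam_path, tags = c("NH", "HI", "AS", "nM")) {
    param <- ScanBamParam(
        flag = scanBamFlag(isPaired = TRUE),
        what = c("qname", "rname", "pos", "cigar"),
        tag  = tags
    )
    # Without a `which` range, scanBam() returns a list of length 1.
    scanBam(path.expand(bam_path), param = param)[[1]]
}

# Path from the question; the call is skipped if the file is not present.
bam_file <- "~/yeast_77_78/yeast_77_78.bam"
if (file.exists(bam_file)) {
    bam <- read_star_tags(bam_file)
    str(bam$tag)   # tag values are returned in the `tag` element
}
```

The same ScanBamParam object can be passed as the `param` argument of GenomicAlignments::readGAlignments(), in which case the requested tags appear as metadata columns of the result.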


Thanks so much! That's exactly the solution I am looking for.
