How to access custom SAM tags (Rsamtools?)
2
0
Entering edit mode
Kemal Akman ▴ 20
@kemal-akman-5293
Last seen 10.2 years ago
Hello, I'm interested in accessing extended SAM tags in aligned short read sequence files using a R/Bioconductor library. Rsamtools seems potentially suited for this purpose, but I couldn't find the arguments to return custom SAM tags, such as "XX:Z", but there doesn't seem to be a corresponding "what" value in ScanBamParam()? I'd be especially interested in such a feature to parse methylation strings from Bismark, as well as custom SAM tags from other tools. Any suggestions on Rsamtools or alternative methods to achieve this in Bioconductor would be much appreciated. Example SAM data: $ head -2 sample.sam SRR306424.2547_PRESLEY:4:4:62:558_length=76 16 chr22 30675421 255 36M * 0 0 CACACACATCCACATAACACCATAACCAACCCCCGA ;,?A9:>B at B?@BBAC at C/BBC at CBCCC8BCBACBB NM:i:4 XX:Z:15GG6G11G XM:Z:...............hh......h..........Zx XR:Z:CT XG:Z:GA SRR306424.5227_PRESLEY:4:4:113:1768_length=76 0 chr5 123101409 255 75M * 0 0 TGATTTTTATATAAGGTGTAAGTAAGAGGTTTAGTTTTGATTTTTTGTATATGGATAATTAGTTTTTTTA GTATT B>?@BBABBCBCCBCB>B at B@B?B at B@BB>BB at B?BBBB at BCBBABBAABABB@@B??ABAA>BABBBA=><@A= NM:i:13 XX:Z:22C8C12C4C8CC4C1CCC2C1CC XM:Z:......................h........x............x....h........hx....h .hhx..h.hh XR:Z:CT XG:Z:CT Best regards, Kemal Akman
Rsamtools Rsamtools • 2.8k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
On 11/29/2012 06:23 AM, Kemal Akman wrote: > Hello, > > I'm interested in accessing extended SAM tags in aligned short read > sequence files using a R/Bioconductor library. Rsamtools seems > potentially suited for this purpose, but I couldn't find the arguments > to return custom SAM tags, such as "XX:Z", but there doesn't seem to > be a corresponding "what" value in ScanBamParam()? Hi, There is a 'tag' argument to ScanBamParam; it's illustrated on the help page ?ScanBamParam and in example(ScanBamParam) which leads in part to... ScnBmP> ## tags; NM: edit distance; H1: 1-difference hits ScnBmP> p4 <- ScanBamParam(tag=c("NM", "H1"), what="flag") ScnBmP> bam4 <- scanBam(fl, param=p4) ScnBmP> str(bam4[[1]][["tag"]]) List of 2 $ NM: int [1:3307] 0 0 0 5 0 1 0 1 0 1 ... $ H1: int [1:3307] 0 0 0 0 0 1 0 1 0 0 ... It's often convenient to readBamGappedAlignments with extra fields, so head(readBamGappedAlignments(fl, param=p4)) gives > head(readBamGappedAlignments(fl, param=p4)) GappedAlignments with 6 alignments and 3 metadata columns: seqnames strand cigar qwidth start end width <rle> <rle> <character> <integer> <integer> <integer> <integer> [1] seq1 + 36M 36 1 36 36 [2] seq1 + 35M 35 3 37 35 [3] seq1 + 35M 35 5 39 35 [4] seq1 + 36M 36 6 41 36 [5] seq1 + 35M 35 9 43 35 [6] seq1 + 35M 35 13 47 35 ngap | flag NM H1 <integer> | <integer> <integer> <integer> [1] 0 | 73 0 0 [2] 0 | 73 0 0 [3] 0 | 137 0 0 [4] 0 | 137 5 0 [5] 0 | 137 0 0 [6] 0 | 73 1 1 --- seqlengths: seq1 seq2 1575 1584 Martin > > I'd be especially interested in such a feature to parse methylation > strings from Bismark, as well as custom SAM tags from other tools. > > Any suggestions on Rsamtools or alternative methods to achieve this in > Bioconductor would be much appreciated. > > Example SAM data: > $ head -2 sample.sam > SRR306424.2547_PRESLEY:4:4:62:558_length=76 16 chr22 > 30675421 255 36M * 0 0 > CACACACATCCACATAACACCATAACCAACCCCCGA > ;,?A9:>B at B?@BBAC at C/BBC at CBCCC8BCBACBB NM:i:4 XX:Z:15GG6G11G > XM:Z:...............hh......h..........Zx XR:Z:CT XG:Z:GA > SRR306424.5227_PRESLEY:4:4:113:1768_length=76 0 chr5 > 123101409 255 75M * 0 0 > TGATTTTTATATAAGGTGTAAGTAAGAGGTTTAGTTTTGATTTTTTGTATATGGATAATTAGTTTTTT TAGTATT > B>?@BBABBCBCCBCB>B at B@B?B at B@BB>BB at B?BBBB at BCBBABBAABABB@@B??ABAA>BABBBA=><@A= > NM:i:13 XX:Z:22C8C12C4C8CC4C1CCC2C1CC > XM:Z:......................h........x............x....h........hx... .h.hhx..h.hh > XR:Z:CT XG:Z:CT > > > Best regards, > > Kemal Akman > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@delhommeemblde-3232
Last seen 10.2 years ago
Hej! You want to use the tag argument of the ScanBamParam object. Here is how I retrieve the BWA XA tags: bam<- scanBam("my.bam", index="my.bam", param=ScanBamParam( flag=scanBamFlag(isUnmappedQuery=FALSE), what=c("rname","pos","qwidth"), tag="XA"))[[1]] Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On Nov 29, 2012, at 3:23 PM, Kemal Akman wrote: > Hello, > > I'm interested in accessing extended SAM tags in aligned short read > sequence files using a R/Bioconductor library. Rsamtools seems > potentially suited for this purpose, but I couldn't find the arguments > to return custom SAM tags, such as "XX:Z", but there doesn't seem to > be a corresponding "what" value in ScanBamParam()? > > I'd be especially interested in such a feature to parse methylation > strings from Bismark, as well as custom SAM tags from other tools. > > Any suggestions on Rsamtools or alternative methods to achieve this in > Bioconductor would be much appreciated. > > Example SAM data: > $ head -2 sample.sam > SRR306424.2547_PRESLEY:4:4:62:558_length=76 16 chr22 > 30675421 255 36M * 0 0 > CACACACATCCACATAACACCATAACCAACCCCCGA > ;,?A9:>B at B?@BBAC at C/BBC at CBCCC8BCBACBB NM:i:4 XX:Z:15GG6G11G > XM:Z:...............hh......h..........Zx XR:Z:CT XG:Z:GA > SRR306424.5227_PRESLEY:4:4:113:1768_length=76 0 chr5 > 123101409 255 75M * 0 0 > TGATTTTTATATAAGGTGTAAGTAAGAGGTTTAGTTTTGATTTTTTGTATATGGATAATTAGTTTTTT TAGTATT > B>?@BBABBCBCCBCB>B at B@B?B at B@BB>BB at B?BBBB at BCBBABBAABABB@@B??ABAA>BABBBA=><@A= > NM:i:13 XX:Z:22C8C12C4C8CC4C1CCC2C1CC > XM:Z:......................h........x............x....h........hx... .h.hhx..h.hh > XR:Z:CT XG:Z:CT > > > Best regards, > > Kemal Akman > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 817 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6