I can read a BAM file, including selected SAM tags, using the import function from the rtracklayer package and the ScanBamParam function from the Rsamtools package. The SAM tags are added to the resulting GAlignments object as metadata columns. However, I cannot figure out how to access the metadata columns in the GAlignments object; I only manage to access them after converting it into a GRanges object (see the code example below). Is there a way to access metadata columns directly in a GAlignments object?
library(rtracklayer)
library(Rsamtools)
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
tmp <- import(con = fl, param = ScanBamParam(tag = "NM"))
tmp
# GAlignments object with 3271 alignments and 1 metadata column:
# seqnames strand cigar qwidth start end width njunc | NM
# <Rle> <Rle> <character> <integer> <integer> <integer> <integer> <integer> | <integer>
# [1] seq1 + 36M 36 1 36 36 0 | 0
# [2] seq1 + 35M 35 3 37 35 0 | 0
# [3] seq1 + 35M 35 5 39 35 0 | 0
# [4] seq1 + 36M 36 6 41 36 0 | 5
# [5] seq1 + 35M 35 9 43 35 0 | 0
# ... ... ... ... ... ... ... ... ... . ...
# [3267] seq2 + 35M 35 1524 1558 35 0 | 3
# [3268] seq2 + 35M 35 1524 1558 35 0 | 3
# [3269] seq2 - 35M 35 1528 1562 35 0 | 1
# [3270] seq2 - 35M 35 1532 1566 35 0 | 2
# [3271] seq2 - 35M 35 1533 1567 35 0 | 2
# -------
# seqinfo: 2 sequences from an unspecified genome
head(tmp$NM)
# Error in getListElement(x, i, ...) :
# GAlignments objects don't support [[, $, as.list(), lapply(), or unlist()
head(as(object = tmp, Class = "GRanges")$NM)
# [1] 0 0 0 5 0 1
sessionInfo()
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04.4 LTS
#
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] parallel stats4 stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] Rsamtools_2.0.3 Biostrings_2.52.0 XVector_0.24.0 rtracklayer_1.44.4 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3
# [8] S4Vectors_0.22.1 BiocGenerics_0.30.0
#
# loaded via a namespace (and not attached):
# [1] matrixStats_0.55.0 lattice_0.20-38 XML_3.99-0.3 GenomicAlignments_1.20.1 bitops_1.0-6
# [6] grid_3.6.3 zlibbioc_1.30.0 Matrix_1.2-18 BiocParallel_1.18.1 tools_3.6.3
# [11] Biobase_2.44.0 RCurl_1.98-1.1 DelayedArray_0.10.0 compiler_3.6.3 SummarizedExperiment_1.14.1
# [16] GenomeInfoDbData_1.2.1
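For reference, one way this might be done without the GRanges conversion is sketched below. It assumes that the generic mcols() accessor from S4Vectors (attached in the session above) works on GAlignments objects the same way it does on GRanges; the variable names md and nm are only illustrative.

```r
library(rtracklayer)
library(Rsamtools)

# Same example file and import call as in the question
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
tmp <- import(con = fl, param = ScanBamParam(tag = "NM"))

# mcols() returns all metadata columns as a DataFrame
# (assumption: the accessor applies to GAlignments as to GRanges)
md <- mcols(tmp)

# A single column can then be pulled out of that DataFrame with $
nm <- md$NM
head(nm)
```

If the assumption holds, head(nm) should give the same values as the GRanges conversion in the question, since both read the same NM column.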
Thanks for the clarification, Hervé!