How to access metadata of GAlignments objects
1
0
Entering edit mode
Robert ▴ 10
@robert-21245
Last seen 2.9 years ago
United States

I can read a BAM file including selected SAM tags using the import command from the rtracklayer package and the ScanBamParam command from the Rsamtools package. The SAM tags are added to the resulting GAlignments object as metadata columns. I cannot figure out how to access the metadata columns in the GAlignments object. I only manage to access them after conversion into a GRanges object (see code example below). Is there away to access metadata columns directly in a GAlignments object?

library(rtracklayer)
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
tmp <- import(con = fl, param = ScanBamParam(tag = "NM"))

tmp
# GAlignments object with 3271 alignments and 1 metadata column:
#   seqnames strand       cigar    qwidth     start       end     width     njunc |        NM
# <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> | <integer>
#   [1]     seq1      +         36M        36         1        36        36         0 |         0
# [2]     seq1      +         35M        35         3        37        35         0 |         0
# [3]     seq1      +         35M        35         5        39        35         0 |         0
# [4]     seq1      +         36M        36         6        41        36         0 |         5
# [5]     seq1      +         35M        35         9        43        35         0 |         0
# ...      ...    ...         ...       ...       ...       ...       ...       ... .       ...
# [3267]     seq2      +         35M        35      1524      1558        35         0 |         3
# [3268]     seq2      +         35M        35      1524      1558        35         0 |         3
# [3269]     seq2      -         35M        35      1528      1562        35         0 |         1
# [3270]     seq2      -         35M        35      1532      1566        35         0 |         2
# [3271]     seq2      -         35M        35      1533      1567        35         0 |         2
# -------
#   seqinfo: 2 sequences from an unspecified genome

head(tmp$NM)
# Error in getListElement(x, i, ...) :
#   GAlignments objects don't support [[, $, as.list(), lapply(), or unlist()

head(as(object = tmp, Class = "GRanges")$NM)
# [1] 0 0 0 5 0 1

sessionInfo()
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04.4 LTS
# 
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
#   [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] Rsamtools_2.0.3      Biostrings_2.52.0    XVector_0.24.0       rtracklayer_1.44.4   GenomicRanges_1.36.1 GenomeInfoDb_1.20.0  IRanges_2.18.3      
# [8] S4Vectors_0.22.1     BiocGenerics_0.30.0 
# 
# loaded via a namespace (and not attached):
#   [1] matrixStats_0.55.0          lattice_0.20-38             XML_3.99-0.3                GenomicAlignments_1.20.1    bitops_1.0-6               
# [6] grid_3.6.3                  zlibbioc_1.30.0             Matrix_1.2-18               BiocParallel_1.18.1         tools_3.6.3                
# [11] Biobase_2.44.0              RCurl_1.98-1.1              DelayedArray_0.10.0         compiler_3.6.3              SummarizedExperiment_1.14.1
# [16] GenomeInfoDbData_1.2.1
rtracklayer Rsamtools GAlignments GRanges metadata • 1.5k views
ADD COMMENT
2
Entering edit mode
@herve-pages-1542
Last seen 9 hours ago
Seattle, WA, United States

That is because the proper way to access the metadata columns is with mcols(x) or mcols(x)$NM. This is the only form that is guaranteed to work across all Vector derivatives. The shortcut x$NM is only supported by GRanges objects and derivatives, and for convenience. Note however that its semantic conflicts with the semantic of $ on other Vector derivatives that are list-like objects like GRangesList objects. For these objects $ behaves like on any list-like object i.e. it accesses a particular list element.

ADD COMMENT
0
Entering edit mode

Thanks for the clarification, Hervé!

ADD REPLY

Login before adding your answer.

Traffic: 833 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6