How does one modify a single value of a field in a seqinfo object?
2
0
Entering edit mode
Chris Seidel ▴ 80
@chris-seidel-5840
Last seen 3.6 years ago
United States

I have a BSgenome object, and I'm trying to change the Mitochondrial chromosome to be circular, as currently it is not:

> seqinfo(astMex2)
Seqinfo object with 2416 sequences from astMex_2.0 genome:
  seqnames       seqlengths isCircular     genome
  1                26953843      FALSE astMex_2.0
  2                34292151      FALSE astMex_2.0
  3                74127438      FALSE astMex_2.0
  4                25652880      FALSE astMex_2.0
  5                40584741      FALSE astMex_2.0
  ...                   ...        ...        ...
  APWO02002412.1     135710      FALSE astMex_2.0
  APWO02002415.1      25029      FALSE astMex_2.0
  MT                  16682      FALSE astMex_2.0 <-- oops!

I feel like I've tried a million different things of the sort:

isCircular(astMex2)[seqlevels(astMex2) == "MT"] <- TRUE
Error in `seqinfo<-`(`*tmp*`, value = `*vtmp*`) : 
  'new2old' must be specified when replacing the 'seqinfo' of a BSgenome object

I've tried reading the documentation on what 'new2old' means, but am unable to understand how to use it. It is absent from all the examples in the docs.

How does one modify a single value of a field in a seqinfo object? I can't seem to figure out how to use isCircular as a setter.

GenomicRanges GenomeInfoDb BSgenome isCircular • 1.8k views
ADD COMMENT
1
Entering edit mode
nick.eagles ▴ 100
@nickeagles-24264
Last seen 4 months ago
United States

Hello,

Although there may be a "proper" way to do this through Seqinfo methods or functions, I've found editing SeqInfo objects to be generally counterintuitive and unpleasant. You can circumvent these types of errors by reconstructing the object entirely (again- not the most elegant or efficient solution, but it works). In your case, this could look like:

# Copy and modify the 'isCircular' attribute of the seqinfo
temp_circular = isCircular(astMex2)
temp_circular['MT'] = TRUE

# Reconstruct the Seqinfo object with the same name
astMex2 = Seqinfo(seqnames(astMex2), seqlengths(astMex2), temp_circular, genome(astMex2))

# You can check that your change worked
isCircular(astMex2)['MT']

I tested this general approach successfully with the BSgenome package BSgenome.Hsapiens.UCSC.hg38 and the corresponding Hsapiens object. I've attached my session info at the bottom of this message.

Best,

-Nick

P.S. I'm actively learning to help others.

library('sessioninfo')
session_info()
#─ Session info ───────────────────────────────────────────────────────────────
# setting  value
# version  R version 4.0.2 Patched (2020-06-24 r78746)
# os       CentOS Linux 7 (Core)
# system   x86_64, linux-gnu
# ui       X11
# language (EN)
# collate  en_US.UTF-8
# ctype    en_US.UTF-8
# tz       US/Eastern
# date     2020-11-11
#
#─ Packages ───────────────────────────────────────────────────────────────────
# package                     * version  date       lib source
# assertthat                    0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
# Biobase                       2.48.0   2020-04-27 [2] Bioconductor
# BiocGenerics                * 0.34.0   2020-04-27 [2] Bioconductor
# BiocParallel                  1.22.0   2020-04-27 [2] Bioconductor
# Biostrings                  * 2.56.0   2020-04-27 [2] Bioconductor
# bitops                        1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
# BSgenome                    * 1.56.0   2020-04-27 [2] Bioconductor
# BSgenome.Hsapiens.UCSC.hg38 * 1.4.3    2020-07-09 [1] Bioconductor
# cli                           2.1.0    2020-10-12 [2] CRAN (R 4.0.2)
# crayon                        1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
# DelayedArray                  0.14.1   2020-07-14 [2] Bioconductor
# fansi                         0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
# GenomeInfoDb                * 1.24.2   2020-06-15 [2] Bioconductor
# GenomeInfoDbData              1.2.3    2020-05-18 [2] Bioconductor
# GenomicAlignments             1.24.0   2020-04-27 [2] Bioconductor
# GenomicRanges               * 1.40.0   2020-04-27 [2] Bioconductor
# glue                          1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
# IRanges                     * 2.22.2   2020-05-21 [2] Bioconductor
# lattice                       0.20-41  2020-04-02 [3] CRAN (R 4.0.2)
# Matrix                        1.2-18   2019-11-27 [3] CRAN (R 4.0.2)
# matrixStats                   0.57.0   2020-09-25 [2] CRAN (R 4.0.2)
# RCurl                         1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
# Rsamtools                     2.4.0    2020-04-27 [2] Bioconductor
# rtracklayer                 * 1.48.0   2020-04-27 [2] Bioconductor
# S4Vectors                   * 0.26.1   2020-05-16 [2] Bioconductor
# sessioninfo                 * 1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
# SummarizedExperiment          1.18.2   2020-07-09 [2] Bioconductor
# withr                         2.3.0    2020-09-22 [2] CRAN (R 4.0.2)
# XML                           3.99-0.5 2020-07-23 [2] CRAN (R 4.0.2)
# XVector                     * 0.28.0   2020-04-27 [2] Bioconductor
# zlibbioc                      1.34.0   2020-04-27 [2] Bioconductor
#
#[1] /users/neagles/R/4.0
#[2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
#[3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library
ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 9 hours ago
Seattle, WA, United States

Hi,

The seqlengths and circularity flags of a BSgenome object cannot be changed by the user. This is by design. The rationale being that if they are wrong and need to be changed, then the BSgenome object should be considered broken and should be fixed upstream.

On other objects with a seqinfo component, say GRanges objects, you would change the circularity flag of a given sequence with something like isCircular(gr)["chr3"] <- new_value. For example:

library(GenomicRanges)
example(GRanges)
seqinfo(gr)
# Seqinfo object with 3 sequences from mock1 genome:
#   seqnames seqlengths isCircular genome
#   chr1           1000         NA  mock1
#   chr2           2000         NA  mock1
#   chr3           1500         NA  mock1
isCircular(gr)
# chr1 chr2 chr3 
#   NA   NA   NA 
isCircular(gr)["chr3"] <- TRUE
isCircular(gr)
# chr1 chr2 chr3 
#   NA   NA TRUE

However, trying to do this on a BSgenome object fails with the following error:

library(BSgenome.Celegans.UCSC.ce11)
genome <- BSgenome.Celegans.UCSC.ce11
isCircular(genome)
#   chrI  chrII chrIII  chrIV   chrV   chrX   chrM 
#  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE   TRUE 
isCircular(genome)["chrX"] <- NA
# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) : 
#   seqlengths() and isCircular() of the supplied 'seqinfo' must be
#   identical to seqlengths() and isCircular() of the current 'seqinfo'
#   when replacing the 'seqinfo' of a BSgenome object

Note that the error message has changed in Bioconductor 3.12 (it looks like you're using an older version of Bioconductor, please upgrade). It's a little bit less cryptic now but you could argue that maybe it could be even clearer. FWIW the reason why we get this error message is because isCircular(genome)["chrX"] <- NA is doing the following behind the scene:

si <- seqinfo(genome)
isCircular(si)["chrX"] <- NA
si
# Seqinfo object with 7 sequences (1 circular) from ce11 genome:
#   seqnames seqlengths isCircular genome
#   chrI       15072434      FALSE   ce11
#   chrII      15279421      FALSE   ce11
#   chrIII     13783801      FALSE   ce11
#   chrIV      17493829      FALSE   ce11
#   chrV       20924180      FALSE   ce11
#   chrX       17718942         NA   ce11
#   chrM          13794       TRUE   ce11
seqinfo(genome) <- si
# Error in .check_new2old_and_new_seqinfo(new2old, value, seqinfo(x), context) : 
#   seqlengths() and isCircular() of the supplied 'seqinfo' must be
#   identical to seqlengths() and isCircular() of the current 'seqinfo'
#   when replacing the 'seqinfo' of a BSgenome object

As you can see, everything works fine until we try to set the modified Seqinfo object back on the BSgenome object.

So here is what you can do to move forward:

  1. Report the inaccurate circularity flag of the MT sequence of the astMex2 BSgenome upstream. This needs to be fixed.
  2. In the meantime, use the following dirty hack to fix the flag:
     si <- seqinfo(astMex2)
     isCircular(si)["MT"] <- TRUE
     astMex2@seqinfo <- si
    
    Note that we're using direct slot access here to force the modified Seqinfo object into the BSgenome object. Using direct slot access is dangerous and almost always a bad idea so please, consider this a dirty hack and an exception.
  3. Upgrade your installation to Bioconductor 3.12 as soon as you can. Older versions are no longer supported.

Hope this helps,

H.

> sessionInfo()
R version 4.0.2 Patched (2020-06-24 r78746)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-4.0.r78746/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78746/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Celegans.UCSC.ce11_1.4.2 BSgenome_1.58.0                  
 [3] rtracklayer_1.50.0                Biostrings_2.58.0                
 [5] XVector_0.30.0                    GenomicRanges_1.42.0             
 [7] GenomeInfoDb_1.26.0               IRanges_2.24.0                   
 [9] S4Vectors_0.29.2                  BiocGenerics_0.36.0              

loaded via a namespace (and not attached):
 [1] zlibbioc_1.36.0             GenomicAlignments_1.26.0   
 [3] BiocParallel_1.24.1         lattice_0.20-41            
 [5] tools_4.0.2                 SummarizedExperiment_1.20.0
 [7] grid_4.0.2                  Biobase_2.50.0             
 [9] matrixStats_0.57.0          crayon_1.3.4               
[11] Matrix_1.2-18               GenomeInfoDbData_1.2.4     
[13] bitops_1.0-6                RCurl_1.98-1.2             
[15] DelayedArray_0.16.0         compiler_4.0.2             
[17] MatrixGenerics_1.2.0        Rsamtools_2.6.0            
[19] XML_3.99-0.5
ADD COMMENT

Login before adding your answer.

Traffic: 823 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6