In this tutorial, "seqlevels<-" can be used to set only part of chromosomes to be active, which is very convenient. However, there may be a minor bug about "seqlevels<-" on TxDb objects.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(txdb) <- c("chr1", "chr2") # only set Chromosome 1 and 2 to be active
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq : logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName : chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels : chr [1:2] "chr1" "chr2"
## user2seqlevels0 : int [1:2] 1 2
seqlevels(txdb) <- c("chr3", "chr4")
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq : logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName : chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels : chr [1:2] "chr3" "chr4"
## user2seqlevels0 : int [1:2] 1 2
transcripts(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # It should not return "chr3"!
## GRanges object with 4073 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr3 [ 11874, 14409] + | 1 uc001aaa.3
## [2] chr3 [ 11874, 14409] + | 2 uc010nxq.1
## [3] chr3 [ 11874, 14409] + | 3 uc010nxr.1
## [4] chr3 [ 69091, 70008] + | 4 uc001aal.1
## [5] chr3 [321084, 321115] + | 5 uc001aaq.2
## ... ... ... ... . ... ...
## [4069] chr3 [249168447, 249168518] + | 4069 uc021pmg.1
## [4070] chr3 [249200442, 249213345] + | 4070 uc001ifg.3
## [4071] chr3 [249200442, 249213345] + | 4071 uc001ifh.3
## [4072] chr3 [249200442, 249213345] + | 4072 uc009xhd.3
## [4073] chr3 [249211537, 249212562] + | 4073 uc021pmh.1
## -------
## seqinfo: 2 sequences from hg19 genome
exons(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # Also wrong
## GRanges object with 13963 ranges and 1 metadata column:
## seqnames ranges strand | exon_id
## <Rle> <IRanges> <Rle> | <integer>
## [1] chr3 [11874, 12227] + | 1
## [2] chr3 [12595, 12721] + | 2
## [3] chr3 [12613, 12721] + | 3
## [4] chr3 [12646, 12697] + | 4
## [5] chr3 [13221, 14409] + | 5
## ... ... ... ... . ...
## [13959] chr3 [249208015, 249208078] + | 13959
## [13960] chr3 [249208638, 249208758] + | 13960
## [13961] chr3 [249210801, 249213345] + | 13961
## [13962] chr3 [249211478, 249213345] + | 13962
## [13963] chr3 [249211537, 249212562] + | 13963
## -------
## seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr2", tx_strand = "+")) # It should not return "chr4"!
## GRanges object with 2642 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr4 [264869, 272481] + | 7968 uc002qwd.2
## [2] chr4 [264869, 273149] + | 7969 uc002qwe.5
## [3] chr4 [264869, 278282] + | 7970 uc002qwf.3
## [4] chr4 [264869, 278282] + | 7971 uc002qwg.3
## [5] chr4 [264869, 278282] + | 7972 uc002qwh.3
## ... ... ... ... . ... ...
## [2638] chr4 [243030844, 243102469] + | 10605 uc010zpd.1
## [2639] chr4 [243030844, 243102469] + | 10606 uc010zpe.1
## [2640] chr4 [243030844, 243102469] + | 10607 uc010zpf.1
## [2641] chr4 [243030844, 243102469] + | 10608 uc010zpg.1
## [2642] chr4 [243064438, 243081022] + | 10609 uc010zph.2
## -------
## seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr3", tx_strand = "+")) # It should return "chr3"!
## GRanges object with 0 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## -------
## seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr4", tx_strand = "+")) # It should return "chr4"!
## GRanges object with 0 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## -------
## seqinfo: 2 sequences from hg19 genome
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [2] GenomicFeatures_1.26.0
## [3] AnnotationDbi_1.36.0
## [4] Biobase_2.34.0
## [5] GenomicRanges_1.26.1
## [6] GenomeInfoDb_1.10.1
## [7] IRanges_2.8.1
## [8] S4Vectors_0.12.1
## [9] BiocGenerics_0.20.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.8 knitr_1.15.1
## [3] XVector_0.14.0 magrittr_1.5
## [5] GenomicAlignments_1.10.0 zlibbioc_1.20.0
## [7] BiocParallel_1.8.1 lattice_0.20-34
## [9] stringr_1.1.0 tools_3.3.2
## [11] grid_3.3.2 SummarizedExperiment_1.4.0
## [13] DBI_0.5-1 htmltools_0.3.5
## [15] yaml_2.1.14 rprojroot_1.1
## [17] digest_0.6.10 Matrix_1.2-7.1
## [19] rtracklayer_1.34.1 bitops_1.0-6
## [21] biomaRt_2.30.0 RCurl_1.95-4.8
## [23] memoise_1.0.0 evaluate_0.10
## [25] RSQLite_1.1 rmarkdown_1.2
## [27] stringi_1.1.2 Rsamtools_1.26.1
## [29] Biostrings_2.42.1 backports_1.0.4
## [31] XML_3.98-1.5
It seems that user2seqlevels0 in txdb is not right when I use "seqlevels<-" the second time. And is that where the problem comes from?
Thanks in advance.
Can Wang
Hi Hervé Pagès,
Thanks for your reply.
svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_4/madman/Rpacks/GenomicFeatures
R CMD INSTALL GenomicFeatures
I intalled it by that code and this bug does not appear anymore.
Thanks!
Can
Hi Hervé Pagès,
Now in GenomicFeatures 1.26.2, the problem has been solved! Thanks!
Can