Hi,
I have created this GRanges object from a data frame. I want to keep only the ranges whose seqnames are chr1-chr22 or chrX.
> x2
GRanges object with 232184 ranges and 1 metadata column:
seqnames ranges strand | gene
<Rle> <IRanges> <Rle> | <character>
[1] chr1 11868-14409 + | DDX11L1
[2] chr1 12009-13670 + | DDX11L1
[3] chr1 14403-29570 - | WASH7P
[4] chr1 17368-17436 - | MIR6859-1
[5] chr1 29553-31097 + | MIR1302-2HG
... ... ... ... . ...
[232180] chr22_KI270734v1_ran.. 59710-60316 + | AC007325.3
[232181] chr22_KI270734v1_ran.. 72410-74814 + | AC007325.1
[232182] chr22_KI270734v1_ran.. 131493-137392 + | AC007325.4
[232183] chr22_KI270734v1_ran.. 138081-161750 - | AC007325.2
[232184] chr22_KI270734v1_ran.. 138081-161852 - | AC007325.2
> seqnames(x2)
factor-Rle of length 232184 with 47 runs
Lengths: 20376 16951 ... 5
Values : chr1 chr2 ... chr22_KI270734v1_random
Levels(47): chr1 chr2 chr3 chr4 ... chrUn_KI270442v1 chrUn_KI270744v1 chrUn_KI270750v1
I created a vector of all the chromosomes I want (1-22 and X).
> Total_chr <- c(paste("chr", 1:22, sep=''), 'chrX')
> Total_chr
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11"
[12] "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr20" "chr21" "chr22"
[23] "chrX"
When I subset x2 using Total_chr, the printed result shows ranges only for chr1-chr22, not for chrX, even though chrX is present in my GRanges object.
> x3 <- x2[seqnames(x2) == Total_chr]
GRanges object with 10048 ranges and 1 metadata column:
seqnames ranges strand | gene
<Rle> <IRanges> <Rle> | <character>
[1] chr1 11868-14409 + | DDX11L1
[2] chr1 139789-140339 - | AL627309.2
[3] chr1 365388-366151 - | AL732372.2
[4] chr1 498280-499175 - | AL732372.2
[5] chr1 632324-632413 - | AC114498.2
... ... ... ... . ...
[10044] chr22 50546414-50547652 + | KLHDC7B
[10045] chr22 50578962-50582824 - | CHKB
[10046] chr22 50625017-50628170 - | ARSA
[10047] chr22 50744143-50744910 + | ACR
[10048] chr22 50783849-50799441 + | RPL23AP82
> seqnames(x3)
factor-Rle of length 10048 with 23 runs
Lengths: 886 737 628 414 480 461 478 ... 596 195 600 237 129 218
Values : chr1 chr2 chr3 chr4 chr5 chr6 chr7 ... chr17 chr18 chr19 chr20 chr21 chr22
Levels(47): chr1 chr2 chr3 chr4 ... chrUn_KI270442v1 chrUn_KI270744v1 chrUn_KI270750v1
However, when I check my GRanges object directly, 'chrX' is present.
> x2[seqnames(x2) == 'chrX']
GRanges object with 7169 ranges and 1 metadata column:
seqnames ranges strand | gene
<Rle> <IRanges> <Rle> | <character>
[1] chrX 253742-255091 + | AL954722.1
[2] chrX 276321-303353 + | PLCXD1
[3] chrX 276323-291537 + | PLCXD1
[4] chrX 276352-291629 + | PLCXD1
[5] chrX 281054-288869 + | PLCXD1
... ... ... ... . ...
[7165] chrX 156022785-156023531 + | WASH6P
[7166] chrX 156023366-156025666 + | WASH6P
[7167] chrX 156023823-156025554 + | WASH6P
[7168] chrX 156024070-156025554 + | WASH6P
[7169] chrX 156025663-156027877 - | DDX11L16
-------
seqinfo: 47 sequences from an unspecified genome; no seqlengths
>sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.3.5 BiocParallel_1.24.1
[3] MEDIPSData_1.26.0 BSgenome.Hsapiens.UCSC.hg38.masked_1.3.993
[5] BSgenome.Hsapiens.UCSC.hg19_1.4.3 org.Hs.eg.db_3.12.0
[7] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0 GenomicFeatures_1.42.3
[9] AnnotationDbi_1.52.0 Biobase_2.50.0
[11] annotatr_1.16.0 BSgenome.Hsapiens.UCSC.hg38_1.4.3
[13] BSgenome_1.58.0 rtracklayer_1.50.0
[15] Biostrings_2.58.0 XVector_0.30.0
[17] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[19] IRanges_2.24.1 S4Vectors_0.28.1
[21] BiocGenerics_0.36.1 qsea_1.16.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.59.0 bit64_4.0.5
[4] progress_1.2.2 httr_1.4.2 tools_4.0.5
[7] utf8_1.2.1 R6_2.5.0 DBI_1.1.1
[10] colorspace_2.0-2 withr_2.4.2 tidyselect_1.1.1
[13] prettyunits_1.1.1 bit_4.0.4 curl_4.3.2
[16] compiler_4.0.5 xml2_1.3.2 DelayedArray_0.16.3
[19] scales_1.1.1 readr_1.4.0 askpass_1.1
[22] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.27
[25] Rsamtools_2.6.0 pkgconfig_2.0.3 htmltools_0.5.1.1
[28] MatrixGenerics_1.2.1 highr_0.9 regioneR_1.22.0
[31] dbplyr_2.1.1 fastmap_1.1.0 limma_3.46.0
[34] rlang_0.4.11 rstudioapi_0.13 RSQLite_2.2.7
[37] shiny_1.6.0 generics_0.1.0 zoo_1.8-9
[40] gtools_3.9.2 dplyr_1.0.7 RCurl_1.98-1.3
[43] magrittr_2.0.1 GenomeInfoDbData_1.2.4 Matrix_1.3-4
[46] Rcpp_1.0.6 munsell_0.5.0 fansi_0.5.0
[49] lifecycle_1.0.0 stringi_1.6.2 yaml_2.2.1
[52] SummarizedExperiment_1.20.0 zlibbioc_1.36.0 plyr_1.8.6
[55] HMMcopy_1.32.0 BiocFileCache_1.14.0 AnnotationHub_2.22.1
[58] grid_4.0.5 blob_1.2.1 promises_1.2.0.1
[61] crayon_1.4.1 lattice_0.20-44 hms_1.1.0
[64] knitr_1.33 pillar_1.6.1 reshape2_1.4.4
[67] biomaRt_2.46.3 XML_3.99-0.6 glue_1.4.2
[70] BiocVersion_3.12.0 data.table_1.14.0 BiocManager_1.30.16
[73] vctrs_0.3.8 httpuv_1.6.1 gtable_0.3.0
[76] openssl_1.4.4 purrr_0.3.4 assertthat_0.2.1
[79] cachem_1.0.5 xfun_0.24 mime_0.11
[82] xtable_1.8-4 later_1.2.0 tibble_3.1.2
[85] GenomicAlignments_1.26.0 memoise_2.0.0 ellipsis_0.3.2
[88] interactiveDisplayBase_1.28.0
Why is that? What am I missing when subsetting?
Thanks
Rakesh
Also, note that this doesn't work the way you might think. Using equality on a vector isn't really a thing: == compares element by element and recycles the shorter vector, so it is not a membership test. Instead you should use %in%
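A minimal base-R sketch of the difference, using a toy character vector as a stand-in for seqnames(x2). The names chrom and keep are made up for illustration and are not from the original post:

```r
# Toy stand-in for seqnames(x2): a plain character vector, not a GRanges.
chrom <- c("chr1", "chr1", "chr2", "chrX", "chrX", "chrUn_KI270442v1")
keep  <- c("chr1", "chr2", "chrX")

# `==` pairs elements by position and recycles `keep`, so chrom[4] is
# compared with keep[1], chrom[5] with keep[2], and so on.
by_position <- chrom == keep
print(by_position)

# `%in%` asks, for each element of chrom, whether it occurs anywhere in keep.
by_membership <- chrom %in% keep
print(by_membership)
```

With ==, a range survives only when its chromosome happens to line up with the recycled position in the lookup vector, which would explain why roughly one range in 23 was kept (886 of the 20376 chr1 ranges, for instance).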
Hi James,
Thanks! I will stick to the existing functionality rather than introducing my own.
Just for my understanding, I tried using standardChromosomes() on my GRanges object, but it still shows only chr1-chr22 and no chrX.
I am wondering why it didn't work even with the standard functionality.
Thanks!
Rakesh
I would bet that the order of your GRanges object has chrX somewhere in the middle. For example, the seqnames output you posted reports 23 runs, which indicates you have 23 chromosomes, some of which are not shown (they are in the ellipsis). And if you have chr1-22, but a total of 23, what's the 23rd one? What do you get from
The 23rd one is 'chrX'.
It's really an insightful discussion. Thanks, James!
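For anyone hitting the same confusion, here is a small sketch of how to check which chromosomes survive a subset without relying on the truncated printout. It assumes the GenomicRanges package is installed. The toy object gr and the names wanted, sub and pruned are invented for illustration and are not the x2 from this thread:

```r
library(GenomicRanges)  # also attaches IRanges and GenomeInfoDb

# Toy GRanges with chrX deliberately placed in the middle.
gr <- GRanges(
  seqnames = c("chr1", "chrX", "chr2", "chrUn_KI270442v1"),
  ranges   = IRanges(start = c(100, 200, 300, 400), width = 50)
)
wanted <- c("chr1", "chr2", "chrX")

# Membership test instead of `==`.
sub <- gr[seqnames(gr) %in% wanted]

# List every distinct chromosome in the result; nothing is hidden
# behind an ellipsis here.
print(unique(as.character(seqnames(sub))))

# Subsetting keeps the unused seqlevels; keepSeqlevels() drops both the
# unwanted ranges and their levels in one step.
pruned <- keepSeqlevels(gr, wanted, pruning.mode = "coarse")
print(seqlevels(pruned))
```

The last call is the kind of existing functionality mentioned above. keepStandardChromosomes() from GenomeInfoDb works along the same lines, but it would also retain chrY and chrM if they are present.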
Best Rakesh