Hi all,
I'm sure it's just me missing something obvious... but I can't find PTEN in TxDb.Hsapiens.UCSC.hg38.knownGene.
If I look up two genes (NF1 and PTEN) in hg19, I get the positions for both:
> library(AnnotationDbi)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
> AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db, keys=c("NF1", "PTEN"), keytype="SYMBOL", columns="ENTREZID")
> all.genes <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
> all.genes[(all.genes$gene_id %in% c("4763", "5728"))]
GRanges object with 2 ranges and 1 metadata column:
       seqnames            ranges strand |     gene_id
          <Rle>         <IRanges>  <Rle> | <character>
  4763    chr17 29421945-29708905      + |        4763
  5728    chr10 89623195-89728532      + |        5728
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
However, if I do the same with hg38, I only get the information for NF1, not for PTEN:
> all.genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
> all.genes[(all.genes$gene_id %in% c("4763", "5728"))]
GRanges object with 1 range and 1 metadata column:
       seqnames            ranges strand |     gene_id
          <Rle>         <IRanges>  <Rle> | <character>
  4763    chr17 31094927-31381887      + |        4763
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome
Am I missing something, or was PTEN not included in the TxDb?
Thanks a lot!
Bernat
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /software/debian-8/general/R-3.6.1-bioc-3.10/lib/R/lib/libRblas.so
LAPACK: /software/debian-8/general/R-3.6.1-bioc-3.10/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[3] GenomicFeatures_1.38.0 GenomicRanges_1.38.0
[5] GenomeInfoDb_1.22.0 AnnotationDbi_1.48.0
[7] IRanges_2.20.0 S4Vectors_0.24.0
[9] Biobase_2.46.0 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.16.0 progress_1.2.2 tidyselect_0.2.5
[4] purrr_0.3.3 lattice_0.20-38 vctrs_0.2.0
[7] BiocFileCache_1.10.2 rtracklayer_1.46.0 yaml_2.2.0
[10] blob_1.2.0 XML_3.98-1.20 rlang_0.4.1
[13] pillar_1.4.2 glue_1.3.1 DBI_1.0.0
[16] BiocParallel_1.20.0 rappdirs_0.3.1 bit64_0.9-7
[19] dbplyr_1.4.2 matrixStats_0.55.0 GenomeInfoDbData_1.2.2
[22] stringr_1.4.0 zlibbioc_1.32.0 Biostrings_2.54.0
[25] memoise_1.1.0 biomaRt_2.42.0 curl_4.2
[28] Rcpp_1.0.3 openssl_1.4.1 backports_1.1.5
[31] DelayedArray_0.12.0 org.Hs.eg.db_3.10.0 XVector_0.26.0
[34] bit_1.1-14 Rsamtools_2.2.1 hms_0.5.2
[37] askpass_1.1 digest_0.6.22 stringi_1.4.3
[40] dplyr_0.8.3 grid_3.6.1 tools_3.6.1
[43] bitops_1.0-6 magrittr_1.5 RCurl_1.95-4.12
[46] tibble_2.1.3 RSQLite_2.1.2 crayon_1.3.4
[49] pkgconfig_2.0.3 zeallot_0.1.0 Matrix_1.2-17
[52] prettyunits_1.0.2 assertthat_0.2.1 httr_1.4.1
[55] rstudioapi_0.10 R6_2.4.1 GenomicAlignments_1.22.1
[58] compiler_3.6.1
Thanks Lori!
It works perfectly with that. If I understand correctly, this is because PTEN maps both onto the canonical chromosome and onto a genome patch, isn't it?
In any case, it seems I'm not the only one wondering where my genes have gone! Maybe a note on the existence of this option could be added to the GenomicFeatures vignette? Thanks again!
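For anyone landing here later, a minimal sketch of what I believe the option does. I'm assuming the option in question is the `single.strand.genes.only` argument of `genes()` in GenomicFeatures; the variable names are just for illustration. By default, `genes()` drops any gene whose exons sit on both strands or on more than one sequence (for example a chromosome plus a patch or alternate scaffold), and setting the argument to FALSE returns a GRangesList that keeps them.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Default call: genes that cannot be represented by a single range
# (exons on both strands or on more than one sequence) are dropped.
default.genes <- genes(txdb)
"5728" %in% default.genes$gene_id   # is PTEN present in the default result?

# Assumed option: keep every gene. The result is a GRangesList named by
# gene_id, with one or more ranges per gene.
all.genes <- genes(txdb, single.strand.genes.only = FALSE)
pten <- all.genes[["5728"]]
pten

# List the sequences PTEN has ranges on (canonical chromosome and any patch)
as.character(unique(seqnames(pten)))
```

Because the result is a GRangesList rather than a GRanges, downstream code that expects one range per gene needs to pick the range it wants, for instance by keeping only the ranges on the canonical chromosomes.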
Thank you for the suggestion; we will look into it.
FWIW: I have submitted a pull request that will output a message when this filtering occurs - hopefully it will be helpful. https://github.com/Bioconductor/GenomicFeatures/pull/20
Great! That would be very helpful! Thanks!