PTEN missing from TxDb.Hsapiens.UCSC.hg38.knownGene?
bernatgel ▴ 150
@bernatgel-7226
Last seen 7 weeks ago
Spain

Hi all,

I'm sure it's just me missing something obvious, but I can't find PTEN in TxDb.Hsapiens.UCSC.hg38.knownGene.

If I look up two genes (NF1 and PTEN) in hg19, I get the positions for both:

> library(AnnotationDbi)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)

> AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db, keys=c("NF1", "PTEN"), keytype="SYMBOL", columns="ENTREZID")

> all.genes <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
> all.genes[(all.genes$gene_id %in% c("4763", "5728"))]
GRanges object with 2 ranges and 1 metadata column:
       seqnames            ranges strand |     gene_id
          <Rle>         <IRanges>  <Rle> | <character>
  4763    chr17 29421945-29708905      + |        4763
  5728    chr10 89623195-89728532      + |        5728
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome

However, if I do the same with hg38, I only get the information for NF1, not for PTEN:

> all.genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
> all.genes[(all.genes$gene_id %in% c("4763", "5728"))]
GRanges object with 1 range and 1 metadata column:
       seqnames            ranges strand |     gene_id
          <Rle>         <IRanges>  <Rle> | <character>
  4763    chr17 31094927-31381887      + |        4763
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

Am I missing something, or was PTEN not included in the TxDb?

Thanks a lot!

Bernat

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS:   /software/debian-8/general/R-3.6.1-bioc-3.10/lib/R/lib/libRblas.so
LAPACK: /software/debian-8/general/R-3.6.1-bioc-3.10/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C              LC_TIME=C                 LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8    LC_PAPER=es_ES.UTF-8      LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
 [3] GenomicFeatures_1.38.0                   GenomicRanges_1.38.0                    
 [5] GenomeInfoDb_1.22.0                      AnnotationDbi_1.48.0                    
 [7] IRanges_2.20.0                           S4Vectors_0.24.0                        
 [9] Biobase_2.46.0                           BiocGenerics_0.32.0                     

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.16.0 progress_1.2.2              tidyselect_0.2.5           
 [4] purrr_0.3.3                 lattice_0.20-38             vctrs_0.2.0                
 [7] BiocFileCache_1.10.2        rtracklayer_1.46.0          yaml_2.2.0                 
[10] blob_1.2.0                  XML_3.98-1.20               rlang_0.4.1                
[13] pillar_1.4.2                glue_1.3.1                  DBI_1.0.0                  
[16] BiocParallel_1.20.0         rappdirs_0.3.1              bit64_0.9-7                
[19] dbplyr_1.4.2                matrixStats_0.55.0          GenomeInfoDbData_1.2.2     
[22] stringr_1.4.0               zlibbioc_1.32.0             Biostrings_2.54.0          
[25] memoise_1.1.0               biomaRt_2.42.0              curl_4.2                   
[28] Rcpp_1.0.3                  openssl_1.4.1               backports_1.1.5            
[31] DelayedArray_0.12.0         org.Hs.eg.db_3.10.0         XVector_0.26.0             
[34] bit_1.1-14                  Rsamtools_2.2.1             hms_0.5.2                  
[37] askpass_1.1                 digest_0.6.22               stringi_1.4.3              
[40] dplyr_0.8.3                 grid_3.6.1                  tools_3.6.1                
[43] bitops_1.0-6                magrittr_1.5                RCurl_1.95-4.12            
[46] tibble_2.1.3                RSQLite_2.1.2               crayon_1.3.4               
[49] pkgconfig_2.0.3             zeallot_0.1.0               Matrix_1.2-17              
[52] prettyunits_1.0.2           assertthat_0.2.1            httr_1.4.1                 
[55] rstudioapi_0.10             R6_2.4.1                    GenomicAlignments_1.22.1   
[58] compiler_3.6.1
TxDb annotation TxDb.Hsapiens.UCSC.hg38.knownGene • 1.3k views
shepherl 4.1k
@lshep
Last seen 4 hours ago
United States

I think there was a similar question answered by @kayla.interdonato here.

If you set single.strand.genes.only=FALSE (note that it returns a GRangesList rather than a GRanges), you will see PTEN:

> all.genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only=FALSE)
> all.genes[names(all.genes) %in% c("4763", "5728")]
GRangesList object of length 2:
$`4763`
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr17 31094927-31381887      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`5728`
GRanges object with 2 ranges and 0 metadata columns:
                  seqnames            ranges strand
                     <Rle>         <IRanges>  <Rle>
  [1]                chr10 87863625-87971930      +
  [2] chr10_KQ090021v1_fix      79262-182163      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome
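
If you would rather keep working with a flat GRanges, as in the original question, one possible approach is to unlist the GRangesList and discard everything that is not on a canonical chromosome. This is only a sketch under the assumption that the patch, alt and unplaced-scaffold copies are not needed; the object names (txdb, genes.list, flat.genes, canonical.genes) are made up for illustration.

```r
library(GenomicFeatures)
library(GenomeInfoDb)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# One list element per gene, one range per sequence/strand the gene maps to
genes.list <- genes(txdb, single.strand.genes.only=FALSE)

# Flatten to a GRanges; unlist() names each range after its gene ID
flat.genes <- unlist(genes.list)
flat.genes$gene_id <- names(flat.genes)

# Assumption: only canonical chromosomes are wanted, so ranges on
# patches, alts and unplaced scaffolds are dropped here
canonical.genes <- keepStandardChromosomes(flat.genes, pruning.mode="coarse")

# Look up NF1 (4763) and PTEN (5728) again
canonical.genes[canonical.genes$gene_id %in% c("4763", "5728")]
```

Note that a gene can still have more than one range after this step (for example, a gene annotated on both strands), so gene IDs in the result are not guaranteed to be unique.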


Thanks Lori!

It works perfectly with that. If I understand correctly, this is because PTEN maps both onto the canonical chromosome and onto a genome patch, isn't it?
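
For anyone wanting to check which genes are affected, a rough sketch follows. It relies on my reading of the output above, namely that with single.strand.genes.only=FALSE each gene gets one range per sequence/strand combination, so genes with more than one range should be the ones the default call drops. The object names (txdb, genes.list, multi.genes) are made up for illustration.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
genes.list <- genes(txdb, single.strand.genes.only=FALSE)

# Genes with more than one range sit on several sequences and/or strands
multi.genes <- genes.list[elementNROWS(genes.list) > 1]

# How many genes would be missing from the default genes() output?
length(multi.genes)

# Where does PTEN (Entrez ID 5728) map, if it is among them?
if ("5728" %in% names(multi.genes)) {
  as.character(seqnames(multi.genes[["5728"]]))
}
```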

In any case, it seems I'm not the only one wondering where my genes have gone! Maybe a note on the existence of this option could be added to the GenomicFeatures vignette? Thanks again!


Thank you for the suggestion; we will look into it.


FWIW: I have submitted a pull request that will output a message when this filtering occurs. Hopefully it will be helpful. https://github.com/Bioconductor/GenomicFeatures/pull/20


Great! That would be very helpful! Thanks!
