Question

Successor of genesBy Defunct Function

0

Entering edit mode

Dario Strbenac ★ 1.6k

@dario-strbenac-5916

Last seen 8 days ago

Australia

In the vignette, I see:

To extract these from an Ensembl based annotation package, the exonsBy, genesBy and transcriptsBy methods can be used in an analogous way as in TxDb packages generated by the GenomicFeatures package.

genesBy doesn't seem to exist any longer. What is its successor? The vignette could be updated to reflect this change in A.P.I.

ensembldb • 1.1k views

ADD COMMENT • link updated 2.2 years ago by Hervé Pagès 16k • written 2.2 years ago by Dario Strbenac ★ 1.6k

score 2 · Accepted Answer · 2023-01-17

I believe that is a typographic error. There isn't (nor IIRC has there ever been) a genesBy function in either ensembldb nor AnnotationDbi. It wouldn't make much sense to have one anyway, because a gene by definition (at least in Bioconductor) contains all the transcripts and all the exons for that gene. In other words, genesBy(EnsDb, "exon") should return a GRangesList with a single gene per exon, which is trivial and boring.

But I suppose there are some instances where a particular exon is shared by a bunch of very closely related genes, and it's trivial to roll yer own.

## First get an EnsDb
> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("ensdb","homo sapiens"))
AnnotationHub with 23 records
# snapshotDate(): 2022-10-31
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH53211"]]' 

             title                             
  AH53211  | Ensembl 87 EnsDb for Homo Sapiens 
  AH53715  | Ensembl 88 EnsDb for Homo Sapiens 
  AH56681  | Ensembl 89 EnsDb for Homo Sapiens 
  AH57757  | Ensembl 90 EnsDb for Homo Sapiens 
  AH60773  | Ensembl 91 EnsDb for Homo Sapiens 
  ...        ...                               
  AH95744  | Ensembl 104 EnsDb for Homo sapiens
  AH98047  | Ensembl 105 EnsDb for Homo sapiens
  AH100643 | Ensembl 106 EnsDb for Homo sapiens
  AH104864 | Ensembl 107 EnsDb for Homo sapiens
  AH109336 | Ensembl 108 EnsDb for Homo sapiens
> ensdb <- hub[["AH109336"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache

## get exons by gene
> exby <- exonsBy(ensdb, "gene")
## inspect
> exby
GRangesList object of length 70616:
$ENSG00000000003
GRanges object with 20 ranges and 1 metadata column:
       seqnames              ranges strand |         exon_id
          <Rle>           <IRanges>  <Rle> |     <character>
   [1]        X 100639945-100639991      - | ENSE00001828996
   [2]        X 100636793-100637104      - | ENSE00001863395
   [3]        X 100636608-100636806      - | ENSE00001855382
   [4]        X 100636191-100636689      - | ENSE00001886883
   [5]        X 100635558-100635746      - | ENSE00003662440
   ...      ...                 ...    ... .             ...
  [16]        X 100632541-100632568      - | ENSE00001849132
  [17]        X 100632063-100632068      - | ENSE00003731560
  [18]        X 100630759-100630866      - | ENSE00000868868
  [19]        X 100627108-100629986      - | ENSE00001459322
  [20]        X 100627109-100629986      - | ENSE00003730948
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

...
<70615 more elements>
## convert to GRanges, add a genes mcol and split
> exby2 <- unlist(exby)
> exby2
GRanges object with 888642 ranges and 1 metadata column:
                  seqnames              ranges strand |         exon_id
                     <Rle>           <IRanges>  <Rle> |     <character>
  ENSG00000000003        X 100639945-100639991      - | ENSE00001828996
  ENSG00000000003        X 100636793-100637104      - | ENSE00001863395
  ENSG00000000003        X 100636608-100636806      - | ENSE00001855382
  ENSG00000000003        X 100636191-100636689      - | ENSE00001886883
  ENSG00000000003        X 100635558-100635746      - | ENSE00003662440
              ...      ...                 ...    ... .             ...
          LRG_999       19   42293592-42293836      + |    LRG_999t1e18
          LRG_999       19   42293935-42294089      + |    LRG_999t1e19
          LRG_999       19   42294173-42294304      + |    LRG_999t1e20
          LRG_999       19   42294604-42294735      + |    LRG_999t1e21
          LRG_999       19   42294824-42295796      + |    LRG_999t1e22
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome
> exby2$gene_id <- names(exby2)
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 8 out-of-bound ranges located on sequence
  LRG_432. Note that ranges located on a sequence whose length is unknown
  (NA) or on a circular sequence are not considered out-of-bound (use
  seqlengths() and isCircular() to get the lengths and circularity flags
  of the underlying sequences). You can use trim() to trim these ranges.
  See ?`trim,GenomicRanges-method` for more information.
> gnby <- splitAsList(exby2, exby2$exon_id)
> gnby
GRangesList object of length 888642:
$ENSE00000000001
GRanges object with 1 range and 2 metadata columns:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSG00000244623        7 99876062-99877033      - | ENSE00000000001
                          gene_id
                      <character>
  ENSG00000244623 ENSG00000244623
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

$ENSE00000000002
GRanges object with 1 range and 2 metadata columns:
                  seqnames          ranges strand |         exon_id
                     <Rle>       <IRanges>  <Rle> |     <character>
  ENSG00000175485       11 6199224-6200186      + | ENSE00000000002
                          gene_id
                      <character>
  ENSG00000175485 ENSG00000175485
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

$ENSE00000000003
GRanges object with 1 range and 2 metadata columns:
                  seqnames            ranges strand |         exon_id
                     <Rle>         <IRanges>  <Rle> |     <character>
  ENSG00000148090        9 91361628-91361918      - | ENSE00000000003
                          gene_id
                      <character>
  ENSG00000148090 ENSG00000148090
  -------
  seqinfo: 457 sequences (1 circular) from GRCh38 genome

...
<888639 more elements>


## BORING!
> table(lengths(gnby))

     1 
888642