News: Gencode GFF3 and FASTA files now available via AnnotationHub
Sonali Arora

GFF3 and FASTA files from the latest release of Gencode are now available via AnnotationHub (biocVersion 3.2 only).

The GFF3 and FASTA files from the latest Homo sapiens release (release 23) can be accessed with the following code snippet:

> library(AnnotationHub)
> ah = AnnotationHub()
snapshotDate(): 2015-08-26
> Human_gff = query(ah, c("Gencode", "gff", "human"))
> Human_gff
AnnotationHub with 9 records
# snapshotDate(): 2015-08-26
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH49554"]]'

            title
  AH49554 | gencode.v23.2wayconspseudos.gff3.gz
  AH49555 | gencode.v23.annotation.gff3.gz
  AH49556 | gencode.v23.basic.annotation.gff3.gz
  AH49557 | gencode.v23.chr_patch_hapl_scaff.annotation.gff3.gz
  AH49558 | gencode.v23.chr_patch_hapl_scaff.basic.annotation.gff3.gz
  AH49559 | gencode.v23.long_noncoding_RNAs.gff3.gz
  AH49560 | gencode.v23.polyAs.gff3.gz
  AH49561 | gencode.v23.primary_assembly.annotation.gff3.gz
  AH49562 | gencode.v23.tRNAs.gff3.gz

> Human_fasta = query(ah, c("Gencode", "fasta", "human"))
> Human_fasta
AnnotationHub with 5 records
# snapshotDate(): 2015-08-26
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: FaFile
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
#   sourcetype
# retrieve records with, e.g., 'object[["AH49563"]]'

            title
  AH49563 | gencode.v23.chr_patch_hapl_scaff.transcripts.fa.gz
  AH49564 | gencode.v23.lncRNA_transcripts.fa.gz
  AH49565 | gencode.v23.pc_transcripts.fa.gz
  AH49566 | gencode.v23.pc_translations.fa.gz
  AH49567 | gencode.v23.transcripts.fa.gz

To view the metadata for a record, use the '[' operator; to download the file itself, use '[['.

> ah["AH49562"]
AnnotationHub with 1 record
# snapshotDate(): 2015-08-26
# names(): AH49562
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: gencode.v23.tRNAs.gff3.gz
# $description: tRNA structures predicted by tRNA-Scan on reference chromosomes
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GFF
# $sourceurl: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_23/ge...
# $sourcelastmodifieddate: 2015-07-16
# $sourcesize: 17419
# $tags: gencode, v23, tRNAs, gff3
# retrieve record with 'object[["AH49562"]]'
> gff = ah[["AH49562"]]
require(“rtracklayer”)
retrieving 1 resource
  |======================================================================| 100%

 

The GFF3 files are downloaded and read into R as a 'GRanges' object (from the GenomicRanges package), while the FASTA files are indexed, and both the FASTA file and its index are returned as a 'FaFile' object.

> class(gff)
[1] "GRanges"
attr(,"package")
[1] "GenomicRanges"
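
Because the result is an ordinary GRanges, the usual GenomicRanges accessors apply to it. The sketch below does not touch the hub: it writes a tiny made-up GFF3 file to a temporary path and reads it with rtracklayer's import(), which is an assumption about roughly what happens behind the scenes (the download above does load rtracklayer). The feature coordinates and the gene_type attribute values are invented for illustration.

```r
library(rtracklayer)    # import() for GFF3 files
library(GenomicRanges)  # seqnames(), width() on GRanges

# Three made-up tRNA features standing in for a real Gencode GFF3 file.
gff_lines <- c(
  "##gff-version 3",
  "chr1\ttRNAscan\ttRNA\t100\t172\t.\t+\t.\tID=t1;gene_type=Ala_tRNA",
  "chr1\ttRNAscan\ttRNA\t500\t571\t.\t-\t.\tID=t2;gene_type=Gly_tRNA",
  "chr2\ttRNAscan\ttRNA\t40\t112\t.\t+\t.\tID=t3;gene_type=Ala_tRNA"
)
gff_path <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_path)

# import() picks the GFF3 parser from the file extension and returns a GRanges;
# each GFF3 attribute (ID, gene_type) becomes a metadata column.
toy_gff <- import(gff_path)

# Subset by chromosome, and by a metadata column.
chr1_only <- toy_gff[seqnames(toy_gff) == "chr1"]
ala <- toy_gff[toy_gff$gene_type == "Ala_tRNA"]

# Feature lengths in bases.
feature_widths <- width(toy_gff)
```

The same subsetting calls should work on the 'gff' object retrieved from the hub, using whichever metadata columns that file actually provides.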

> fas = ah[["AH49567"]]
retrieving 2 resources
  |======================================================================| 100%
  |======================================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)
> class(fas)
[1] "FaFile"
attr(,"package")
[1] "Rsamtools"
> fas
class: FaFile
path: /home/sarora/.AnnotationHub/56291
index: /home/sarora/.AnnotationHub/56292
isOpen: FALSE
yieldSize: NA
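
To illustrate what an indexed FASTA file gives you, here is a minimal, self-contained sketch that builds a toy FASTA file locally instead of using the hub. The sequence names and bases are made up; indexFa(), FaFile(), scanFaIndex() and scanFa() are from Rsamtools, and the hub's 'fas' object should respond to the last two in the same way.

```r
library(Rsamtools)      # FaFile, indexFa, scanFaIndex, scanFa
library(Biostrings)     # DNAStringSet, writeXStringSet
library(GenomicRanges)  # GRanges, IRanges for the query region

# Two made-up transcripts standing in for a Gencode FASTA file.
toy <- DNAStringSet(c(tx1 = "ACGTACGTAC", tx2 = "GGGCCCAAATTT"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(toy, fa_path)

# Create the .fai index next to the file, then wrap both in a FaFile.
indexFa(fa_path)
toy_fas <- FaFile(fa_path)

# The index lists every sequence and its length as a GRanges.
idx <- scanFaIndex(toy_fas)

# Fetch only bases 1-4 of tx2, without reading the whole file into memory.
first4 <- scanFa(toy_fas, param = GRanges("tx2", IRanges(1, 4)))
```

With the hub object, scanFaIndex(fas) is a convenient way to see the exact sequence names before building a query for scanFa().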

Similarly, Gencode GFF3 and FASTA files for the current mouse release (M6) can be accessed with:

> Mouse_gff = query(ah, c("Gencode", "gff", "mouse"))
> Mouse_fasta = query(ah, c("Gencode", "fasta", "mouse"))

> packageVersion('AnnotationHub')
[1] ‘2.1.40’

 

Sonali. 
