I would like to make a TxDb package from a GFF file using GenomicFeatures, but can't get it to work. Below is a reproducible example on a small subset.
Retrieve the GFF file:
gff.file <- "Vitis_vinifera_annotation.gff.gz"
url <- paste0("https://urgi.versailles.inra.fr/content/download/2157/19376/file/", gff.file)
# -O names the output file; a bare second argument would be treated by wget as another URL
cmd <- paste0("wget -O ", gff.file, " ", url)
system(cmd)
Extract a small subset:
gff.file.small <- "subset.gff"
cmd <- paste0("zcat ", gff.file, " | grep -w 'chr2' | head -100 > ", gff.file.small)
system(cmd)
Make a txdb object:
library(GenomicFeatures)
library(BSgenome.Vvinifera.URGI.IGGP12Xv0)
txdb <- makeTxDbFromGFF(file=gff.file.small,
                        format="auto",
                        dataSource=url,
                        organism="Vitis vinifera",
                        taxonomyId=29760,
                        chrominfo=seqinfo(BSgenome.Vvinifera.URGI.IGGP12Xv0))
txdb # shows transcript_nrow=0, exon_nrow=0, etc
length(tmp <- transcripts(txdb)) # 0
Is it because the initial GFF file is badly formatted?
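One rough way to probe the formatting question is to look at the 9th (attributes) column of the file: GFF3 writes attributes as tag=value pairs, whereas GFF2 and GTF write them as "tag value" pairs. The sketch below uses base R only; guess_gff_version is a hypothetical helper (not part of GenomicFeatures), and the two example records are made up for illustration, not taken from the URGI file.

```r
# Hypothetical helper (not part of GenomicFeatures): guess the GFF flavour
# from the 9th (attributes) column of the feature lines.
guess_gff_version <- function(lines) {
  # Drop empty lines and comment/directive lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Keep only records with exactly 9 tab-separated columns.
  attrs <- vapply(fields[lengths(fields) == 9L], `[`, character(1), 9L)
  if (length(attrs) == 0L) return(NA_character_)
  # GFF3 attributes start with tag=value; GFF2/GTF start with "tag value".
  is_gff3 <- grepl("^[^ =;]+=", attrs)
  if (all(is_gff3)) "GFF3" else if (!any(is_gff3)) "GFF2 or GTF" else "mixed"
}

# Made-up example records (assumptions, for illustration only).
gff3_example <- "chr2\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1"
gff2_example <- "chr2\tsrc\tmRNA\t100\t900\t.\t+\t.\tmRNA mrna1 ; Note \"x\""

guess_gff_version(gff3_example)
guess_gff_version(gff2_example)
```

On the subset from the example, this could be called as guess_gff_version(readLines(gff.file.small, n = 20)). It is only a heuristic on the attribute syntax and does not validate the file against either specification.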
Just jumping in; alternatively you could try to get a GFF3 or a GTF file from Ensembl plants, e.g.
ftp://ftp.ensemblgenomes.org/pub/plants/current/gff3/vitis_vinifera
(for other versions than "current" just browse the ftp)
By the way, if you're working with Ensembl annotations, you could also take a quick look at the ensembldb package. The EnsDb objects from that package provide similar (almost the same) functionality to TxDb objects. There are also the ensDbFromGtf and ensDbFromGff functions to create such an EnsDb from a GTF or GFF3 file; ideally check out the current devel version of the package (it will be released soon with Bioc 3.3).
cheers, jo
@Michael Lawrence Thanks, I will (try to) figure out a way to convert the GFF2 file into GFF3.
@Johannes Rainer I am aware that I can retrieve annotations from Ensembl, but I specifically want these ones, which may differ a bit from the ones at Ensembl; that is something I should indeed check at some point.
From here: https://urgi.versailles.inra.fr/Species/Vitis/Annotations
It looks like there are GFF3 annotations under the "V1" heading. Only the V0 annotations are GFF2.
@Michael Lawrence yes, but I would like the V0, too. (I'll ask another question about makeTxDbFromGFF because I don't even understand how it works on the example given in the official specification.)