Dear all
I have been struggling with this for several hours and, after trying to follow the GenomicFeatures package documentation and the related help pages and forums, I am still stuck:
A gff3 file and a gtf file are my only annotation resources for Ectocarpus siliculosus (https://bioinformatics.psb.ugent.be/gdb/ectocarpus/EctsiV2_gff3_LATEST.tar.gz and
https://bioinformatics.psb.ugent.be/gdb/ectocarpus/EctsiV2_gtf_LATEST.tar.gz), and I want to forge a TxDb package for use in several Bioconductor applications.
I should perhaps add that the annotations are organised in supercontigs (sctgs) rather than chromosomes (could this be a source of problems?).
The untarred GFF3 archive is in fact split into one gff3 file per supercontig, and is not accepted by makeTxDbFromGFF(). I was able to obtain a TxDb object either from the .gtf file or from a gff3 file produced with Cufflinks' gffread tool. These two TxDb objects differ in their numbers of transcripts, exons and CDSs. In any case, makeTxDbPackage() always gives me the same error, "Error in spc[[2]] : subscript out of bounds", with either of these TxDb objects; please see below.
Can somebody have a look at this gtf file and tell me if the error is coming from the gtf file itself, or if I am missing something obvious here?
Seriously lost here, any help would be greatly appreciated!
Many Thanks,
Yacine
In Linux
gffread -E EctsiV2_all.gtf -o- > EctsiV2_all.gtf.gff3
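As a possible alternative to converting the gtf, the per-supercontig gff3 files mentioned above could be concatenated into a single file before import. The following is only a sketch in base R: merge_gff3() is a hypothetical helper, the two tiny input files are made up so that the example runs without the real data, and whether the merged Ectocarpus file would then be accepted by makeTxDbFromGFF() has not been checked.

```r
# Hypothetical helper: merge several GFF3 files into one file,
# keeping a single "##gff-version 3" pragma at the top.
merge_gff3 <- function(files, out) {
  lines <- unlist(lapply(files, readLines), use.names = FALSE)
  # Drop every version pragma, then put exactly one back as the first line.
  body <- lines[!grepl("^##gff-version", lines)]
  writeLines(c("##gff-version 3", body), out)
  invisible(out)
}

# Made-up two-file example (not the real annotation).
f1 <- tempfile(fileext = ".gff3")
f2 <- tempfile(fileext = ".gff3")
writeLines(c("##gff-version 3", "sctg_1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1"), f1)
writeLines(c("##gff-version 3", "sctg_2\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2"), f2)

merged <- merge_gff3(c(f1, f2), tempfile(fileext = ".gff3"))
merged_lines <- readLines(merged)
```

With the real data, the input vector could be built with list.files(dir, pattern = "\\.gff3$", full.names = TRUE), where dir is wherever the archive was extracted. Feature IDs would need to be unique across the supercontig files for the merged file to be meaningful.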
In R
> txdb <- makeTxDbFromGFF("EctsiV2_all.gtf", format="gtf", circ_seqs=character())
Prepare the 'metadata' data frame ... metadata: OK
Warning message:
In .reject_transcripts(bad_tx, because) :
The following transcripts were rejected because they have CDSs that
cannot be mapped to an exon: Esi0003_0153.1, Esi0015_0061.1,
Esi0044_0024.1, Esi0074_0010.1, Esi0093_0006.1, Esi0098_0023.1,
Esi0117_0096.1, Esi0123_0031.1, Esi0165_0042.1, Esi0168_0075.1,
Esi0197_0020.1, Esi0205_0069.1, Esi0221_0029.1, Esi0264_0026.1,
Esi0279_0055.1, Esi0304_0028.1, Esi0364_0035.1, Esi0369_0025.1,
Esi0370_0030.1, Esi0376_0031.1, Esi0392_0017.1, Esi0423_0003.1,
Esi0445_0015.1, Esi0651_0011.1, Esi0772_0002.1, Esi0798_0001.1,
Esi1446_0002.1, Esi1480_0001.1, Esi1751_0001.1
> txdb2 <- makeTxDbFromGFF("EctsiV2_all.gtf.gff3", format="gff3", circ_seqs=character())
Prepare the 'metadata' data frame ... metadata: OK
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: EctsiV2_all.gtf
# Organism: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 18406
# exon_nrow: 162850
# cds_nrow: 136291
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-06-22 09:19:39 +0100 (Mon, 22 Jun 2015)
# GenomicFeatures version at creation time: 1.20.1
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> txdb2
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: EctsiV2_all.gtf.gff3
# Organism: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 18435
# exon_nrow: 139594
# cds_nrow: 136421
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2015-06-22 09:20:27 +0100 (Mon, 22 Jun 2015)
# GenomicFeatures version at creation time: 1.20.1
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1
> makeTxDbPackage()
Error in AnnotationDbi:::dbconn(txdb) :
error in evaluating the argument 'x' in selecting a method for function 'dbconn': Error: argument "txdb" is missing, with no default
> makeTxDbPackage(txdb=txdb)
> ls()
[1] "txdb" "txdb2"
> makeTxDbPackage(txdb=txdb, version="0.1", maintainer="<Yacine.Badis@sams.ac.uk>", author= "Yacine Badis", destDir=".", license= "Artistic-2.0")
Error in spc[[2]] : subscript out of bounds
> makeTxDbPackage(txdb=txdb2, version="0.1", maintainer="<Yacine.Badis@sams.ac.uk>", author= "Yacine Badis", destDir=".", license= "Artistic-2.0")
Error in spc[[2]] : subscript out of bounds
> makeTxDbPackage(txdb=txdb2)
Error in spc[[2]] : subscript out of bounds
> traceback()
4: paste0(substr(spc[[1]], 1, 1), spc[[2]])
3: .abbrevOrganismName(.getMetaDataValue(txdb, "Organism"))
2: .makePackageName(txdb)
1: makeTxDbPackage(txdb = txdb2)
>
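The traceback above points at the step that abbreviates the organism name, and both TxDb objects report "Organism: NA". The following base R sketch reconstructs that step from the traceback alone to show why a missing organism would give this error. abbrev_organism() is a hypothetical stand-in, and the real .abbrevOrganismName() in GenomicFeatures may be written differently.

```r
# Reconstruction inferred from frame 4 of the traceback only;
# this is NOT the actual GenomicFeatures source.
abbrev_organism <- function(organism) {
  # Split "Genus species" into its words.
  spc <- unlist(strsplit(organism, " "))
  # First letter of the genus followed by the species name.
  paste0(substr(spc[[1]], 1, 1), spc[[2]])
}

# A two-word organism name works.
ok <- abbrev_organism("Ectocarpus siliculosus")

# A missing organism yields a single element, so spc[[2]] does not exist.
err <- tryCatch(abbrev_organism(NA_character_),
                error = function(e) conditionMessage(e))
```

If this reading is right, the problem lies in the missing Organism metadata of the TxDb object rather than in the gtf or gff3 content itself.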
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicFeatures_1.20.1 AnnotationDbi_1.30.1 Biobase_2.28.0
[4] GenomicRanges_1.20.5 GenomeInfoDb_1.4.1 IRanges_2.2.4
[7] S4Vectors_0.6.0 BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.1 Rsamtools_1.20.4 Biostrings_2.36.1
[4] GenomicAlignments_1.4.1 bitops_1.0-6 futile.options_1.0.0
[7] DBI_0.3.1 RSQLite_1.0.0 zlibbioc_1.14.0
[10] XVector_0.8.0 futile.logger_1.4.1 lambda.r_1.1.7
[13] BiocParallel_1.2.3 tools_3.2.0 biomaRt_2.24.0
[16] RCurl_1.95-4.6 rtracklayer_1.28.5
>
I tried option 2, and the package was successfully made!
Many thanks for your help Herve!
Yacine
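For readers who hit the same error: since the traceback fails on the Organism metadata, which is NA in both TxDb objects, one plausible workaround is to supply the organism when building the TxDb. This is a sketch, not necessarily the option referred to above. It assumes that makeTxDbFromGFF() accepts an organism argument in your GenomicFeatures version (check ?makeTxDbFromGFF; in recent Bioconductor releases these functions may live in a different package), and it only runs if the package and the gtf file are available.

```r
# Assumption: 'organism' is an argument of makeTxDbFromGFF() and fills the
# Organism metadata field that makeTxDbPackage() uses to build the package name.
organism <- "Ectocarpus siliculosus"
gtf_file <- "EctsiV2_all.gtf"

if (requireNamespace("GenomicFeatures", quietly = TRUE) && file.exists(gtf_file)) {
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, format = "gtf",
                                           organism = organism,
                                           circ_seqs = character())
  # Same packaging call as in the original post, now with Organism set.
  GenomicFeatures::makeTxDbPackage(txdb, version = "0.1",
                                   maintainer = "Yacine Badis <Yacine.Badis@sams.ac.uk>",
                                   author = "Yacine Badis",
                                   destDir = ".", license = "Artistic-2.0")
}
```

Depending on the version, the organism name may also be looked up to find a taxonomy ID, so a name that is not recognised could raise a different error.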