Question

Error running makeTxDbFromGFF

0

Entering edit mode

jmmonroykuhn • 0

@jmmonroykuhn-9144

Last seen 5.6 years ago

Germany

I am trying to construct a TxDb from a gff3 but it fails and I got the following error message:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in as.vector(x, mode = "character") :
coercing an AtomicList object to an atomic vector is supported only for
objects with top-level elements of length <= 1

What can be the reason? My gff3 is not from a model species.

sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsamtools_1.22.0 Biostrings_2.38.0 XVector_0.10.0 GenomicFeatures_1.22.4 AnnotationDbi_1.32.0
[6] Biobase_2.30.0 GenomicRanges_1.22.1 GenomeInfoDb_1.6.1 IRanges_2.4.1 S4Vectors_0.8.1
[11] BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0            GenomicAlignments_1.6.1    BiocParallel_1.4.0         tools_3.2.2
[5] SummarizedExperiment_1.0.1 DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1
[9] rtracklayer_1.30.1         futile.options_1.0.0       bitops_1.0-6               RCurl_1.95-4.7
[13] biomaRt_2.26.0             RSQLite_1.0.0              XML_3.98-1.3

genomicfeatures • 2.3k views

ADD COMMENT • link updated 9.0 years ago by Keith Hughitt ▴ 180 • written 9.0 years ago by jmmonroykuhn • 0

score 0 · Answer 1 · 2015-11-11

0

Entering edit mode

Keith Hughitt ▴ 180

@keith-hughitt-6740

Last seen 9 months ago

United States

Hi Jmmonroykuhn,

Can you post the code for the function call you used, and also the top & bottom ~10 lines from your GFF? If the GFF is available online, you can also post a link to it.

ADD COMMENT • link 9.0 years ago Keith Hughitt ▴ 180

0

Entering edit mode

This is the command I used

gtffile <- file.path(dir_csec,"Cryptotermes_secundus.gff")
(txdb <- makeTxDbFromGFF(gtffile, format="gff3", circ_seqs=character()))

This is how the first 10 lines look

##gff-version 3

##_######################
## GENOME

#some more comments...

##
C37016610       GffMM   gene    1       234     .       -       .       ID=Csec_G00001;Name=Csec_G00001;alias=csec_17774_G
C37016610       GeneWise        transcript      1       234     50.65   -       .       ID=Csec_G00001_T1;Parent=Csec_G00001;shift=0;alias=csec_17774;evid_id=CG9836-PA_DROME-D4;identical_support_id=CUFF.152.1;nb_exon=1;part_support_id=CUFF.3355.1;source_id=CCG014480.1
C37016610       GeneWise        CDS     1       234     .       -       0       ID=Csec_G00001_T1_C1;Parent=Csec_G00001_T1
##
C37651885       GffMM   gene    29      433     .       +       .       ID=Csec_G00002;Name=Csec_G00002;alias=csec_17775_G
C37651885       GeneWise        transcript      29      433     72      +       .       ID=Csec_G00002_T1;Parent=Csec_G00002;shift=0;alias=csec_17775;evid_id=GB18432-PA_APIME-D6;identical_support_id=CUFF.152.1;nb_exon=1;part_support_id=CUFF.3355.1;source_id=CCG014480.1
C37651885       GeneWise        CDS     29      433     .       +       0       ID=Csec_G00002_T1_C1;Parent=Csec_G00002_T1
##
C37688897       GffMM   gene    75      380     .       +       .       ID=Csec_G00003;Name=Csec_G00003;alias=csec_17776_G

This is how the last 10 lines look

##
scaffold29346   PASA    gene    1       3199    .       -       .       ID=Csec_G19035;Name=PASA_cluster_69985,Csec_G19035;alias=PASA_cluster_69985
scaffold29346   PASA    transcript      1       3199    .       -       .       ID=Csec_G19035_T1;Parent=Csec_G19035;Name=asmbl_81646;nb_exon=4;size=3199;alias=asmbl_81646
scaffold29346   PASA    exon    1       137     .       -       .       ID=Csec_G19035_T1_E4;Parent=Csec_G19035_T1;size=137;exon_index=4
scaffold29346   OrfPred three_prime_UTR 1       1       .       -       .       ID=Csec_G19035_T1_U1;Parent=Csec_G19035_T1
scaffold29346   OrfPred CDS     2       119     .       -       0       ID=Csec_G19035_T1_C1;Parent=Csec_G19035_T1;total_size=118
scaffold29346   OrfPred five_prime_UTR 120     137     .       -       .       ID=Csec_G19035_T1_U2;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2342    2468    .       -       .       ID=Csec_G19035_T1_E3;Parent=Csec_G19035_T1;size=127;exon_index=3
scaffold29346   OrfPred five_prime_UTR 2342    2468    .       -       .       ID=Csec_G19035_T1_U3;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2587    2829    .       -       .       ID=Csec_G19035_T1_E2;Parent=Csec_G19035_T1;size=243;exon_index=2
scaffold29346   OrfPred five_prime_UTR 2587    2829    .       -       .       ID=Csec_G19035_T1_U4;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2925    3199    .       -       .       ID=Csec_G19035_T1_E1;Parent=Csec_G19035_T1;size=275;exon_index=1
scaffold29346   OrfPred five_prime_UTR 2925    3199    .       -       .       ID=Csec_G19035_T1_U5;Parent=Csec_G19035_T1
##

ADD REPLY • link 9.0 years ago jmmonroykuhn • 0

0

Entering edit mode

The GFF and function call look okay at first glance. You could try using an online GFF validator (e.g. http://genometools.org/cgi-bin/gff3validator.cgi) to make sure the format is okay.

My first thought is that it might be the empty vector passed to the `circ_seqs` arg. The default value for that argument is:

> DEFAULT_CIRC_SEQS
 [1] "chrM"                      "MT"                        "MtDNA"                    
 [4] "mit"                       "Mito"                      "mitochondrion"            
 [7] "dmel_mitochondrion_genome" "Pltd"                      "ChrC"                     
[10] "Pt"                        "chloroplast"               "Chloro"                   
[13] "2micron"                   "2-micron"                  "2uM"

So as long as none of your chromosomes have the same names and are linear (I would guess not..), then you can just leave the argument as is.

Does it work if you leave out the circ_seqs arg?

ADD REPLY • link 9.0 years ago Keith Hughitt ▴ 180

0

Entering edit mode

Do you think you could put the entire file somewhere online? That would make this much easier.

ADD REPLY • link 9.0 years ago Michael Lawrence ★ 11k