Error running makeTxDbFromGFF
1
0
Entering edit mode
@jmmonroykuhn-9144
Last seen 5.6 years ago
Germany

I am trying to construct a TxDb from a gff3 but it fails and I got the following error message:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in as.vector(x, mode = "character") :
  coercing an AtomicList object to an atomic vector is supported only for
  objects with top-level elements of length <= 1

 

What can be the reason? My gff3 is not from a model species.

 

sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.04

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rsamtools_1.22.0       Biostrings_2.38.0      XVector_0.10.0         GenomicFeatures_1.22.4 AnnotationDbi_1.32.0  
 [6] Biobase_2.30.0         GenomicRanges_1.22.1   GenomeInfoDb_1.6.1     IRanges_2.4.1          S4Vectors_0.8.1       
[11] BiocGenerics_0.16.1   

loaded via a namespace (and not attached):
 [1] zlibbioc_1.16.0            GenomicAlignments_1.6.1    BiocParallel_1.4.0         tools_3.2.2               
 [5] SummarizedExperiment_1.0.1 DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1       
 [9] rtracklayer_1.30.1         futile.options_1.0.0       bitops_1.0-6               RCurl_1.95-4.7            
[13] biomaRt_2.26.0             RSQLite_1.0.0              XML_3.98-1.3             

genomicfeatures • 2.3k views
ADD COMMENT
0
Entering edit mode
Keith Hughitt ▴ 180
@keith-hughitt-6740
Last seen 9 months ago
United States

Hi Jmmonroykuhn,

Can you post the code for the function call you used, and also the top & bottom ~10 lines from your GFF? If the GFF is available online, you can also post a link to it.

ADD COMMENT
0
Entering edit mode

This is the command I used

gtffile <- file.path(dir_csec,"Cryptotermes_secundus.gff")
(txdb <- makeTxDbFromGFF(gtffile, format="gff3", circ_seqs=character()))

 

This is how the first 10 lines look

##gff-version 3

##_######################
## GENOME

#some more comments...

##
C37016610       GffMM   gene    1       234     .       -       .       ID=Csec_G00001;Name=Csec_G00001;alias=csec_17774_G
C37016610       GeneWise        transcript      1       234     50.65   -       .       ID=Csec_G00001_T1;Parent=Csec_G00001;shift=0;alias=csec_17774;evid_id=CG9836-PA_DROME-D4;identical_support_id=CUFF.152.1;nb_exon=1;part_support_id=CUFF.3355.1;source_id=CCG014480.1
C37016610       GeneWise        CDS     1       234     .       -       0       ID=Csec_G00001_T1_C1;Parent=Csec_G00001_T1
##
C37651885       GffMM   gene    29      433     .       +       .       ID=Csec_G00002;Name=Csec_G00002;alias=csec_17775_G
C37651885       GeneWise        transcript      29      433     72      +       .       ID=Csec_G00002_T1;Parent=Csec_G00002;shift=0;alias=csec_17775;evid_id=GB18432-PA_APIME-D6;identical_support_id=CUFF.152.1;nb_exon=1;part_support_id=CUFF.3355.1;source_id=CCG014480.1
C37651885       GeneWise        CDS     29      433     .       +       0       ID=Csec_G00002_T1_C1;Parent=Csec_G00002_T1
##
C37688897       GffMM   gene    75      380     .       +       .       ID=Csec_G00003;Name=Csec_G00003;alias=csec_17776_G

 

This is how the last 10 lines look

##
scaffold29346   PASA    gene    1       3199    .       -       .       ID=Csec_G19035;Name=PASA_cluster_69985,Csec_G19035;alias=PASA_cluster_69985
scaffold29346   PASA    transcript      1       3199    .       -       .       ID=Csec_G19035_T1;Parent=Csec_G19035;Name=asmbl_81646;nb_exon=4;size=3199;alias=asmbl_81646
scaffold29346   PASA    exon    1       137     .       -       .       ID=Csec_G19035_T1_E4;Parent=Csec_G19035_T1;size=137;exon_index=4
scaffold29346   OrfPred three_prime_UTR 1       1       .       -       .       ID=Csec_G19035_T1_U1;Parent=Csec_G19035_T1
scaffold29346   OrfPred CDS     2       119     .       -       0       ID=Csec_G19035_T1_C1;Parent=Csec_G19035_T1;total_size=118
scaffold29346   OrfPred five_prime_UTR  120     137     .       -       .       ID=Csec_G19035_T1_U2;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2342    2468    .       -       .       ID=Csec_G19035_T1_E3;Parent=Csec_G19035_T1;size=127;exon_index=3
scaffold29346   OrfPred five_prime_UTR  2342    2468    .       -       .       ID=Csec_G19035_T1_U3;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2587    2829    .       -       .       ID=Csec_G19035_T1_E2;Parent=Csec_G19035_T1;size=243;exon_index=2
scaffold29346   OrfPred five_prime_UTR  2587    2829    .       -       .       ID=Csec_G19035_T1_U4;Parent=Csec_G19035_T1
scaffold29346   PASA    exon    2925    3199    .       -       .       ID=Csec_G19035_T1_E1;Parent=Csec_G19035_T1;size=275;exon_index=1
scaffold29346   OrfPred five_prime_UTR  2925    3199    .       -       .       ID=Csec_G19035_T1_U5;Parent=Csec_G19035_T1
##

 

 

ADD REPLY
0
Entering edit mode

The GFF and function call look okay at first glance. You could try using an online GFF validator (e.g. http://genometools.org/cgi-bin/gff3validator.cgi) to make sure the format is okay.

My first thought is that it might be the empty vector passed to the `circ_seqs` arg. The default value for that argument is:

> DEFAULT_CIRC_SEQS
 [1] "chrM"                      "MT"                        "MtDNA"                    
 [4] "mit"                       "Mito"                      "mitochondrion"            
 [7] "dmel_mitochondrion_genome" "Pltd"                      "ChrC"                     
[10] "Pt"                        "chloroplast"               "Chloro"                   
[13] "2micron"                   "2-micron"                  "2uM"     

So as long as none of your chromosomes have the same names and are linear (I would guess not..), then you can just leave the argument as is.

Does it work if you leave out the circ_seqs arg?

ADD REPLY
0
Entering edit mode

Do you think you could put the entire file somewhere online? That would make this much easier.

ADD REPLY

Login before adding your answer.

Traffic: 708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6