I am trying to use DEXSeq on the gtf output from stringtie but am getting an error when trying to create the flattened annotation file. Please find below an example of my commands and input data. Thanks in advance for any help with this issue.
The command I am using is:
python dexseq_prepare_annotation.py merged.gtf dexseq.gtf
Which errors with:
Traceback (most recent call last): File "dexseq_prepare_annotation.py", line 54, in <module> f.attr['gene_id'] = f.attr['gene_id'].replace( ":", "_" ) KeyError: 'gene_id'
An example of what the gtf output from stringtie looks like is:
NW_006854668.1 StringTie transcript 2 301 1000 - . gene_id "MSTRG.195"; transcript_id "MSTRG.195.1"; NW_006854668.1 StringTie exon 2 301 1000 - . gene_id "MSTRG.195"; transcript_id "MSTRG.195.1"; exon_number "1"; NW_006854862.1 StringTie transcript 1 316 1000 - . gene_id "MSTRG.196"; transcript_id "MSTRG.196.1"; NW_006854862.1 StringTie exon 1 316 1000 - . gene_id "MSTRG.196"; transcript_id "MSTRG.196.1"; exon_number "1"; NW_006854920.1 StringTie transcript 6 317 1000 - . gene_id "MSTRG.197"; transcript_id "MSTRG.197.1"; NW_006854920.1 StringTie exon 6 317 1000 - . gene_id "MSTRG.197"; transcript_id "MSTRG.197.1"; exon_number "1"; NW_006855052.1 StringTie transcript 1 307 1000 + . gene_id "MSTRG.198"; transcript_id "rna3"; gene_name "LOC103159057"; ref_gene_id "gene5"; NW_006855052.1 StringTie exon 1 107 1000 + . gene_id "MSTRG.198"; transcript_id "rna3"; exon_number "1"; gene_name "LOC103159057"; ref_gene_id "gene5"; NW_006855052.1 StringTie exon 197 307 1000 + . gene_id "MSTRG.198"; transcript_id "rna3"; exon_number "2"; gene_name "LOC103159057"; ref_gene_id "gene5";
Hi,
Thanks for your answers. I am have the same problem in DEXseq.
Firstly, since I don't have the gtf file of honey bee, I use gffread to convert the gff file to the gtf file. however, I had the same problem
Then, I followed your comment and I convert back gff file:
./gffread -E /Users/LQS/Desktop/Apis_4.5_QS.gtf -o- > /Users/LQS/Desktop/DEXseq_2nd_Apis45.gff3
After that, I stored your perl script in R studio and changed the directory of the input file.
The input file was the one strored in .txt file format and specified the directory of the output gtf file which I want to use for the dexseq_prepare_annotation
I did like this:
open(F,"/Users/LQS/Desktop/DEXseq_2nd_Apis45.txt"); # Input file open(F1,">/Users/LQS/Desktop/data.gtf"); # Output gtf file
after running: perl XXX.pl
I used the data.gtf for the dexseq_annotation_prepare, the same error happened again:
Traceback (most recent call last): File "/pub40/acdguest/Qiushi/scripts/DEXseq_script/dexseq_prepare_annotation.py", line 54, in <module> f.attr['gene_id'] = f.attr['gene_id'].replace( ":", "_" ) KeyError: 'gene_id'
Could you please help me on this?? Many thanks in advance.
Qiushi Liu