Hello there, I would like to know more about extended_annotations.gtf, one of Bambu's main outputs (described here: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/SG-NEx_Bambu_tutorial.md#running-bambu). It sounds like extended_annotations.gtf contains the entire reference annotation plus all discovered novel transcripts, and my file is about 200 MB. What I am trying to get is something like "transcript_models.gtf", which contains only the constructed transcripts (both known and novel) and not the entire reference annotation. To my knowledge, such a file is about 90 to 100 MB. Is there a way to obtain that filtered GTF through the command line?
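One possible command-line approach, sketched under assumptions: Bambu also writes a transcript count table (counts_transcript.txt), and a transcript with a nonzero count in at least one sample can be treated as "observed". The function below, filter_expressed_gtf, is a hypothetical helper (not part of Bambu) that keeps only GTF records whose transcript_id has such a count. It assumes the counts file is tab-separated with a header row, transcript IDs in column 1, gene IDs in column 2 and per-sample counts from column 3 onward. A nonzero count is only a proxy for "constructed by Bambu" and may not match how Bambu itself defines its transcript models; records without a transcript_id (for example gene lines) are dropped.

```shell
# Hypothetical helper, not part of Bambu.
# Assumption: $1 is a tab-separated count table with a header row,
# column 1 = transcript ID, column 2 = gene ID, columns 3+ = per-sample counts.
# $2 is a GTF file; matching records are written to stdout.
filter_expressed_gtf() {
  local counts=$1 gtf=$2
  awk -F'\t' '
    # Pass 1 (count table): remember transcripts with any count > 0
    NR == FNR {
      if (FNR == 1) next                      # skip the header row
      gsub(/"/, "", $1)                       # tolerate quoted IDs
      for (i = 3; i <= NF; i++) {
        if ($i + 0 > 0) { keep[$1] = 1; break }
      }
      next
    }
    # Pass 2 (GTF): keep comment lines unchanged
    /^#/ { print; next }
    # Keep features whose transcript_id was recorded in pass 1
    {
      if (match($9, /transcript_id "[^"]+"/)) {
        # skip the 15 characters of: transcript_id "  and drop the closing quote
        id = substr($9, RSTART + 15, RLENGTH - 16)
        if (id in keep) print
      }
    }
  ' "$counts" "$gtf"
}
```

Usage, with the file names assumed from a default Bambu output folder: `filter_expressed_gtf counts_transcript.txt extended_annotations.gtf > expressed_transcripts.gtf`. The two-pass awk pattern (`NR == FNR`) reads the small count table into memory once and then streams the large GTF, so memory use stays proportional to the number of transcripts rather than the size of the GTF.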
I am using cDNA ONT and cDNA PacBio datasets. In case I missed something, below are the commands I used to convert the .fastq file to a BAM file for the cDNA ONT data.
./minimap2 -t 8 -ax splice /home/seong/R/x86_64-pc-linux-gnu-library/4.1/bambu/extdata/hg38.fa /data/long_read/ENCBS944CBA/ENCFF263YFG.fastq -o /data/long_read/ENCBS944CBA/ENCFF263YFG.sam
samtools view -@ 8 -Sb -o /data/long_read/ENCBS944CBA/ENCFF563QZR.bam /data/long_read/ENCBS944CBA/ENCFF563QZR.sam
Another question I have: does Bambu detect intron retention? Please let me know about both questions. Thanks a lot!
Hello Seongwoo, were you able to run Bambu successfully? I am facing technical problems, so it would be great to get some help.