I am trying to run bambu on a nanopore cDNA sequencing dataset and have some questions:
- Does the annotation file have to be in GTF format? I have a GFF3 annotation file. I can build a TxDb from it, but when I run the annotation command I get this output:
# create annotation
annotations <- prepareAnnotations("./refgenome/tn2.gtf")
#output :
transcript names are not unique,
only one transcript per ID will be kept
Error in .local(x, ..., value = value) :
1587 rows in value to replace 1574rows
So I converted the file to GTF format using gffread.
- When I ran the se.multiSample command, it printed:
Detected 6 warnings across the samples during read class construction. Access warnings with metadata(bambuOutput)$warnings
--- Start extending annotations ---
WARNING - Less than 50 TRUE or FALSE read classes for NDR precision stabilization.
NDR will be approximated as: (1 - Transcript Model Prediction Score)
A high NDR threshold is being recommended by Bambu indicating high levels of novel transcripts, limiting the performance of the trained model
We recommend training a new model on similiar but well annotated dataset if available (https://github.com/GoekeLab/bambu/tree/master#Training-a-model-on-another-speciesdataset-and-applying-it), or alternatively running Bambu with opt.discovery=list(fitReadClassModel=FALSE)
Using a novel discovery rate (NDR) of: 1
--- Start isoform quantification ---
--- Finished running Bambu ---
However, when I checked the output, it only contains the RNA annotations (no CDS or anything else).
> head(se.multiSample)
class: RangedSummarizedExperiment
dim: 6 3
metadata(2): incompatibleCounts warnings
assays(4): counts CPM fullLengthCounts uniqueCounts
rownames(6): BambuTx1 rna-DO80_r001 ... rna-DO80_r004
rna-DO80_r005
rowData names(11): TXNAME GENEID ... txid eqClassById
colnames(3): b04_sorted b05_sorted b06_sorted
colData names(1): name
I don't know where the problem is. Is it because of my annotation file? What should I run differently? Thanks!
Even if I run quantification only (no discovery):
the output is still the same:
it won't output counts for other transcripts (only the rna-coded ones).
Here is my TxDb command:
I checked it and it looks OK, so I don't know what's wrong. And when I run prepareAnnotations from bambu:
Any ideas?
Hi,
Bambu doesn't output or use CDS because it does not do ORF prediction; it focuses on outputting the exon content of transcripts. If you need the CDS sequences, you could run a downstream ORF prediction tool on the output.
How many transcripts are you trying to quantify? (Or, better said, how many transcripts are in your GFF3 file before you convert it?) Are the transcripts you want to quantify present in your converted GTF file? It could be that converting your GFF3 to GTF has not produced the exon lines that bambu expects in the GTF file. Could you provide an example of some of the lines from the GFF3 and the converted GTF file for me to have a look at?
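One quick way to check for exon lines is to tabulate the feature types in the third column of the file. The following is only a minimal sketch in base R; `count_gtf_features` is a hypothetical helper name (not part of bambu), and the example path is the one from the question.

```r
# Tabulate the feature types (3rd column) of a GTF/GFF3 file.
# count_gtf_features is a hypothetical helper, not a bambu function.
count_gtf_features <- function(path) {
  lines <- readLines(path)
  # Drop comment/header lines and blank lines
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Take the 3rd tab-separated field; NA for malformed lines
  feature <- vapply(
    fields,
    function(f) if (length(f) >= 3) f[3] else NA_character_,
    character(1)
  )
  table(feature, useNA = "ifany")
}

# Example call (path taken from the question):
# count_gtf_features("./refgenome/tn2.gtf")
```

If the table has no "exon" entry, the file contains no exon lines at all.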
There are about 1500 annotated transcripts. Here is the GFF3:
and here is the GTF file:
Even when I build the TxDb from the GFF3 (as mentioned above), the output only contains rna entries.
Hi,
Ah, I think the issue is that the GTF file is missing the exon lines. Here is an excerpt from the toy GTF file that comes with bambu.
Without the exon lines, bambu does not know where the intron junctions are and therefore cannot quantify the transcripts. From the scaffold ID in your GTF I notice this is a bacterial species, which would not have much splicing, and that is likely why there are no exon lines in the GTF. Bambu has mainly been evaluated and tested on complexly spliced transcriptomes, and there are a few limitations with bacterial genomes.
As a workaround for your GTF problem, you could try replacing "transcript" in the 3rd column of the GTF with "exon", though I am not sure whether that might introduce other issues. If you get this to work and you want to do transcript discovery, you need to set
opt.discovery = list(min.txScore.singleExon=0)
when running bambu, which turns on discovery for unspliced transcripts. But just to reiterate, bambu hasn't been evaluated in a bacterial/microbial context. Hope this helps.
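The column replacement suggested above could be sketched in base R roughly as follows. This is an untested-on-real-data illustration, not an official bambu recipe: `transcript_to_exon_gtf` is a hypothetical helper, the output file name is made up, and it assumes every "transcript" line describes a single unspliced transcript (if the file already had exon lines, this would create duplicates).

```r
# Rewrite "transcript" in the 3rd column as "exon", leaving all other
# columns, comment lines, and other feature types untouched.
# transcript_to_exon_gtf is a hypothetical helper, not a bambu function.
transcript_to_exon_gtf <- function(in_path, out_path) {
  lines <- readLines(in_path)
  is_record <- !grepl("^#", lines)
  # Match exactly two leading tab-separated fields, then "transcript"
  lines[is_record] <- sub(
    "^([^\t]*\t[^\t]*\t)transcript\t",
    "\\1exon\t",
    lines[is_record]
  )
  writeLines(lines, out_path)
  invisible(out_path)
}

# Possible usage (output path and the bam_files / genome_fasta
# placeholders are assumptions, not taken from the thread):
# transcript_to_exon_gtf("./refgenome/tn2.gtf", "./refgenome/tn2.exon.gtf")
# annotations <- prepareAnnotations("./refgenome/tn2.exon.gtf")
# se <- bambu(reads = bam_files, annotations = annotations,
#             genome = genome_fasta,
#             opt.discovery = list(min.txScore.singleExon = 0))
```

Writing to a new file rather than editing in place keeps the gffread output available for comparison if the modified annotation causes other problems.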