Hi All, I am learning how to analyze bacterial RNAseq data and I am having difficulties using tximport. Any help will be greatly appreciated. I generated a transcriptome file from a gff file available and used it to align transcripts. The same transcriptome was used to perform Salmon analysis to generate quant files. I then generated a TxDb file using the same gff file that was used to make the transcriptome file above. When I however try to import the transcripts into tx2gene, I have 5015 transcripts missing from tx2gene. Here are the codes used.
#Generating a transcriptome fasta:
gffread -w transcripts.fasta -g NC_003198.fasta NC_003198.gff
#salmon analysis
'''conda activate salmon
salmon quant \
-t ~/transcripts.fasta \
-l A \
-a ${file} \
-o ${output} \
--threads 8
conda deactivate
done'''
#importing transcripts
'''library(readr)
quants <- read_tsv(quant_files[1])
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("genome_data/NC_003198.gff.gz", format="gff3", dataSource = "NCBI")
k <- keys(txdb, keytype="TXNAME")
tx_map <- select(txdb, keys = k, columns="GENEID", keytype = "TXNAME")
library(tximport)
tx2gene <- tx_map
write.csv(tx2gene,file="tx2gene.csv",row.names = FALSE,quote=FALSE)
txi <- tximport(quant_files,type="salmon",tx2gene = tx2gene,ignoreTxVersion = TRUE)'''
Session info:
This gives me the result:
reading in files with read_tsv
1 2 3
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 5015
summarizing abundance
summarizing counts
summarizing length
Many thanks in advance. Erick
Thanks James. I will try to find out.