

transcripts missing from tx2gene


Erick


Hi all, I am learning how to analyze bacterial RNA-seq data and am having difficulty using tximport. Any help would be greatly appreciated. I generated a transcriptome FASTA from an available GFF file and used it as the reference for aligning my reads. I used the same transcriptome to run Salmon and generate quant files. I then built a TxDb from the same GFF file that I used to make the transcriptome. However, when I run tximport with the resulting tx2gene table, 5015 transcripts are reported as missing from tx2gene. Here is the code I used:

```shell
# Generate a transcriptome FASTA from the genome sequence and GFF annotation
gffread -w transcripts.fasta -g NC_003198.fasta NC_003198.gff
```
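Because tximport matches the `Name` column of each quant.sf against the first column of tx2gene, it helps to see exactly which IDs gffread wrote. Salmon generally takes a transcript's name from the first whitespace-delimited word of its FASTA header (in alignment mode, from the reference names in the BAM header, which aligners normally derive the same way). A minimal sketch, using a hypothetical helper named `fasta_ids` that is not part of gffread or Salmon:

```shell
# Print the transcript IDs gffread wrote: the first word of each FASTA header.
# fasta_ids is a hypothetical helper name used for illustration only.
fasta_ids() {
  # Keep header lines, drop the leading ">", then cut at the first whitespace.
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}
```

Running `fasta_ids transcripts.fasta | head` next to `head tx2gene.csv` shows at a glance whether both sources use the same kind of identifier, and `fasta_ids transcripts.fasta | wc -l` gives the total number of transcripts.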

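```shell
# Salmon analysis in alignment-based mode (-a takes one SAM/BAM file per sample).
# The loop header was missing from the original post; the *.bam glob and the
# output directory naming below are assumptions.
conda activate salmon
for file in *.bam; do
  output="${file%.bam}_quant"
  salmon quant \
    -t ~/transcripts.fasta \
    -l A \
    -a "${file}" \
    -o "${output}" \
    --threads 8
done
conda deactivate
```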

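```r
# Importing transcript-level quantifications
library(readr)
library(GenomicFeatures)
library(tximport)

# quant_files: character vector of paths to the Salmon quant.sf files
# (its definition was not included in the post)
quants <- read_tsv(quant_files[1])  # inspect the first quant.sf

# Build a TxDb from the same GFF used to make the transcriptome
txdb <- makeTxDbFromGFF("genome_data/NC_003198.gff.gz", format = "gff3", dataSource = "NCBI")

# Map transcript names to gene IDs; AnnotationDbi::select avoids masking by dplyr::select
k <- keys(txdb, keytype = "TXNAME")
tx_map <- AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")

tx2gene <- tx_map  # column 1 = transcript ID, column 2 = gene ID
write.csv(tx2gene, file = "tx2gene.csv", row.names = FALSE, quote = FALSE)
txi <- tximport(quant_files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE)
```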


This gives me the result:

```
reading in files with read_tsv
1 2 3
removing duplicated transcript rows from tx2gene
transcripts missing from tx2gene: 5015
summarizing abundance
summarizing counts
summarizing length
```

Many thanks in advance. Erick


score 1 · Accepted Answer · 2022-09-21


James W. MacDonald


That seems to be something you will have to track down yourself. One possibility is a mismatch between the transcript IDs you aligned against and those in your TxDb (5015 might be most of the transcripts annotated for that species?).

It does not seem like a problem with the tximport package, or how you are using it.
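One way to follow up on the ID-mismatch possibility is to compare the IDs directly. The sketch below defines a hypothetical shell helper, `missing_from_tx2gene` (not part of Salmon or tximport), that lists transcript names found in a quant.sf but absent from the first column of tx2gene.csv. It assumes the file layouts from the question: a tab-separated quant.sf with a header row and the name in column 1, and a tx2gene.csv written with `quote = FALSE` and a header row. The optional `strip` argument roughly imitates `ignoreTxVersion = TRUE`, which the tximport documentation describes as splitting IDs on the "." character.

```shell
# List transcript IDs present in a Salmon quant.sf but missing from column 1
# of tx2gene.csv. missing_from_tx2gene is a hypothetical helper.
# Usage: missing_from_tx2gene QUANT_SF TX2GENE_CSV [strip]
missing_from_tx2gene() {
  local quant_sf=$1 tx2gene_csv=$2 strip=0 q t
  if [ "${3:-}" = "strip" ]; then strip=1; fi
  q=$(mktemp) && t=$(mktemp) || return 1

  # Skip the header row, keep column 1, optionally drop everything from the first ".".
  awk -F'\t' -v s="$strip" 'NR > 1 { id = $1; if (s) sub(/\..*/, "", id); print id }' \
    "$quant_sf" | LC_ALL=C sort -u > "$q"
  awk -F',' -v s="$strip" 'NR > 1 { id = $1; if (s) sub(/\..*/, "", id); print id }' \
    "$tx2gene_csv" | LC_ALL=C sort -u > "$t"

  # comm -23 prints lines unique to the first file: quant IDs with no tx2gene entry.
  LC_ALL=C comm -23 "$q" "$t"
  rm -f "$q" "$t"
}
```

For example, `missing_from_tx2gene sample1/quant.sf tx2gene.csv strip | head` shows a few unmatched names (the sample path is hypothetical), and piping the same command to `wc -l` gives a count that can be compared with `tail -n +2 sample1/quant.sf | wc -l`. If nearly every name is unmatched, printing the first few lines of both files usually makes the naming difference visible, such as a prefix or a different identifier type.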