Hello Folks,
I generated a quant.sf file with Salmon, and the next step is to import the transcript abundance data with tximport. I generated file.csv from the same annotation file that I used with Salmon:
> head(tx2gene)
TXNAME GENEID
1 ENST00000456328.2 ENSG00000223972.4
2 ENST00000515242.2 ENSG00000223972.4
3 ENST00000518655.2 ENSG00000223972.4
4 ENST00000450305.2 ENSG00000223972.4
5 ENST00000473358.1 ENSG00000243485.2
6 ENST00000469289.1 ENSG00000243485.2
Here is the output from a quant.sf file,
cat quant.sf | head -n 3
Name Length EffectiveLength TPM NumReads
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript| 1657 1513.346 0.000000 0.000
ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| 632 488.811 17.921214 1.000
When I run the last script, I get this:
txi <- tximport(files, type="salmon", tx2gene=tx2gene)
> reading in files with read_tsv
1 2 3 4 5 6
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar, :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|, ...]
Example IDs (tx2gene): [ENST00000456328.2, ENST00000515242.2, ENST00000518655.2, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.
I know that other people have faced this problem, but I couldn't find a solution for my case. Do you have any suggestions about what I should change?
I also have another question: why is file.csv needed? In the end, it only contains the same gene IDs as my quant.sf file.
Thank you
I'm not sure whether read_tsv is the problem, since I don't have a TSV file, or whether something more is required by the summarizeToGene function.
It says this as well: "Check to see that you are using the same annotation for both." But I used the same annotation...
Reading here: "We can avoid gene-level summarization by setting txOut=TRUE, giving the original transcript level estimates as a list of matrices." I changed my command line to
txi.salmon <- tximport(files, type="salmon", tx2gene=tx2gene, txOut=TRUE)
and I no longer get an error, but I don't know whether the output I get is correct to use with DESeq2.
Can you tell me, please?
Thank you
Hi Merlin,
Over the past couple of interactions, I feel like you're not taking the time to double check your work and read relevant messages.
It says above very clearly that the transcript IDs in the file look like "ENST00000456328.2|..." while the transcript IDs in the tx2gene table look like "ENST00000456328.2".
The difference is that there are a bunch of extra characters in the IDs in the quantification files. The IDs need to be identical for the matching of transcripts to genes to work.
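A quick way to see the mismatch is an exact-match check in base R. This is only a minimal sketch: the two ID vectors are copied from the excerpts in this thread, and the variable names are made up for illustration.

```r
# Names as they appear in the first column of quant.sf (from the post above).
quant_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|"
)

# Transcript IDs as they appear in the first column of tx2gene (from the post above).
tx2gene_ids <- c("ENST00000456328.2", "ENST00000515242.2",
                 "ENST00000518655.2", "ENST00000450305.2")

# Count how many quant.sf names are found, character for character, in tx2gene.
n_matched <- sum(quant_ids %in% tx2gene_ids)
print(n_matched)
```

With these inputs the count is 0, which is the situation the error message describes.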
Furthermore, we have built a solution for this already, to "ignore after bar", by setting ignoreAfterBar=TRUE. And the message that the software prints to the console even goes on to tell you that you should try this solution and that it may solve your problem.
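As a rough illustration of what ignoring everything after the bar does to the IDs, here is a sketch in base R. The sub() call is an assumption about the effect, not a copy of tximport's internals, and the data are taken from the excerpts in this thread.

```r
# quant.sf names and the tx2gene table, copied from the post above.
quant_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|"
)
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000515242.2",
             "ENST00000518655.2", "ENST00000450305.2"),
  GENEID = rep("ENSG00000223972.4", 4),
  stringsAsFactors = FALSE
)

# Drop everything from the first "|" onward, keeping only the transcript ID.
clean_ids <- sub("\\|.*", "", quant_ids)

# After trimming, every quant.sf ID should be present in tx2gene$TXNAME.
all_matched <- all(clean_ids %in% tx2gene$TXNAME)
print(clean_ids)
print(all_matched)

# The corresponding tximport call (needs the tximport package and the
# `files` vector from the original post, so it is left commented out here):
# txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
```

There is no need to edit quant.sf by hand; the trimming above is only meant to show why the IDs match once the extra fields are ignored.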
Please take the time to try to solve these problems on your end before immediately posting for further help from maintainers that are already busy.
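On the earlier questions about why the tx2gene table is needed and what txOut=TRUE changes: tx2gene tells tximport which transcripts belong to which gene, so that transcript-level estimates can be summarized to the gene level. With txOut=TRUE that summarization is skipped, which is why the error disappears, but the result is then transcript-level rather than gene-level. The sketch below shows the idea for counts only, in base R. The count values are hypothetical, and rowsum() is a simplified stand-in, since tximport's own summarization also handles abundances and lengths.

```r
# Transcript-to-gene map, rows taken from the tx2gene excerpt in this thread.
tx2gene <- data.frame(
  TXNAME = c("ENST00000456328.2", "ENST00000450305.2", "ENST00000473358.1"),
  GENEID = c("ENSG00000223972.4", "ENSG00000223972.4", "ENSG00000243485.2"),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-level counts: one row per transcript, one column per sample.
tx_counts <- matrix(
  c(0, 1, 5,   # sample1
    2, 3, 4),  # sample2
  nrow = 3,
  dimnames = list(tx2gene$TXNAME, c("sample1", "sample2"))
)

# Look up the gene for each transcript row, then sum the rows within each gene.
gene_ids <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]
gene_counts <- rowsum(tx_counts, group = gene_ids)
print(gene_counts)
```

For a gene-level analysis in DESeq2, the usual route is to fix the ID matching and keep the default gene-level output, rather than switching to txOut=TRUE.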
Thank you for your answer, Michael. Yes, I have been checking my work for at least three days, and I also tried the two options mentioned in the output, but it didn't work because I didn't write the complete argument with =TRUE. Slowly I'm learning everything.
I'm sorry for taking your time. If you consider this a low-level question, please don't answer; that's my level.
In the end it works. I appreciate it.
Thank you