Hi folks,
I want to compare RNA-Seq datasets from the same tissues across multiple species. I quantified the RNA-Seq reads with kallisto and now want to import the kallisto output with tximport for DESeq2 analysis. My abundance files have identical transcript IDs for each species, but the transcript lengths differ because gene lengths differ between species. When I try to read the files with tximport, I get the following error.
> sample <- read.table("samples.txt", header = T)
> files <- file.path(dir, "kallisto_results",sample$samplenames, "abundance.tsv")
> txt2gene <- read.table("Transcript-to-Gene-mapping.txt", header=T, sep = "\t")
> txi.kallisto <- tximport(files, type="kallisto", tx2gene = txt2gene)
reading in files
1 2 3
Error: all(txId == raw[[txIdCol]]) is not TRUE
The first two files are from the same species; the third, which triggers the error, is from a different species. Can someone please help me with this? I am really stuck, as I cannot figure out what this error means. My guess is that it is due to the difference in transcript length for the same ID in different species, but I am not sure.
Please let me know if you know what's going on here.
Thanks so much in advance!
- Urja
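For anyone hitting the same message: the failing check compares the transcript IDs of each file, in order, against those of the first file. Here is a minimal sketch for finding the offending file by hand. It assumes the `files` vector from the post above and kallisto's usual `target_id` column; `check_tx_order` is a made-up helper name, not part of tximport.

```r
# Hypothetical helper (not a tximport function): return the indices of files
# whose transcript IDs differ from, or are ordered differently than, the IDs
# in the first file.
check_tx_order <- function(files, id_col = "target_id") {
  # Read only what we need: the ID column of each tab-separated abundance file.
  ids <- lapply(files, function(f) {
    read.delim(f, stringsAsFactors = FALSE)[[id_col]]
  })
  ref <- ids[[1]]
  # identical() is TRUE only if the IDs match element by element, in order.
  same <- vapply(ids, function(x) identical(x, ref), logical(1))
  which(!same)
}

# Usage, assuming `files` is the character vector of abundance.tsv paths:
# check_tx_order(files)
```

A non-empty result lists the files that need reordering (or that contain a different set of IDs) before tximport will accept them together.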
Thanks Michael for your prompt response! It worked when I sorted the genes in the same order. I have 2 quick follow-up questions:
1. Are the kallisto abundance estimates sorted in a particular order?
2. Would it affect the DESeq2 analysis if I sort the genes alphabetically?
I suppose the answer to both the questions is No?
Thanks again!
-Urja
I believe the kallisto abundances are sorted by the order in the transcriptome FASTA. But that's a kallisto question.
The order of the genes makes no difference to the inference methods.
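Since the order doesn't matter for inference, one way to get consistent files is to sort every abundance table by transcript ID before importing. A rough sketch, assuming kallisto's `target_id` column; `sort_abundance` and the `abundance_sorted.tsv` file name are just illustrative names I made up, not anything kallisto or tximport provides.

```r
# Hypothetical helper: rewrite a kallisto abundance table sorted by transcript ID.
sort_abundance <- function(infile, outfile, id_col = "target_id") {
  # Read every column as text so the numeric values are written back unchanged.
  tab <- read.delim(infile, colClasses = "character", check.names = FALSE)
  # method = "radix" sorts in the C locale, so the order does not depend on
  # the locale of the machine doing the sorting.
  tab <- tab[order(tab[[id_col]], method = "radix"), , drop = FALSE]
  write.table(tab, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(outfile)
}

# Usage, assuming `files` and `txt2gene` from the original post:
# sorted <- file.path(dirname(files), "abundance_sorted.tsv")
# for (i in seq_along(files)) sort_abundance(files[i], sorted[i])
# txi.kallisto <- tximport(sorted, type = "kallisto", tx2gene = txt2gene)
```

This only helps if every file contains the same set of transcript IDs; if the sets differ between species, the files still won't line up after sorting.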
If you're using kallisto to quantify your samples, I'd recommend abandoning tximport -> DESeq2 and using sleuth instead. The sleuth paper reports that it is more accurate. See https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4324.html.