tximeta on STAR-aligned Salmon-quantified output
Note: I haven't updated to R 4.0 or Bioconductor 3.11 yet, in case that matters.

I built a STAR index from the GRCm38 Ensembl release 98 DNA primary assembly and the release 98 GTF, then aligned with --quantMode TranscriptomeSAM to get output in transcriptome coordinates. I then used gffread -w transcripts.fa -g $DNA $GTF to make a reference for salmon's alignment mode (salmon quant -t transcripts.fa -l A -a Sample.toTranscriptome.out.bam -o $OUTDIR), which seems to have worked.

Unfortunately, even though the references are unmodified Ensembl files, tximeta imports the quantification but does not recognize the transcriptome, and returns only a non-ranged SummarizedExperiment.

In the tximeta vignette sections about creating linked transcriptomes, everything seems to require a salmon index unless I'm missing something.

Is there a way to use tximeta's automatic metadata gathering with this setup?

> library(tximeta)
> meta <- tibble::tribble(
+   ~names,        ~files,
+   "sample1", "results/quant/salmon/STAR/quant.sf"
+ )
> se <- tximeta(meta)
importing quantifications
reading in files with read_tsv
couldn't find matching transcriptome, returning non-ranged SummarizedExperiment


tximeta works by reading the hash value that salmon index produces; however, you can obtain this hash without running salmon index. I do this, for example, when hashing GENCODE and Ensembl. Rob has a Python package called fasta-digest, installable with pip, that produces the hash value as a standalone output.
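
To see what tximeta has to work with for a given sample, it may help to look at the hash recorded next to quant.sf. The following is a minimal sketch, not part of tximeta: read_index_hash is a hypothetical helper, and it assumes that salmon writes aux_info/meta_info.json in the quantification directory and that the hash is stored in a field named index_seq_hash. It returns NA when the file or the field is missing or empty, which would be consistent with the "couldn't find matching transcriptome" message.

```r
library(jsonlite)

# Hypothetical helper (not a tximeta function): report the index sequence
# hash recorded for one sample. quant_dir is the directory holding quant.sf.
# Assumes the metadata lives in aux_info/meta_info.json under the field
# index_seq_hash.
read_index_hash <- function(quant_dir) {
  meta_file <- file.path(quant_dir, "aux_info", "meta_info.json")
  if (!file.exists(meta_file)) {
    return(NA_character_)
  }
  meta <- fromJSON(meta_file)
  hash <- meta$index_seq_hash
  # Treat a missing or empty field as "no hash recorded".
  if (is.null(hash) || length(hash) == 0 || !nzchar(hash)) {
    return(NA_character_)
  }
  hash
}

# Path taken from the example above; adjust to your own layout.
read_index_hash("results/quant/salmon/STAR")
```

If this returns NA for a sample quantified in alignment mode, the hash would need to be supplied some other way, for example from the standalone digest mentioned above.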

Thanks for the response. I installed that tool and ran compute_fasta_digest --reference transcripts.fa --out results/digest, renamed the output file to info.json so that it looks like salmon index's output, then ran

makeLinkedTxome(indexDir = "results",
            source = "Ensembl", organism = "Mus musculus",
            release = "98", genome = "GRCm38",
            fasta = "transcripts.fa", gtf = "Mus_musculus.GRCm38.98.gtf",
            jsonFile = "results/index.json")

but it still doesn't work. Here's the output from my process:

> library(tximeta)
> makeLinkedTxome(indexDir = "results",
+                 source = "Ensembl", organism = "Mus musculus",
+                 release = "98", genome = "GRCm38",
+                 fasta = "transcripts.fa", gtf = "Mus_musculus.GRCm38.98.gtf",
+                 jsonFile = "results/index.json")
writing linkedTxome to results/index.json
  does not exist, create directory? (yes/no): yes
saving linkedTxome in bfc (first time)
> meta <- tibble::tribble(
+   ~names,        ~files,
+   "sample1", "results/quant/salmon/STAR/sample1/quant.sf"
+ )
> se <- tximeta(meta)
importing quantifications
reading in files with read_tsv
couldn't find matching transcriptome, returning non-ranged SummarizedExperiment

Because Ensembl 98 for mouse is supported, you can skip making a linkedTxome.

You just need to insert this hash value into a file of your choosing in each sample directory (details are in the vignette, but you'll need to use the current release). You can use this argument:



