Note: I haven't updated to R 4.0 or Bioconductor 3.11 yet, in case that matters.
I built a STAR index using the GRCm38 Ensembl release 98 DNA primary assembly and the release 98 GTF, and aligned with --quantMode TranscriptomeSAM to get output in transcriptome coordinates.
I then used gffread -w transcripts.fa -g $DNA $GTF to make a reference for salmon's alignment mode (salmon quant -t transcripts.fa -l A -a Sample.toTranscriptome.out.bam -o $OUTDIR), which seems to have worked.
Unfortunately, although I used unchanged references from Ensembl, tximeta imports the quantification but does not recognize the transcriptome, and returns only a non-ranged SummarizedExperiment.
In the tximeta vignette sections about creating linked transcriptomes, everything seems to require a salmon index unless I'm missing something.
Can someone help me find a way to use tximeta's automatic metadata gathering with this setup?
> library(tximeta)
> meta <- tibble::tribble(
+ ~names, ~files,
+ "sample1", "results/quant/salmon/STAR/quant.sf"
+ )
> se <- tximeta(meta)
importing quantifications
reading in files with read_tsv
couldn't find matching transcriptome, returning non-ranged SummarizedExperiment
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tximeta_1.4.5
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 lattice_0.20-41 prettyunits_1.1.1 Rsamtools_2.2.3 Biostrings_2.54.0
[6] assertthat_0.2.1 digest_0.6.25 BiocFileCache_1.10.2 R6_2.4.1 GenomeInfoDb_1.22.1
[11] stats4_3.6.3 RSQLite_2.2.0 httr_1.4.1 pillar_1.4.4 zlibbioc_1.32.0
[16] rlang_0.4.6 GenomicFeatures_1.38.2 progress_1.2.2 lazyeval_0.2.2 curl_4.3
[21] rstudioapi_0.11 blob_1.2.1 S4Vectors_0.24.4 Matrix_1.2-18 BiocParallel_1.20.1
[26] readr_1.3.1 stringr_1.4.0 ProtGenerics_1.18.0 RCurl_1.98-1.2 bit_1.1-15.2
[31] biomaRt_2.42.1 DelayedArray_0.12.3 compiler_3.6.3 rtracklayer_1.46.0 pkgconfig_2.0.3
[36] askpass_1.1 BiocGenerics_0.32.0 tximport_1.14.2 openssl_1.4.1 tidyselect_1.0.0
[41] SummarizedExperiment_1.16.1 tibble_3.0.1 GenomeInfoDbData_1.2.2 IRanges_2.20.2 matrixStats_0.56.0
[46] XML_3.99-0.3 fansi_0.4.1 crayon_1.3.4 dplyr_0.8.5 dbplyr_1.4.3
[51] GenomicAlignments_1.22.1 bitops_1.0-6 rappdirs_0.3.1 grid_3.6.3 jsonlite_1.6.1
[56] lifecycle_0.2.0 DBI_1.1.0 AnnotationFilter_1.10.0 magrittr_1.5 cli_2.0.2
[61] stringi_1.4.6 XVector_0.26.0 ellipsis_0.3.0 vctrs_0.2.4 ensembldb_2.10.2
[66] tools_3.6.3 bit64_0.9-7 Biobase_2.46.0 glue_1.4.0 purrr_0.3.4
[71] hms_0.5.3 parallel_3.6.3 AnnotationDbi_1.48.0 GenomicRanges_1.38.0 memoise_1.1.0
Thanks for the response. I installed that tool and ran compute_fasta_digest --reference transcripts.fa --out results/digest, renamed the file info.json to make it look like salmon index's output, then reran the import, but it still doesn't work. Here's the output from my process:
Because Ensembl 98 for mouse is supported, you can skip making a linkedTxome.
You just need to insert this hash value into a file of your choosing in each sample directory (details are in the vignette, but you’ll need to use the current release). You can use this argument:
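
For the step of putting the hash into a file in each sample directory, a minimal shell sketch might look like the following. It is only an illustration under stated assumptions: the function name write_seq_hash and the file name meta_info.json are hypothetical choices, the hash is a placeholder you would replace with the digest computed for transcripts.fa, and the key name index_seq_hash is borrowed from the meta_info.json that salmon itself writes. Check the vignette of the current tximeta release for the file layout it actually expects.

```shell
# Sketch only: function name, file name, and hash value are illustrative assumptions.
# Writes a small JSON file holding the transcriptome sequence hash into every
# sample directory under ROOT that contains a salmon quant.sf file.
write_seq_hash() {
    root="$1"   # directory holding one sub-directory per sample
    hash="$2"   # digest of the transcript sequences (placeholder value)
    for sample_dir in "$root"/*/; do
        # Skip directories that are not salmon output directories.
        [ -f "${sample_dir}quant.sf" ] || continue
        # Key name mirrors the one in salmon's own meta_info.json.
        printf '{\n  "index_seq_hash": "%s"\n}\n' "$hash" \
            > "${sample_dir}meta_info.json"
    done
}
```

With the layout from the question, calling write_seq_hash results/quant/salmon followed by the digest string would place a meta_info.json next to results/quant/salmon/STAR/quant.sf and leave directories without a quant.sf untouched.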