I am using tximeta to prepare Salmon output for use with DESeq2. I am able to read in the files and sample names, but receive the following error when running tximeta:
se <- tximeta(coldata)
Error in FUN(X[[i]], ...): metadata files are missing, tximeta requires the full Salmon output directory
Traceback:
However for this to work, you cannot have deleted the original directory output by Salmon. If you moved the quant.sf files and deleted the metadata, this won't work anymore.
I did indeed move the quant.sf files. I can move them back into their original folders. To be sure, are the 3 file types required by tximeta 1) quant.sf, 2) a log file ending in tar.gz, and 3) a bam file? These are the file types output by Salmon into each of my participant directories and I'd like to make sure these are the files tximeta is looking for.
What is the strategy for pointing tximeta to the correct directories and files? I am a bit confused because your vignette seems to point only to the quant.sf files and not to an entire Salmon output directory (i.e., the line of code that reads files <- file.path(dir, "SRR1197474", "quant.sf")). Do I need to point to each individual participant folder containing the 3 file types I described above? Or do I need to create a tarball containing all of the required files? If the latter, is this what the "tar -czf" code does and how is this implemented? For example, would I create a tarball containing all of the salmon output folders with each of the 3 above files per participant?
I hope these questions are clear and thank you very much in advance.
I ran salmon using a prebuilt workflow on a cloud server my institution subscribes to. The workflow runs the salmon quant command using a salmon index (I chose hg38 Ensembl) and fastq files as inputs. The outputs are the 3 files I described (quant.sf, log file, and bam file). If you can please let me know which outputs I am missing, I would be grateful!
Re. pointing to files, the vignette points only to the quant.sf files; I was curious how the package would "see" the other files required by tximeta.
tximeta uses the full Salmon output directory: besides quant.sf, it contains files that give the key information about the transcriptome. It's not a good idea to selectively delete output files; keep the directory intact, since many downstream tools, including tximeta and MultiQC, expect key information in these output files.
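To see which pieces of the Salmon directory are still sitting next to each quant.sf, something like the following base R sketch may help. The helper name check_salmon_dirs is hypothetical, and the list of expected files is an assumption: cmd_info.json, lib_format_counts.json and the aux_info folder are mentioned in this thread, while aux_info/meta_info.json is my understanding of where Salmon keeps the metadata that tximeta reads. The demo at the bottom builds two fake sample folders in a temporary directory only so the snippet runs on its own.

```r
# Hypothetical helper (not part of tximeta): for each quant.sf path, report
# which of the expected Salmon output files exist in the same directory.
# The default `expected` list is an assumption about a typical Salmon run.
check_salmon_dirs <- function(files,
                              expected = c("cmd_info.json",
                                           "lib_format_counts.json",
                                           file.path("aux_info", "meta_info.json"))) {
  dirs <- dirname(files)
  out <- data.frame(dir = dirs,
                    quant.sf = file.exists(files),
                    stringsAsFactors = FALSE)
  # One logical column per expected file, named after that file.
  for (f in expected) {
    out[[f]] <- file.exists(file.path(dirs, f))
  }
  out
}

# Demo with made-up folders: sampleA is complete, sampleB only has quant.sf.
root <- tempfile("salmon_demo_")
full <- file.path(root, "sampleA")
bare <- file.path(root, "sampleB")
dir.create(file.path(full, "aux_info"), recursive = TRUE)
dir.create(bare, recursive = TRUE)
invisible(file.create(file.path(full, c("quant.sf", "cmd_info.json",
                                        "lib_format_counts.json"))))
invisible(file.create(file.path(full, "aux_info", "meta_info.json")))
invisible(file.create(file.path(bare, "quant.sf")))

report <- check_salmon_dirs(file.path(c(full, bare), "quant.sf"))
print(report)
```

On real data you would pass coldata$files instead of the demo paths. Any row with FALSE entries points at a folder where the quant.sf file was separated from the rest of the Salmon output.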
If you don't have the output of Salmon, you can just use skipMeta=TRUE or use tximport.
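As a rough sketch of those two fallbacks: the wrapper name import_quant_only is hypothetical, and it assumes coldata has the "files" and "names" columns and that tximeta or tximport is installed. Without the metadata you get transcript-level quantifications only, with no automatic identification of the transcriptome.

```r
# Hypothetical wrapper: import quant.sf files when the Salmon metadata is gone.
# Tries tximeta with skipMeta = TRUE first, then falls back to tximport.
import_quant_only <- function(coldata) {
  stopifnot(all(c("files", "names") %in% colnames(coldata)))
  if (requireNamespace("tximeta", quietly = TRUE)) {
    # skipMeta = TRUE tells tximeta not to look for the Salmon metadata files.
    tximeta::tximeta(coldata, skipMeta = TRUE)
  } else if (requireNamespace("tximport", quietly = TRUE)) {
    # tximport takes a (named) vector of quant.sf paths; txOut = TRUE keeps
    # transcript-level output, so no tx2gene table is needed here.
    files <- stats::setNames(coldata$files, coldata$names)
    tximport::tximport(files, type = "salmon", txOut = TRUE)
  } else {
    stop("Neither tximeta nor tximport is installed")
  }
}

# Usage, once coldata points at existing quant.sf files:
# se <- import_quant_only(coldata)
```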
For each sample (.) an output folder is created that contains 3 files (cmd_info.json, lib_format_counts.json, quant.sf), as well as 3 sub-folders (aux_info, libParams, logs) + folders & files in them. The whole dataset is nicely imported by tximeta (and parsed by MultiQC).
What is the content of your coldata? In other words, what is the output of head(coldata)? Also, does coldata contain the 2 columns with colnames "files" and "names"? These 2 column names are required! See: https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Tximeta_import_starts_with_sample_table
--> based on the error it seems the salmon input files cannot be found. And no, no need to point to a gtf file.
Hi, here are the contents of coldata:
I do have both the "names" and "files" columns in coldata. The files column points to the salmon files, so I am not sure what is missing or incorrect.
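For comparison, here is a minimal base R sketch of how a coldata table is typically put together and sanity-checked before calling tximeta. The participant IDs and the temporary directory are made up so the snippet runs on its own; with real data, dir would be the folder holding one intact Salmon output directory per participant.

```r
# Made-up layout: <dir>/<participant>/quant.sf, created in a temp folder
# purely so that this example is self-contained.
dir <- tempfile("quants_")
samples <- c("P01", "P02")
for (s in samples) {
  dir.create(file.path(dir, s), recursive = TRUE)
  invisible(file.create(file.path(dir, s, "quant.sf")))
}

# "files" points at each quant.sf inside its own Salmon directory;
# "names" holds the sample identifiers.
files <- file.path(dir, samples, "quant.sf")
coldata <- data.frame(files = files, names = samples,
                      stringsAsFactors = FALSE)

# Checks worth running before tximeta(coldata):
stopifnot(all(c("files", "names") %in% colnames(coldata)))
stopifnot(all(file.exists(coldata$files)))
stopifnot(!anyDuplicated(dirname(coldata$files)))  # one folder per sample

head(coldata)
```

If all three checks pass but the tximeta error persists, the quant.sf files are being found and the missing piece is more likely the rest of the Salmon output in those same folders.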
Thanks again for any further guidance you can offer!