Error running tximeta
@16ab65d3

Hi,

I am using tximeta to prepare Salmon output for use with DESeq2. I am able to read in the files and sample names, but receive the following error when running tximeta:

se <- tximeta(coldata)

Error in FUN(X[[i]], ...): metadata files are missing, tximeta requires the full Salmon output directory

Traceback:

  1. tximeta(coldata)
  2. lapply(files, getMetaInfo, customMetaInfo = customMetaInfo)
  3. FUN(X[[i]], ...)
  4. stop("metadata files are missing, tximeta requires the full Salmon output directory")

I am unsure what metadata is needed that is not already provided. Do I need to point to a gtf file?

Thanks in advance!

What is the content of your coldata? In other words, what is the output of head(coldata)?

Also, does coldata contain the 2 columns with colnames "files" and "names"? These 2 column names are required! See: https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#Tximeta_import_starts_with_sample_table
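For reference, a minimal sketch of a valid coldata, assuming hypothetical paths and sample names (adjust dir and samples to your own layout):

```r
dir <- "/path/to/salmon_output"      # hypothetical top-level directory
samples <- c("sample1", "sample2")   # hypothetical sample names

# coldata must be a data.frame with "names" and "files" columns;
# each "files" entry points to a quant.sf in a Salmon output directory.
coldata <- data.frame(
  names = samples,
  files = file.path(dir, samples, "quant.sf"),
  stringsAsFactors = FALSE
)

# Sanity check before calling tximeta(): every file should exist.
all(file.exists(coldata$files))
```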

Based on the error, it seems the Salmon input files cannot be found. And no, there is no need to point to a GTF file.

Hi, here are the contents of coldata:

(screenshot of the coldata data.frame, not reproduced here)

I do have both the "names" and "files" columns in coldata. The files column points to the salmon files, so I am not sure what is missing or incorrect.

Thanks again for any further guidance you can offer!

@mikelove

tximeta does more than tximport -- it also identifies the provenance of the transcriptome using metadata that Salmon outputs.

https://github.com/thelovelab/tximeta?tab=readme-ov-file#how-it-works

However for this to work, you cannot have deleted the original directory output by Salmon. If you moved the quant.sf files and deleted the metadata, this won't work anymore.

You can either use tximport or set skipMeta=TRUE.
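In code, the two fallbacks might look like this (a sketch; coldata is assumed to be a sample table with "files" and "names" columns, as in the vignette):

```r
library(tximeta)
# Option 1: run tximeta but skip the transcriptome-metadata lookup.
se <- tximeta(coldata, skipMeta = TRUE)

# Option 2: use tximport directly; it needs only the quant.sf files.
# txOut = TRUE keeps transcript-level counts, so no tx2gene table
# is required at this step.
library(tximport)
txi <- tximport(coldata$files, type = "salmon", txOut = TRUE)
```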

Thank you, Michael,

If I can follow up with a couple of questions:

  1. I did indeed move the quant.sf files. I can move them back into their original folders. To be sure, are the 3 file types required by tximeta 1) quant.sf, 2) a log file ending in tar.gz, and 3) a bam file? These are the file types output by Salmon into each of my participant directories and I'd like to make sure these are the files tximeta is looking for.

  2. What is the strategy for pointing tximeta to the correct directories and files? I am a bit confused because your vignette seems to point only to the quant.sf files and not to an entire Salmon output directory (i.e., the line of code that reads files <- file.path(dir, "SRR1197474", "quant.sf")). Do I need to point to each individual participant folder containing the 3 file types I described above? Or do I need to create a tarball containing all of the required files? If the latter, is this what the "tar -czf" code does and how is this implemented? For example, would I create a tarball containing all of the salmon output folders with each of the 3 above files per participant?

I hope these questions are clear and thank you very much in advance.

For how to point tximeta to files, please refer to the vignette which has examples.
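To illustrate the vignette's pattern: the "files" column points at quant.sf, but each quant.sf must still sit inside its complete Salmon output directory, since tximeta reads the sibling metadata files relative to that path. A sketch with hypothetical directory and sample names:

```r
# tximeta locates metadata (e.g. aux_info/meta_info.json,
# cmd_info.json) relative to each quant.sf path, so the full
# Salmon output directory must remain intact around it.
dir <- "/path/to/salmon_runs"                # hypothetical
coldata <- data.frame(
  names = c("P1", "P2"),                     # hypothetical samples
  files = file.path(dir, c("P1", "P2"), "quant.sf")
)
se <- tximeta(coldata)
```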

These are the file types output by Salmon into each of my participant directories

There are more files. Did you yourself run Salmon or did you get files from another source?

I ran Salmon using a prebuilt workflow on a cloud server my institution subscribes to. The workflow runs the salmon quant command using a Salmon index (I chose hg38 Ensembl) and FASTQ files as inputs. The outputs are the 3 files I described (quant.sf, a log file, and a BAM file). If you can let me know which outputs I am missing, I would be grateful!

Regarding pointing to files: the vignette points only to the quant.sf files, so I was curious how the package would "see" the other files tximeta requires.

Tximeta uses the full directory; some of its files give the key information about the transcriptome. It's not a good idea to selectively delete output files. Keep the directory intact, as many downstream tools, including tximeta and MultiQC, expect to find key information in these output files.

If you don't have the output of Salmon, you can just use skipMeta=TRUE or use tximport.

FYI (since I just finalized a run with salmon).

For each sample, an output folder (the "." root in the tree below) is created that contains 3 files (cmd_info.json, lib_format_counts.json, quant.sf) as well as 3 sub-folders (aux_info, libParams, logs), plus the folders and files within them. The whole dataset is imported cleanly by tximeta (and parsed by MultiQC).

.
  |-aux_info
  |  |-ambig_info.tsv
  |  |-bootstrap
  |  |  |-bootstraps.gz
  |  |  |-names.tsv.gz
  |  |-exp3_seq.gz
  |  |-exp5_seq.gz
  |  |-expected_bias.gz
  |  |-exp_gc.gz
  |  |-fld.gz
  |  |-meta_info.json
  |  |-obs3_seq.gz
  |  |-obs5_seq.gz
  |  |-observed_bias.gz
  |  |-observed_bias_3p.gz
  |  |-obs_gc.gz
  |-cmd_info.json
  |-libParams
  |  |-flenDist.txt
  |-lib_format_counts.json
  |-logs
  |  |-salmon_quant.log
  |-quant.sf

For completeness: content of cmd_info.json (listing all arguments used to run salmon for this particular sample).

{
    "salmon_version": "1.10.0",
    "index": "/mnt/files/guido/SALMON/index_hs_46/",
    "libType": "A",
    "seqBias": [],
    "numBiasSamples": "10000000",
    "gcBias": [],
    "biasSpeedSamp": "5",
    "validateMappings": [],
    "numGibbsSamples": "100",
    "threads": "20",
    "mates1": "./clean/P24D1/P24D1_1.fq.gz",
    "mates2": "./clean/P24D1/P24D1_2.fq.gz",
    "output": "./data_out/salmon_out/P24D1",
    "auxDir": "aux_info"
}
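For reference, the command line recorded in cmd_info.json above corresponds roughly to the following invocation (flag spellings are standard salmon ones; treat this as a reconstruction, not the exact command the workflow ran):

```shell
# Reconstructed from the cmd_info.json arguments shown above.
salmon quant \
  -i /mnt/files/guido/SALMON/index_hs_46/ \
  -l A \
  --seqBias --gcBias \
  --numBiasSamples 10000000 \
  --biasSpeedSamp 5 \
  --validateMappings \
  --numGibbsSamples 100 \
  -p 20 \
  -1 ./clean/P24D1/P24D1_1.fq.gz \
  -2 ./clean/P24D1/P24D1_2.fq.gz \
  -o ./data_out/salmon_out/P24D1
```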