compute_fasta_digest does not match Salmon index hash
Alex • 0
@14d9d0d2
Last seen 7 hours ago
United States

Not sure where else to ask this, but I have been using tximport to import Salmon quantifications against custom transcriptomes into R.

I don't always use Salmon in mapping mode for quantification, so by default I run compute_fasta_digest, as was recommended, to get the transcriptome's sequence and name hashes for when I generate a TxDb. On closer inspection, this does not always work: either the SeqHash or the NameHash from fastaDigest sometimes differs from the value in the Salmon index, so my prebuilt linkedTxome and TxDb don't always work.

Does anybody know how to reliably compute the salmon hash values without fully indexing?

Salmon v1.10.0
fasta_digest v0.1.2
tximeta v1.16.0
R v4.2.3

tximeta • 170 views
@mikelove
Last seen 1 day ago
United States

Can you provide more details on when this is happening? The salmon index is the definitive hash associated with the quant files, so I'm confused about what version you are computing manually. Is it always different or only with some sequence collections? It could have to do with duplicates perhaps?
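
To check the duplicates idea, a rough sketch like the one below could list transcripts that share an identical sequence in the FASTA. It is only a diagnostic under stated assumptions: the function name `find_duplicate_sequences` is hypothetical, the record name is taken as the header up to the first whitespace, and sequences are compared case-insensitively. None of this is claimed to mirror how Salmon or fastaDigest handle duplicates internally.

```python
from collections import defaultdict


def find_duplicate_sequences(fasta_lines):
    """Return lists of record names that share an identical sequence.

    Hypothetical helper for illustration; not part of Salmon, fastaDigest,
    or tximeta.
    """
    names_by_seq = defaultdict(list)
    name, chunks = None, []
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                names_by_seq["".join(chunks).upper()].append(name)
            # Assumption: the name is the header text up to the first whitespace.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    # Store the final record.
    if name is not None:
        names_by_seq["".join(chunks).upper()].append(name)
    return [names for names in names_by_seq.values() if len(names) > 1]


# Tiny made-up example: tx1 and tx3 carry the same sequence.
example = [">tx1 gene=A", "ACGT", "ACGT", ">tx2", "GGCC", ">tx3", "acgtACGT"]
duplicates = find_duplicate_sequences(example)
print(duplicates)
```

For a real transcriptome, the lines of an open file handle can be passed in place of the `example` list. If any groups come back, that would be worth mentioning together with the options used when building the index.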


I use the fastaDigest tool from COMBINE-lab/fastaDigest, which has the function compute_fasta_digest, to make what I thought were the transcriptome hash values that salmon generates when indexing.

I will do some more testing to see in what scenarios the mismatches occur, but I have at least one transcriptome where the fastaDigest output has a different seq_hash value, and at least one scenario where the fastaDigest output has a different name_hash value.

example:

result from compute_fasta_digest:

{
    "NameHash": "496834fa8c72bb07ef7f3595e2932f1b7ec0d3db5923fb77ecfcc4fcf0689d7d",
    "SeqHash": "f2e483a9c7bb5889b2634191c98e0e0603eb69bd37fd285e2895aab74119e8cd"
}

info.json in salmon index dir of same transcriptome fasta:

{
    ...
    "SeqHash": "f2e483a9c7bb5889b2634191c98e0e0603eb69bd37fd285e2895aab74119e8cd",
    "NameHash": "dcab44b2067775a262809cfad786f508703f55270b605f5d5e4caa31f3ece81e",
    "SeqHash512": "eddef08d635dac74fe4628742f2483d004104df52602657a90ff0c736a79c5403dffad55c078338a0052719f461d1efbfb64460368e67ec2d99415db094049ee",
    "NameHash512": "4ffab7ae40100e373e0c5e2ae1f16c04b3cac48d9dcd49023a8715f2ca5043cbe9e545f5fc0374af0e2e99d5fdbd9baaf231eaab7449cc467739e478033c52b7",
    "DecoySeqHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "DecoyNameHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    ...
}
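
For reference, a small sketch of how the two outputs above can be compared field by field. The helper names (`compare_digests`, `empty_input_fields`) are made up for illustration, and the dictionaries just copy the values pasted above. It also flags fields equal to the SHA-256 of empty input, which is what both decoy hashes are here; that would be consistent with an index built without decoys, though that reading is an assumption.

```python
import hashlib

# SHA-256 of zero bytes of input.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()

# Values copied from the compute_fasta_digest output above.
fasta_digest = {
    "NameHash": "496834fa8c72bb07ef7f3595e2932f1b7ec0d3db5923fb77ecfcc4fcf0689d7d",
    "SeqHash": "f2e483a9c7bb5889b2634191c98e0e0603eb69bd37fd285e2895aab74119e8cd",
}

# Values copied from the salmon index info.json above (elided fields left out).
index_info = {
    "SeqHash": "f2e483a9c7bb5889b2634191c98e0e0603eb69bd37fd285e2895aab74119e8cd",
    "NameHash": "dcab44b2067775a262809cfad786f508703f55270b605f5d5e4caa31f3ece81e",
    "DecoySeqHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "DecoyNameHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}


def compare_digests(digest, info, keys=("SeqHash", "NameHash")):
    """Map each key to True if both sources report the same hash."""
    return {key: digest.get(key) == info.get(key) for key in keys}


def empty_input_fields(info):
    """List fields whose value is the SHA-256 of empty input."""
    return sorted(key for key, value in info.items() if value == EMPTY_SHA256)


report = compare_digests(fasta_digest, index_info)
empty_fields = empty_input_fields(index_info)
print(report)
print(empty_fields)
```

With the values above, SeqHash agrees and only NameHash differs, so in this example the mismatch is in the names rather than the sequences. With real files, the two dictionaries could be loaded with `json.load` instead of being pasted in.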
