Question

Create protein sequences including variants from a VCF file

4

daniel.magnus.bader ▴ 50

@danielmagnusbader-19953

Last seen 4.8 years ago

Dear all,

I am investigating the proteome of human cancer samples and want to insert their genetic variants into the reference proteome FASTA sequences to increase the sensitivity of my peptide/protein quantification.

Could something like a "proteomeVariantInsertion()" function be implemented in the VariantAnnotation package?

The VariantAnnotation::predictCoding() function already translates codons at variant positions from a reference BSgenome object to assess the consequences of a variant. I would like to take all coding variants (or just non-synonymous SNVs for a start) and insert them into the reference proteome, then save the modified fasta file.

On customProDB: In principle, the customProDB package already does this job. However, of 11,000 genes carrying ~40k non-synonymous SNVs (extracted with VariantAnnotation::predictCoding()), only ~2k proteins end up with at least one variant inserted, which is too much loss. customProDB also works mostly on custom data.frames and could make much better use of the maintained Bioconductor objects for variants and sequences.

I would highly appreciate a "Bioconductor-native" solution for the customized proteome challenge.

Thanks, Daniel

VariantAnnotation customProDB • 3.8k views

ADD COMMENT • link updated 7 months ago by Yun • 0 • written 5.8 years ago by daniel.magnus.bader ▴ 50

0

This sounds like a feature request. Could you please open it as an issue on the package's GitHub page: https://github.com/Bioconductor/VariantAnnotation/issues

ADD REPLY • link 5.8 years ago shepherl 4.1k

0

This is potentially doable with BSgenome::injectSNPs().

ADD REPLY • link 5.8 years ago Michael Lawrence ★ 11k

0

Thanks for the reply, shepherl. I did not want to start right away with an issue, but I have now posted it here: https://github.com/Bioconductor/VariantAnnotation/issues/24

Meanwhile, I will look at BSgenome::injectSNPs(), which indeed sounds very interesting. Thanks Michael!

ADD REPLY • link 5.8 years ago daniel.magnus.bader ▴ 50

0

FWIW you might also want to take a look at Biostrings::replaceAt(). It's lower level and more flexible than BSgenome::injectSNPs(): the former works on AAString/AAStringSet/DNAString/DNAStringSet objects, while the latter only works on a BSgenome object.
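To make the string-level idea concrete, here is a minimal, language-neutral sketch in plain Python. It is not the Biostrings API: the function name `apply_aa_substitutions` and the `(position, ref_aa, alt_aa)` tuple layout are assumptions made up for this illustration. It only shows what replacing residues at given positions of a protein sequence amounts to, with a check that the expected reference residue is actually there.

```python
def apply_aa_substitutions(protein, substitutions):
    """Return a copy of `protein` with single-residue substitutions applied.

    `substitutions` is an iterable of (position, ref_aa, alt_aa) tuples with
    1-based positions (hypothetical layout chosen for this sketch).
    """
    residues = list(protein)
    for pos, ref_aa, alt_aa in substitutions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the protein")
        # Guard against annotation/sequence mismatches before editing.
        if protein[pos - 1] != ref_aa:
            raise ValueError(
                f"expected {ref_aa} at position {pos}, found {protein[pos - 1]}"
            )
        residues[pos - 1] = alt_aa
    return "".join(residues)


# Toy protein and two made-up substitutions (T3S and K8R).
reference = "MKTAYIAK"
variant = apply_aa_substitutions(reference, [(3, "T", "S"), (8, "K", "R")])
print(variant)
```

The reference check is the useful part in practice: a mismatch usually means the variant annotation and the protein sequence come from different transcript models.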

ADD REPLY • link 5.8 years ago Hervé Pagès 16k

0

Thanks Hervé,

However, I think injectSNPs() should be sufficient, since I start from genome coordinates in a VCF.

Best, Daniel

ADD REPLY • link 5.8 years ago daniel.magnus.bader ▴ 50

score 0 · Answer 1 · 2019-03-12

0

daniel.magnus.bader ▴ 50

Thanks to the suggestions from Michael Lawrence and Hervé Pagès, I think it should work as follows:

1. Identify all coding SNVs, e.g. via VariantAnnotation::predictCoding().
2. Inject the coding SNVs into the genome, e.g. via BSgenome::injectSNPs().
3. Concatenate the exons of each protein isoform of a gene harboring a coding SNV to obtain all relevant coding sequences (already modified).
4. Translate these into AAString objects, e.g. via Biostrings::translate().
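The logic of these steps can be sketched on toy data in plain Python. This is only an illustration of the inject, concatenate, translate idea, not the Bioconductor functions named above: the helper names, the `(position, ref, alt)` and `(start, end)` tuple layouts, and the 20-base "genome" are all made up, and only plus-strand transcripts are handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def inject_snvs(genome, snvs):
    """Apply (position, ref, alt) SNVs (1-based) to a genomic sequence."""
    seq = list(genome)
    for pos, ref, alt in snvs:
        if genome[pos - 1] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def extract_cds(genome, exons):
    """Concatenate (start, end) exon ranges, 1-based inclusive, plus strand only."""
    return "".join(genome[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate complete codons; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Toy genome: 2 flanking bases, exon 1, a 4-base intron, exon 2, 2 flanking bases.
genome = "GG" + "ATGGCT" + "TTTT" + "GAATAA" + "CC"
exons = [(3, 8), (13, 18)]

# One coding SNV (GCT -> CCT, Ala -> Pro) and one intronic SNV (no protein effect).
snvs = [(6, "G", "C"), (10, "T", "A")]

reference_protein = translate(extract_cds(genome, exons))
variant_protein = translate(extract_cds(inject_snvs(genome, snvs), exons))
print(reference_protein, variant_protein)
```

Because the SNVs are applied at genome coordinates before the exons are spliced together, every isoform that covers a variant position picks it up without any per-transcript coordinate mapping.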

What is your opinion?

ADD COMMENT • link 5.8 years ago daniel.magnus.bader ▴ 50

0

Should work. Use GenomicFeatures::extractTranscriptSeqs() for #3.

ADD REPLY • link 5.8 years ago Michael Lawrence ★ 11k

0

Thanks. Looks like the perfect fit!

ADD REPLY • link 5.8 years ago daniel.magnus.bader ▴ 50

0

This cannot work for an SNV file, since injectSNPs() needs a SNPlocs object.

ADD REPLY • link 7 months ago Yun • 0