Align a list of DNA sequences and extract only positions with SNPs for a compact view?
Vang Le ▴ 80
@vang-le-6690
Denmark

I import data from a JSON file that contains sequence names, DNA sequences, and other fields. How do I compute a multiple sequence alignment, extract the positions that have SNPs (flanking each SNP on both sides with some character, 'N' or '-', for easy differentiation), and visualize the alignment in a plot (e.g. with ggplot)?

msa Biostrings • 1.6k views
UBod ▴ 300
@ubodenhofer-5425
University of Applied Sciences Upper Au…

As you correctly point out, the pipeline you have in mind consists of three steps:

  • I suggest you try the 'rjson' package to read your JSON file. How to do that exactly depends on the actual format of your JSON file.
  • Suppose you have read the JSON file so that you end up with a named character vector x. You can then use the 'msa' package and run either msa(x, type="dna") or something like msa(DNAStringSet(x)). Use the method argument to choose among the three alignment algorithms; the default is ClustalW.
  • Once you have an alignment, you can start analyzing SNPs. Do I interpret your question correctly that you have previously unknown sequences (e.g. from a previously unsequenced species) and you want to identify novel SNP locations from those sequences? If so, I am not aware of any Bioconductor package for doing that, but there might be other freely available tools.
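
For the last step, here is a minimal base R sketch of how variable columns could be pulled out of an existing alignment. It assumes the alignment is already available as a named character vector of equal-length rows (for instance converted from an msa result or read from an aligned FASTA file). The toy sequences, the variable names, and the decision to ignore gaps and 'N' when calling a column variable are all assumptions of this sketch, not an established SNP-calling method.

```r
# Hypothetical aligned sequences (equal length, gaps written as '-').
aligned <- c(seq1 = "ACGTACGTAG-TAGC",
             seq2 = "ACGTACCTAG-TAGC",
             seq3 = "ACGAACGTAGCTAGC")

# Split each row into single characters: one matrix row per sequence.
mat <- do.call(rbind, strsplit(aligned, ""))

# A column counts as variable when it holds more than one distinct base.
# Gaps and ambiguity codes are ignored here (an assumption of this sketch).
is_variable <- apply(mat, 2, function(col) {
  length(unique(col[col %in% c("A", "C", "G", "T")])) > 1
})
snp_pos <- which(is_variable)

# Compact view: keep only the variable columns and flank each base with '-'.
compact <- apply(mat[, snp_pos, drop = FALSE], 1, function(r) {
  paste0("-", paste(r, collapse = "-"), "-")
})

# Long-format table (one row per sequence and SNP position) for plotting.
long <- data.frame(
  seq  = rep(rownames(mat), times = length(snp_pos)),
  pos  = rep(snp_pos, each = nrow(mat)),
  base = as.vector(mat[, snp_pos, drop = FALSE])
)

print(compact)
```

The long-format data frame is one possible input for a tile-style plot, for example with ggplot2's geom_tile(), using pos on the x axis, seq on the y axis, and base as the fill.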

msa (ClustalW) is sluggish. For now I subset the alignment from an aligned FASTA file that was aligned with MAFFT outside of R. I tried several packages, but the wrappers for MAFFT are not straightforward to use.

You are correct about new SNPs. I


The 'msa' package also supports ClustalOmega and MUSCLE (as said, you can use the method argument to select one of the three algorithms). If you use a different algorithm outside of R/Bioconductor anyway (mafft, as you mentioned), then I suppose that R/Bioconductor would not be of great help to you.
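
As a rough illustration of switching algorithms, the sketch below calls msa() with method="Muscle" on a small, made-up character vector. It assumes the Bioconductor package 'msa' is installed; the sequences and variable names are hypothetical, and whether another algorithm is actually faster than ClustalW on your data is something you would have to measure.

```r
# Assumes the Bioconductor package 'msa' is installed.
library(msa)

# Hypothetical toy input; real data would come from the JSON file.
x <- c(seq1 = "ACGTACGTAGCTAGCTAG",
       seq2 = "ACGTACCTAGCTAGCTAG",
       seq3 = "ACGTACGTAGCTTGCTAG")

# 'method' selects the aligner: "ClustalW" (default), "ClustalOmega" or "Muscle".
aln <- msa(x, method = "Muscle", type = "dna")

# Convert the alignment to a plain named character vector of aligned rows.
aligned <- as.character(aln)
print(aligned)
```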
