Clustering sequences by similarity using a percentage identity matrix
Bade ▴ 310
@bade-5877
Last seen 4.0 years ago
Delaware

I have a set of 400 nucleotide sequences and want to cluster them on the basis of similarity. I am expecting a similarity of <=45% among the members of a cluster. There will also be a few sequences that show no similarity to any other member. Is there a clustering approach that allows setting a cut-off on the relation (similarity) between members, and that can keep the members with very low similarity in an "unclustered" set?

I have generated the percentage identity matrix (400 x 400) using Clustal Omega and am clustering on this matrix with the affinity propagation approach, but I am not getting good results.

P.S. I have already tried "cd-hit" and "uclust", but they are not recommended when the expected sequence similarity is below 70%.
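To make the requirement concrete, here is a minimal sketch of one possible approach (not something I have validated on the real data): complete-linkage agglomerative clustering applied directly to the percentage identity matrix, with a hard cut-off. Two clusters are merged only if every pair of sequences across them meets the cut-off, and anything left as a singleton goes to the "unclustered" set. The function name `cluster_by_identity`, the 6 x 6 toy matrix, and the cut-off of 45 are all made up for illustration; the real matrix would be the 400 x 400 one from Clustal Omega.

```python
def cluster_by_identity(pid, cutoff):
    """Complete-linkage clustering on a percent identity matrix.

    pid    : square, symmetric list of lists of percent identities (0-100)
    cutoff : minimum identity required between EVERY pair inside a cluster
    Returns (clusters, unclustered): clusters with at least two members,
    and the indices of sequences that joined no cluster.
    """
    clusters = [[i] for i in range(len(pid))]

    def linkage(a, b):
        # Complete linkage: the lowest identity between any cross pair.
        return min(pid[i][j] for i in a for j in b)

    while True:
        best = None  # (identity, index_x, index_y) of the best allowed merge
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s >= cutoff and (best is None or s > best[0]):
                    best = (s, x, y)
        if best is None:
            break  # no remaining pair of clusters satisfies the cut-off
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    grouped = [sorted(c) for c in clusters if len(c) > 1]
    unclustered = sorted(c[0] for c in clusters if len(c) == 1)
    return grouped, unclustered


# Hypothetical toy matrix: sequences 0-2 are alike, 3-4 are alike, 5 is alone.
toy_pid = [
    [100, 80, 75, 20, 10, 5],
    [80, 100, 70, 25, 15, 8],
    [75, 70, 100, 30, 12, 6],
    [20, 25, 30, 100, 60, 9],
    [10, 15, 12, 60, 100, 7],
    [5, 8, 6, 9, 7, 100],
]
grouped, unclustered = cluster_by_identity(toy_pid, cutoff=45)
print(grouped, unclustered)
```

This naive loop is roughly cubic in the number of sequences, so it is slow but workable for 400 of them. With SciPy available, `scipy.cluster.hierarchy.linkage` with `method="complete"` on the distances `100 - identity`, followed by `fcluster` with `criterion="distance"` and `t = 100 - cutoff`, should give an equivalent partition much faster.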

Link to my question on BioStar - https://www.biostars.org/p/147913/

Bade

alignment clustering • 1.9k views