Xiaohui Wu
Hi all,
I have roughly 100 million (100M) DNA reads, each ~150 nt long, and some
of them are duplicates. Is there a way to collapse identical sequences
into one and count how many times each occurs, like the unique() function
in R but also returning the occurrence count of each read, and more
efficiently?
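
To make the desired output concrete, here is a minimal sketch of the collapse-and-count step. It is written in Python rather than R purely for illustration; the function name `count_reads` and the toy reads are made up, and it holds everything in memory, so it only shows the result I am after rather than something tuned for 100M reads.

```python
from collections import Counter

def count_reads(reads):
    """Collapse identical sequences and count how often each one occurs."""
    return Counter(reads)

# Toy input (hypothetical); real reads would be ~150 nt each.
reads = ["ACGT", "ACGT", "TTGA", "ACGA", "TTGA", "ACGT"]
counts = count_reads(reads)

# One line per unique sequence, most frequent first.
for seq, n in counts.most_common():
    print(seq, n)
```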
Also, if I want to cluster these 100M reads by similarity, for example
grouping reads whose edit distance (or some other distance) is <= 2, is
there a function or package that can be used?
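
To show what I mean by clustering at distance <= 2, here is a rough Python sketch. It assumes Levenshtein edit distance and a simple greedy scheme in which each read joins the first cluster whose representative is within the threshold; the names `edit_distance` and `greedy_cluster` and the toy reads are hypothetical. It compares every read against every cluster representative, so it is far too slow for 100M reads and only illustrates the grouping I have in mind.

```python
def edit_distance(a, b):
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

def greedy_cluster(reads, max_dist=2):
    """Put each read in the first cluster whose representative is within max_dist."""
    clusters = []  # list of (representative, members) pairs
    for read in reads:
        for rep, members in clusters:
            if edit_distance(read, rep) <= max_dist:
                members.append(read)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters.append((read, [read]))
    return clusters

# Toy input (hypothetical).
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG", "TTTTGGGC"]
clusters = greedy_cluster(reads)
for rep, members in clusters:
    print(rep, len(members), members)
```

Note that with a greedy scheme like this the result depends on the order of the reads, since the first read seen becomes the representative of its cluster.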
Thank you!
Xiaohui