2010 Jun 24
how to group a large list of strings into categories based on string similarity?
Hi,
I want to group a large list (20 million) of strings into categories
based on string similarity.
The specific problem is: given a list of DNA sequences such as the ones below
ACTCCCGCCGTTCGCGCGCAGCATGATCCTG
ACTCCCGCCGTTCGCGCGCNNNNNNNNNNNN
CAGGATCATGCTGCGCGCGAACGGCGGGAGT
CAGGATCATGCTGCGCGCGAANNNNNNNNNN
CAGGATCATGCTGCGCGCGNNNNNNNNNNNN
......
.....
NNNNNNNCCGTTCGCGCGCAGCATGATCCTG
NNNNNNNNNNNNCGCGCGCAGCATGATCCTG
NNNNNNNNNNNNGCGCGCGAACGGCGGGAGT
NNNNNNNNNNNNNNCGCGCAGCATGATCCTG
NNNNNNNNNNNTGCGCGCGAACGGCGGGAGT
NNNNNNNNNNTTCGCGCGCAGCATGATCCTG
'N' is the missing le...
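A minimal sketch follows. Because the explanation above is cut off, it assumes that 'N' marks an unknown base that matches any letter, and that all sequences have the same length. The names `compatible`, `merge` and `group_sequences` are hypothetical and do not come from any library. The grouping is greedy: each sequence joins the first group whose consensus it agrees with at every known position, and that consensus then absorbs the sequence's known bases.

```python
from typing import List

# Assumption: 'N' is an unknown base that matches any letter.
WILDCARD = "N"


def compatible(a: str, b: str, wildcard: str = WILDCARD) -> bool:
    """True if a and b have equal length and agree at every position
    where neither sequence has the wildcard."""
    if len(a) != len(b):
        return False
    return all(x == y or x == wildcard or y == wildcard for x, y in zip(a, b))


def merge(a: str, b: str, wildcard: str = WILDCARD) -> str:
    """Fill each wildcard in a with the base at the same position in b."""
    return "".join(y if x == wildcard else x for x, y in zip(a, b))


def group_sequences(seqs: List[str], wildcard: str = WILDCARD) -> List[List[str]]:
    """Greedy single pass: add each sequence to the first group whose
    consensus it is compatible with, otherwise start a new group.
    The result depends on input order, and each sequence is checked
    against every existing group, so the worst case is O(n * groups)."""
    consensus: List[str] = []
    groups: List[List[str]] = []
    for s in seqs:
        for i, c in enumerate(consensus):
            if compatible(c, s, wildcard):
                consensus[i] = merge(c, s, wildcard)
                groups[i].append(s)
                break
        else:  # no compatible group found
            consensus.append(s)
            groups.append([s])
    return groups


if __name__ == "__main__":
    reads = [
        "ACTCCCGCCGTTCGCGCGCAGCATGATCCTG",
        "ACTCCCGCCGTTCGCGCGCNNNNNNNNNNNN",
        "CAGGATCATGCTGCGCGCGAACGGCGGGAGT",
        "NNNNNNNCCGTTCGCGCGCAGCATGATCCTG",
        "NNNNNNNNNNNNGCGCGCGAACGGCGGGAGT",
    ]
    for group in group_sequences(reads):
        print(group)
```

With these five reads taken from the list above, the first, second and fourth should end up in one group and the third and fifth in another. Two caveats: the sketch does not treat reverse complements as equivalent (the third sequence listed is the reverse complement of the first), and scanning every group for each of 20 million reads would be too slow if there are many groups, so a real solution would need some index to narrow the candidate groups.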