2010 Jun 24
How to group a large list of strings into categories based on string similarity?
...CTCCCGCCGTTCGCGCGCNNNNNNNNNNNN
CAGGATCATGCTGCGCGCGAACGGCGGGAGT
CAGGATCATGCTGCGCGCGAANNNNNNNNNN
CAGGATCATGCTGCGCGCGNNNNNNNNNNNN
......
.....
NNNNNNNCCGTTCGCGCGCAGCATGATCCTG
NNNNNNNNNNNNCGCGCGCAGCATGATCCTG
NNNNNNNNNNNNGCGCGCGAACGGCGGGAGT
NNNNNNNNNNNNNNCGCGCAGCATGATCCTG
NNNNNNNNNNNTGCGCGCGAACGGCGGGAGT
NNNNNNNNNNTTCGCGCGCAGCATGATCCTG
'N' stands for a missing letter.
It can be seen that some strings are identical except for those N's
(i.e. N can match any base).
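
The matching rule described above (N matches any base) could be sketched in Python as follows. This is only an illustration, not code from the original post: the function name `compatible` and its `wildcard` parameter are hypothetical, and it assumes the strings are already aligned and of equal length, as the rows shown appear to be.

```python
def compatible(a: str, b: str, wildcard: str = "N") -> bool:
    """Return True if a and b agree wherever neither has the wildcard.

    Assumption: strings of different lengths never match.
    """
    if len(a) != len(b):
        return False
    # A position matches if the letters are equal or either one is 'N'.
    return all(x == y or x == wildcard or y == wildcard for x, y in zip(a, b))


# Two pairs of rows taken from the list above.
print(compatible("NNNNNNNCCGTTCGCGCGCAGCATGATCCTG",
                 "NNNNNNNNNNTTCGCGCGCAGCATGATCCTG"))
print(compatible("NNNNNNNNNNNNCGCGCGCAGCATGATCCTG",
                 "NNNNNNNNNNNNGCGCGCGAACGGCGGGAGT"))
```

Note that this relation is not transitive: "AN" matches both "AC" and "AG", although "AC" and "AG" do not match each other.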
Given this list of strings, I want to have
1) a vector corresponding to each row (string): each string is assigned
an id, such that similar strings (those only...
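
One possible way to produce such an id vector is sketched below in Python. The request above is truncated, so this rests on an assumption: two strings count as "similar" when they agree at every position where neither has an N, and strings linked through a chain of such matches share an id (connected components, found here with a small union-find). The names `compatible` and `assign_group_ids` are hypothetical. The pairwise comparison is quadratic in the number of strings, so a very large list would probably need something smarter.

```python
from itertools import combinations


def compatible(a, b, wildcard="N"):
    # Equal length, and every position is equal or has a wildcard on one side.
    return len(a) == len(b) and all(
        x == y or wildcard in (x, y) for x, y in zip(a, b)
    )


def assign_group_ids(strings):
    """Return one integer id per string; matching strings share an id."""
    parent = list(range(len(strings)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every compatible pair (O(n^2) comparisons).
    for i, j in combinations(range(len(strings)), 2):
        if compatible(strings[i], strings[j]):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as 1, 2, 3, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1)
            for i in range(len(strings))]


# Toy input, not taken from the post.
print(assign_group_ids(["ACGT", "NCGT", "TTTT", "TTNN"]))  # [1, 1, 2, 2]
```

Because N-matching is not transitive, this choice can put two strings that conflict with each other in the same group when a third string full of N's matches both. If that is undesirable, a stricter grouping rule would be needed.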