Is there R software available for doing approximate matching of personal names? I have data about the same people produced by different organizations, and the only matching key I have is the name. I know that commercial solutions exist, and I know I could code this from scratch, but I'd prefer to build on an existing free solution if there is one.

Unfortunately, the names are not standardized, and there is also a certain level of error:

  Danny Williams            (nickname)
  Dan Williams              (nickname)
  Daniel Williams           (nickname)
  Dan William               (spelling error)
  D. Williams               (initials)
  Daniel "Danny" Williams   (formal + nickname)
  Dan P. Williams           (includes middle initial)
  Williams, Daniel          (different convention)
  William Daniel            (wrong order or missing comma + misspelling)

Is there any R software available to find likely matches, ideally with some estimate of the accuracy of each match? Levenshtein distance as implemented in agrep is a useful solution for some of these cases; I was wondering if there is something that covers more of them.

For this particular application, I am not concerned with issues such as variant latinizations/transliterations (e.g. Tsung-Dao Lee ~ T.D. Lee ~ Li Zhengdao; Ghaddafi ~ Qaddhaffi), but of course if someone handles that as well....

Thanks,
-s
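
P.S. For concreteness, here is a minimal sketch of the edit-distance approach mentioned above, using only base R (agrep and adist from the utils package). The variable names, the max.distance value, and the similarity score are my own illustrative assumptions, not a recommended method; the score is just a crude normalization of the edit distance, not a calibrated match probability.

```r
# A few of the name variants listed above (illustrative data only)
candidates <- c("Danny Williams", "Dan Williams", "Daniel Williams",
                "Dan William", "D. Williams", "Williams, Daniel")
target <- "Dan Williams"

# agrep: approximate matching of the target against each candidate.
# max.distance = 2 (an arbitrary choice) allows up to two edits.
hits <- agrep(target, candidates, max.distance = 2, value = TRUE)

# adist: generalized Levenshtein distance from the target to every candidate.
# The result is a 1 x length(candidates) matrix.
d <- adist(target, candidates)

# Crude similarity in [0, 1]: 1 minus distance over the longer string length.
# This is an assumed heuristic for ranking, not an accuracy estimate.
similarity <- 1 - d[1, ] / pmax(nchar(target), nchar(candidates))

# Rank candidates from most to least similar
ranked <- data.frame(name = candidates, distance = d[1, ],
                     similarity = round(similarity, 2))
ranked <- ranked[order(ranked$distance), ]
print(ranked)
```

This handles the spelling-error and nickname cases reasonably, but "Williams, Daniel" ends up far from the target, which is exactly the kind of case (reordering, initials) where plain edit distance falls short.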