thr3ads.net - R help - [R] Approximate name matching [May 2011]


Stavros Macrakis

2011-May-09 18:30 UTC

[R] Approximate name matching

Is there R software available for doing approximate matching of personal
names?

I have data about the same people produced by different organizations and
the only matching key I have is the name. I know that commercial solutions
exist, and I know I could code this from scratch, but I'd prefer to build on
some existing free solution if it exists.

Unfortunately, the names are not standardized, and there is also a certain
level of error:

       Danny Williams (nickname)
       Dan Williams (nickname)
       Daniel Williams (nickname)
       Dan William (spelling error)
       D. Williams (initials)
       Daniel "Danny" Williams (formal + nickname)
       Dan P. Williams (includes middle initial)
       Williams, Daniel (different convention)
       William Daniel (wrong order or missing comma + misspelling)
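A minimal sketch of the normalization step that several of these variants call for. It is written in Python for brevity, and the logic ports directly to R. It assumes Western "First [Middle] Last" or "Last, First" order. The nickname table `NICKNAMES` and the helper names `normalize_name` and `first_names_compatible` are hypothetical, for illustration only. Normalization alone does not resolve the misspelling ("Dan William") or the swapped order without a comma ("William Daniel"); those still need a fuzzy comparison.

```python
import re

# Hypothetical nickname table for illustration only; a real one would be far larger.
NICKNAMES = {"dan": "daniel", "danny": "daniel"}


def normalize_name(raw):
    """Return a (first, last) key, assuming 'First [Middle] Last' or 'Last, First'."""
    name = raw.strip().lower()
    # Drop a quoted nickname, e.g. Daniel "Danny" Williams -> daniel williams
    name = re.sub(r'"[^"]*"', " ", name)
    # Reorder the "Last, First" convention into "First Last"
    if "," in name:
        last, rest = name.split(",", 1)
        name = rest + " " + last
    # Strip periods so "D." and "P." become bare initials
    tokens = name.replace(".", " ").split()
    if not tokens:
        return ("", "")
    if len(tokens) == 1:
        return ("", tokens[0])
    # Middle names or initials (tokens[1:-1]) are ignored
    first, last = tokens[0], tokens[-1]
    first = NICKNAMES.get(first, first)
    return (first, last)


def first_names_compatible(a, b):
    """An initial such as 'd' is compatible with any first name starting with 'd'."""
    if len(a) == 1 or len(b) == 1:
        return a[:1] == b[:1]
    return a == b
```

With this, "Danny Williams", "Dan Williams", "Daniel "Danny" Williams", "Dan P. Williams" and "Williams, Daniel" all reduce to the same key, and "D. Williams" reduces to a key whose initial is compatible with it.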

Is there any R software available to find likely matches, ideally with some
estimate of accuracy of match?  Levenshtein distance as implemented in agrep
is a useful solution for some of these cases; I was wondering if there is
something that covers more cases.
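To make the "estimate of accuracy" concrete, here is a rough sketch, again in Python, of a plain Levenshtein distance turned into a 0 to 1 similarity score. It also includes a token-sorted variant so that word order does not matter. The helper names and the 0.8 cutoff are hypothetical choices for illustration. The score is only a crude ranking aid, not a calibrated probability that two records refer to the same person. This is ordinary whole-string edit distance, not a reimplementation of agrep, which does approximate substring matching.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance; insert, delete, substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]: 1 - distance / length of the longer (lowercased) string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def token_sort_similarity(a, b):
    """Compare names with commas removed and words sorted, ignoring word order."""
    def key(s):
        return " ".join(sorted(s.lower().replace(",", " ").split()))
    return similarity(key(a), key(b))


def best_matches(name, candidates, threshold=0.8):
    """Return (score, candidate) pairs at or above an arbitrary cutoff, best first."""
    scored = [(similarity(name, c), c) for c in candidates]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

For example, "Dan William" vs "Dan Williams" is one edit apart (score 1 - 1/12, about 0.92). "William Daniel" vs "Williams, Daniel" scores poorly as raw strings but well once the words are sorted.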

For this particular application, I am not concerned with issues such as
variant latinizations/transliterations (e.g. Tsung-Dao Lee ~ T.D. Lee ~ Li
Zhengdao; Ghaddafi ~ Qaddhaffi), though of course it would be a bonus if a
solution handled that as well.

Thanks,

            -s

