Dear R magic guys,

I have two tables (they will actually be data frames), both with names to be matched.

The names in the first data frame are from a study of antenatal visits at some health centers here. It happens that we need the delivery information, and more than half of the women decided to deliver somewhere other than our health units. We managed to get the names from some of those other places, but now we have to match our 4000 original names against over 20000 other names.

To make things worse, some names are badly written. So I need an algorithm such as Levenshtein, Soundex, Phonix, or something better that is already available in R. Can you help me?

Orvalho
See the stringMatch function in the MiscPsycho package for an implementation of Levenshtein.
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Orvalho Augusto [orvaquim at gmail.com]
Sent: Saturday, August 20, 2011 11:08 AM
To: r-help at r-project.org
Subject: [R] Pattern names matching
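For readers who want to try the Levenshtein idea without an extra package, here is a minimal sketch. It does not use MiscPsycho's stringMatch; it assumes an R version that ships adist() in the utils package, and the names in it are made up for illustration.

```r
# Hypothetical example data; the real tables hold about 4000 and 20000 names.
study_names    <- c("Maria Joao Silva", "Ana Paula Machava")
delivery_names <- c("Maria Joao Sylva", "Anna Paula Machava", "Celeste Nhantumbo")

# Rows are study names, columns are delivery names;
# each entry is the (generalized) Levenshtein edit distance.
d <- adist(study_names, delivery_names, ignore.case = TRUE)

# For every study name, pick the delivery name with the smallest distance.
best <- apply(d, 1, which.min)
matches <- data.frame(
  study    = study_names,
  match    = delivery_names[best],
  distance = d[cbind(seq_along(best), best)],
  stringsAsFactors = FALSE
)
matches
```

With 4000 by 20000 names the full distance matrix has 80 million entries, so it may be more practical to process the study names in chunks. The closest match is only a candidate: a cutoff on the distance and a manual check of borderline pairs would still be needed.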
On Aug 20, 2011, at 11:25 AM, Doran, Harold wrote:

> See the stringMatch function in the MiscPsycho package for an
> implementation of Levenshtein

The agrep function in base R also does approximate matching based on the generalized Levenshtein edit distance. It returns the matching elements (or their indices) rather than the distance itself.

-- David.

David Winsemius, MD
West Hartford, CT
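A rough sketch of how agrep could be applied to the two name lists follows. The names and the 10% tolerance are assumptions chosen for illustration, not values from the original study.

```r
# Hypothetical example data standing in for the two data frame columns.
study_names    <- c("Maria Joao Silva", "Ana Paula Machava")
delivery_names <- c("Maria Joao Sylva", "Anna Paula Machava", "Celeste Nhantumbo")

# For each study name, collect the delivery names that agrep accepts.
# A fractional max.distance is taken relative to the pattern length.
candidates <- lapply(study_names, function(nm) {
  agrep(nm, delivery_names, max.distance = 0.1,
        ignore.case = TRUE, value = TRUE)
})
names(candidates) <- study_names
candidates
```

Note that agrep looks for an approximate match of the pattern anywhere inside each target string, so a short name can match inside a longer one. The result is a list of candidates per study name, which may be empty or contain several entries.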