I have a column of words, for example

"DOG"
"DOOG"
"GOD"
"GOOD"
"DOOR"
...

and I am interested in creating a matrix that contains the string edit distances between each pair of words. I am this close -> ' ' <- to writing the algorithm myself (which will allow for different variations on the string edit rules: indels, plus or minus transpositions, and possibly some variations on that), but I figured I'd see if anyone on the list has any experience with this and might already have some shoulders for me to stand on.

Thanks,

Thomas

Thomas Hills Ph.D.
Department of Psychological and Brain Sciences
Indiana University
Bloomington, IN 47405
Does the code underlying agrep() do what you want?

On Sat, 7 Apr 2007, Thomas Hills wrote:

> [...] I am interested in creating a matrix that contains the string
> edit distances between each pair of words. [...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Thomas Hills wrote:

> [...] I am interested in creating a matrix that contains the string
> edit distances between each pair of words. [...]

See http://wiki.r-project.org/rwiki/doku.php?id=tips:data-strings:levenshtein for some R code which might be useful.

HTH,
Tobias

--
Tobias Verbeke - Consultant
Business & Decision Benelux
Rue de la révolution 8
1000 Brussels - BELGIUM
+32 499 36 33 15
tobias.verbeke at businessdecision.com
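
For readers who want something to start from, the pairwise matrix asked about in this thread can be sketched in plain R. This is not the code from the wiki page, nor what agrep() or cba do internally; the function names edit_distance() and edit_distance_matrix(), the cost arguments, and the transpositions flag are all hypothetical names chosen for illustration. With transpositions = TRUE the sketch computes the restricted "optimal string alignment" variant, which allows swapping two adjacent characters but is not the full Damerau-Levenshtein distance.

```r
# Edit distance between two strings by dynamic programming.
# ins, del, sub and trans are the (illustrative) costs of each operation.
edit_distance <- function(a, b, ins = 1, del = 1, sub = 1,
                          transpositions = FALSE, trans = 1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- (0:n) * del
  d[1, ] <- (0:m) * ins
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0 else sub
      best <- min(d[i, j + 1] + del,   # delete x[i]
                  d[i + 1, j] + ins,   # insert y[j]
                  d[i, j] + cost)      # match or substitute
      # Optional: swap of two adjacent characters (optimal string alignment).
      if (transpositions && i > 1 && j > 1 &&
          x[i] == y[j - 1] && x[i - 1] == y[j]) {
        best <- min(best, d[i - 1, j - 1] + trans)
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[n + 1, m + 1]
}

# Square matrix of distances between every pair of words.
# All ordered pairs are computed, so unequal ins/del costs are handled too.
edit_distance_matrix <- function(words, ...) {
  k <- length(words)
  out <- matrix(0, nrow = k, ncol = k, dimnames = list(words, words))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) out[i, j] <- edit_distance(words[i], words[j], ...)
    }
  }
  out
}

words <- c("DOG", "DOOG", "GOD", "GOOD", "DOOR")
dist_matrix <- edit_distance_matrix(words)
print(dist_matrix)
```

Passing transpositions = TRUE, or different ins, del and sub costs, through edit_distance_matrix() is one way to experiment with the rule variations mentioned in the original question. The double loop is quadratic in the number of words, so for a long column a compiled implementation would likely be preferable.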
It's in package cba (sdists()).

David