Displaying 20 results from an estimated 42 matches for "levenshtein".

2004 Feb 11
6
AGREP
...ns 1 - I have version 1.4.1 of R, and it doesn't have the 'agrep' function in the base library. Is there a way to make this function available in R 1.4.1? I mean, how can I 'copy' it from R 1.8.1 and 'paste' it into R 1.4.1? 2 - The AGREP function doesn't give me the Levenshtein distance (edit distance). Is there a function in R that does it? Is there a way to use AGREP to accomplish this task? I've written such a function, but it is so slow (it has so many loops) that it is useless. TIA
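The poster's loop-heavy function is not shown. Below is a minimal sketch of the standard dynamic-programming edit distance. It is written in Python because the entries in this listing mix several languages. It keeps only two rows of the table, so memory grows with the shorter string rather than with the product of both lengths. The function name `levenshtein` is our own, not an R API.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    # Put the shorter string in the inner loop to keep the rows small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the first i-1 chars of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```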
2011 Oct 21
2
Change column/row-name
..., 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 2), ncol = 5)) #My Matrix Iske<- Iske+33 #I want to see the letters (Iske.char<-apply(Iske, 1, function(x) rawToChar(as.raw(x)))) #Numbers to Char LD <- function(s1, s2){ require(vwr) s1 = as.character(s1) s2 = as.character(s2) t(sapply(s1, levenshtein.distance, s2)) } Iske.levens<-(LD(Iske.char,Iske.char)) #Calculate the Levenshtein distance The result: !"#$% !"#$% !"#$% "!#$% .... !"#$% 0 0 0 !"#$% 0 0 0 !"#$% 0 0 0 . . . It is all beautiful. But is there a simple way to...
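The `LD` helper above builds a matrix of pairwise distances with `vwr`'s `levenshtein.distance`. The sketch below shows the same idea in Python with a self-contained `levenshtein` helper, which is our own code and not the vwr implementation. Row i, column j holds the distance from `s1[i]` to `s2[j]`, which should match the rows-for-`s1` shape of the `t(sapply(...))` result. When a set of strings is compared with itself, the diagonal is always zero.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_matrix(s1, s2):
    """Row i, column j holds the edit distance between s1[i] and s2[j]."""
    return [[levenshtein(x, y) for y in s2] for x in s1]


# Strings built from the same printable characters as in the post.
codes = ['!"#$%', '"!#$%', '!"#$&']
for row in distance_matrix(codes, codes):
    print(row)
```

Swapping two adjacent characters costs 2 under plain Levenshtein distance, so `'!"#$%'` and `'"!#$%'` are not at distance 0.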
2008 Aug 26
2
String search: Return "closest" match
Hi, I have to match names where names can be recorded with errors or additions. Now I am searching for a string search function which always returns the "closest" match. E.g. when searching for "Washington", it should return only Washington but not Washington, D.C. But it also could be that the list contains only "Hamburg" but the record I am searching for is
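Here is a minimal sketch of "closest match" lookup, assuming plain Levenshtein distance is an acceptable measure. It computes the distance to every candidate and returns the one with the smallest value. Ties go to the candidate listed first. The names `closest_match` and `levenshtein` are illustrative, not an existing R or Python API.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_match(query: str, candidates: list) -> tuple:
    """Return (best candidate, its edit distance to query)."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # min() keeps the first candidate among equally close ones.
    best = min(candidates, key=lambda c: levenshtein(query, c))
    return best, levenshtein(query, best)


cities = ["Washington, D.C.", "Washington", "Hamburg"]
print(closest_match("Washington", cities))  # ('Washington', 0)
```

An exact match always wins with distance 0. A record with a small addition or typo still lands on the nearest entry.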
2008 Oct 01
0
MiscPsycho 1.3 posted to CRAN
An updated version of the Miscellaneous Psychometrics package has been posted to CRAN. The following updates are included in the package: 1) An implementation of the Stocking-Lord procedure for linking test scales. 2) An implementation of the Levenshtein algorithm for comparing character strings. 3) stringProbs, a function for computing the probability of a given Levenshtein distance. 4) Three sources of documentation on all functions. There is complete technical documentation on the functions used in MiscPsycho in the file MP.pdf, the stocking-lord...
2009 Jan 22
4
text vector clustering
Hi, I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with a single column containing the names of 30,000 students. There were typos when these names were entered. The actual number of distinct names is under 1,000. However, we don't have that list for keyword search. I am interested in grouping/clustering these names so that names that are similar letter by letter end up together. Are there any
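One simple approach is greedy "leader" clustering. This is sketched here as an assumption, not as the method the list recommended. Each name joins the first existing cluster whose first member is within a chosen edit distance. Otherwise it starts a new cluster. The threshold `max_dist` and the function name `greedy_cluster` are hypothetical. The result depends on input order. Each name is compared against every existing cluster, so the cost grows with the number of clusters.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def greedy_cluster(names: list, max_dist: int) -> list:
    """Group names whose distance to a cluster's first member is <= max_dist."""
    clusters = []
    for name in names:
        key = name.strip().lower()  # ignore case and surrounding spaces
        for cluster in clusters:
            if levenshtein(key, cluster[0].strip().lower()) <= max_dist:
                cluster.append(name)
                break
        else:  # no cluster was close enough: start a new one
            clusters.append([name])
    return clusters


names = ["John Smith", "Jon Smith", "Mary Jones", "john smyth", "Marry Jones"]
for group in greedy_cluster(names, max_dist=2):
    print(group)
```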
2010 Jan 07
2
Find by looping thru array
...llowing... - projects_controller - def search # get list of search parameters from id sent. list = params[:id] # split by comma list = list.split(',') # assign each id_search_string = list[0] title_search_string = list[1] # create new instance of Levenshtein using 'amatch' m = Levenshtein.new(title_search_string) # retrieve all titles from projects @title = Project.find(:all, :select => 'DISTINCT id, title') # create projects array projects = Array.new # loop thru all titles @title.each...
2006 Jun 19
2
fuzzy search
...ails, but what are people doing to find records based on fuzzy string matches? For example, suppose you wanted to find a Person named "David Heinemeier Hansson" but searched using the string "Dave Hansson". Currently I am using find_by_sql to call the PostgreSQL function "levenshtein(string1, string2)", which returns results with a score indicating how close the matches are. It is OK, but nowhere near as good as I would hope. Any better suggestions? Thanks, Jeff -- Posted via http://www.ruby-forum.com/.
2011 Aug 20
2
Pattern names matching
Dear R magic guys, I have two tables (they will actually be data frames), both with names to be matched. The names in the first data frame come from a study of antenatal visits at some health centers here. It turns out that we need the delivery information, and more than half of the women decided to deliver somewhere other than our health units. We managed to get the names from some other places, but now
2007 Apr 02
0
Object problems with Generic and rematchDefinition
...Speaks<-as.vector(object@env$MSMSmz[1:Pcount[i]]) rtMSpeaks<-as.vector(object@env$MSMSrt[i]) preMZ<-as.vector(object@env$MSMSpremz[i]) preZ<-as.vector(object@env$MSMSpreZ[i]) } decide<-decideBest(met.xml, MSMSpeaks, cost, ppm) if(decide$levenshtein<5){ print(preMZ) temp.values<-c(met.xml[decide$indexNum,1], decide$levenshtein, preMZ, preZ,rtMSpeaks, paste(MSMSpeaks, collapse=":")) values<-rbind(values,temp.values) } #else cat(".") ? } return(values) write.csv(va...
2008 Jun 27
1
Similarity matching with probabilities
Hello, It's just a strange coincidence that someone very recently posted a question about matching. I know there are several matching functions in the base package (such as match, pmatch, charmatch, gsub, etc.), but I can't seem to use them wisely enough to get what I need. Suppose I have the following strings: "tets" "estt" "rtes7"
2010 Nov 16
1
Bug in agrep computing edit distance?
The documentation for agrep says it uses the Levenshtein edit distance, but it seems to get this wrong in certain cases when there is a combination of deletions and substitutions. For example: > agrep("abcd", "abcxyz", max.distance=1) [1] 1 That should've been a no-match. The edit distance between those strings is 3 (1 sub...
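R's documentation describes `agrep` as searching for approximate matches to the pattern *within* each element of `x`. One plausible reading of this result is therefore that "abcd" is compared against the best-matching substring of "abcxyz", not against the whole string. The sketch below contrasts the two measures. It uses Sellers' approximate substring matching and is our own illustration, not agrep's actual implementation. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Whole-string edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros: a match may start anywhere in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, cp in enumerate(pattern, start=1):
        cur = [i]  # pattern[:i] against an empty substring
        for j, ct in enumerate(text, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cp != ct)))
        prev = cur
    # A match may also end anywhere, so take the best value in the last row.
    return min(prev)


print(levenshtein("abcd", "abcxyz"))         # 3
print(substring_distance("abcd", "abcxyz"))  # 1
```

A substring distance of 1 fits within `max.distance=1`, which is consistent with the match reported above, while the whole-string distance is 3.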
2012 Mar 18
1
GSoC 2012: Learning To Rank
...er reimplementation", "Dynamic Snippets", "Gmane Search improvements" or even "Replace socket code with ZeroMQ" project (I have experience developing distributed systems using the HornetQ messaging system). Additionally, I'm currently doing some research on Levenshtein automata and it shows competitive results, so I could implement it as one of the spelling correction algorithms as a bonus. As you know, I'm familiar enough with the Xapian code base and have good skills, so you could offer me any other project if you think it has higher priority. I suggest we discuss i...
2011 Jan 15
2
[LLVMdev] Spell Correction Efficiency
Hello Doug, *putting llvmdev in copy since they are concerned too* I've finally got around to finishing a working implementation of the typical Levenshtein Distance with the diagonal optimization. I've tested it against the original llvm implementation and checked it on a set of ~18k words by randomly generating a variation of each word and checking that both implementations would return the same result... took me some time to patch the diagonal versio...
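The message does not show the "diagonal optimization" itself. A common technique that fits the description is sketched here as an assumption. It computes only the cells within `max_dist` of the main diagonal. Any cell farther away already costs more than `max_dist`, so a spell checker that only wants close candidates can skip it, and can stop early once a whole row exceeds the bound. The function name `bounded_levenshtein` is hypothetical and not LLVM's API.

```python
def bounded_levenshtein(a: str, b: str, max_dist: int):
    """Edit distance if it is <= max_dist, otherwise None."""
    if abs(len(a) - len(b)) > max_dist:
        return None  # the length difference alone exceeds the bound
    INF = max_dist + 1  # stands in for "anything larger than max_dist"
    n, m = len(a), len(b)
    prev = [j if j <= max_dist else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= max_dist:
            cur[0] = i
        # Only columns within max_dist of the diagonal can hold values <= max_dist.
        lo, hi = max(1, i - max_dist), min(m, i + max_dist)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, INF)
        if min(cur) > max_dist:
            return None  # every path to the end already exceeds the bound
        prev = cur
    return prev[m] if prev[m] <= max_dist else None


print(bounded_levenshtein("kitten", "sitting", 3))  # 3
print(bounded_levenshtein("kitten", "sitting", 2))  # None
```

Capping values at `max_dist + 1` keeps every cell at or below the bound exact, so results within the bound agree with the full table.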
2017 Feb 20
1
strange auth issue
Does Windows apply some kind of "Levenshtein distance" to domain names and fix them??? ummm.... ----- Original message ----- From: "Ing. Luis Felipe Domínguez Vega" <luis.dominguez at mtz.desoft.cu> To: "Sonic" <sonicsmith at gmail.com> CC: "samba" <samba at lists.samba.org> Sent: Monday,...
2012 Jan 19
1
bug in the 'agrep' function
Dear R users: I am trying to use the 'agrep' function to search text strings. The max.distance parameter controls how approximate the Levenshtein-based search is allowed to be. However, when I run specific searches I do not always get the expected result, and I don't know whether it is a bug or whether I do not understand the search algorithm well. For example: > agrep("Acacia m1", "Acacia macradenia", value=T, max.distance=list(all=1)) [1] "Acacia macr...
2011 Feb 03
0
[LLVMdev] Spell Correction Efficiency
On Jan 15, 2011, at 8:31 AM, Matthieu Monrocq wrote: > Hello Doug, > > *putting llvmdev in copy since they are concerned too* > > I've finally got around to finishing a working implementation of the typical Levenshtein Distance with the diagonal optimization. > > I've tested it against the original llvm implementation and checked it on a set of ~18k words by randomly generating a variation of each word and checking that both implementations would return the same result... took me some time to patch the diago...
2010 Nov 17
2
Bug in agrep computing edit distance?
I posted this yesterday to r-help and Ben Bolker suggested reposting it here... Dickison, Daniel <ddickison <at> carnegielearning.com> writes: > > The documentation for agrep says it uses the Levenshtein edit distance, > but it seems to get this wrong in certain cases when there is a > combination of deletions and substitutions. For example: > > > agrep("abcd", "abcxyz", max.distance=1) > [1] 1 > > That should've been a no-match. The edit distance b...
2015 Jul 07
4
Text processing with R
Good morning, I would like to know whether there is any package in R for text processing, similarity search and that kind of thing. I have been looking but have not found anything on the subject. Thanks. Regards
2006 Nov 25
5
Metaphone analysis
...but inadvertently type 'qwick' instead of 'quick', it would still match because 'qwick' metaphoned also becomes 'KK'. There is still a lot to do, such as testing it with AAF and seeing how it interacts with using slop (which measures the Levenshtein distance, http://en.wikipedia.org/wiki/Levenshtein, between two terms) so that I can put in a "Did you mean xxx" feature (where xxx is a list of terms within a certain distance of the original query). Plus many other ideas, such as thesaurus searching. Hopefully this has been in...
2016 Jun 10
0
typosquatting and trojan horses in packages
...e that the software that unpacks and installs a third-party package (pip or npm) does not allow the execution of code that originates from the package itself. The library code should be executed only when the user explicitly loads the package. Generate a List of Potential Typo Candidates: Generate Levenshtein distance candidates for the N most downloaded packages in the repository and alert administrators when such a candidate is registered. Analyze 404 logfiles and prevent registration of often shadow-installed packages: Whenever a user makes a typo by installing a package and the package is not regist...
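Here is a minimal sketch of the "generate typo candidates" idea. It enumerates every name at Levenshtein distance 1 (one deletion, substitution or insertion) from each popular package name, then flags a new registration that falls in that set. The alphabet, the function names and the distance-1 cutoff are assumptions. A real registry would probably also consider transpositions and larger distances.

```python
import string
from typing import Dict, List, Optional, Set

# Assumed character set for package names (lower-case letters, digits, "-", "_", ".").
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def edits1(name: str) -> Set[str]:
    """Every string at Levenshtein distance exactly 1 from name."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    candidates = deletes | substitutes | inserts
    candidates.discard(name)  # substituting a character with itself gives name back
    return candidates


def build_shadow_index(popular: List[str]) -> Dict[str, str]:
    """Map each typo candidate to the popular package it could shadow."""
    index: Dict[str, str] = {}
    for pkg in popular:
        for cand in edits1(pkg.lower()):
            index.setdefault(cand, pkg)
    for pkg in popular:  # legitimate popular names are never flagged
        index.pop(pkg.lower(), None)
    return index


def check_registration(new_name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the popular package new_name may imitate, or None if it looks safe."""
    return index.get(new_name.lower())


index = build_shadow_index(["requests", "numpy"])
print(check_registration("requets", index))  # requests
print(check_registration("pandas", index))   # None
```

Precomputing the candidate set makes each registration check a single dictionary lookup, at the cost of storing roughly 80 candidates per character of each popular name.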