thr3ads.net - search: "levenshtein"

Displaying 20 results from an estimated 42 matches for "levenshtein".

2004 Feb 11

AGREP

...ns 1 - I have version 1.4.1 of R, and it doesn't have the 'agrep' function in the base library. Is there a way to make this function available in R 1.4.1? I mean, how can I 'copy' it from R 1.8.1 and 'paste' it into R 1.4.1? 2 - The agrep function doesn't give me the Levenshtein distance (edit distance). Is there a function in R that does? Is there a way to use agrep to accomplish this task? I've written such a function, but it is so slow (it has so many loops) that it is useless. TIA
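
For comparison, here is a minimal sketch of the standard dynamic-programming Levenshtein distance, written in Python rather than R. It is not the poster's function or any R package's; it only illustrates that keeping two rows of the table is enough, giving O(len(a) * len(b)) time and O(min(len(a), len(b))) memory.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in b
    # prev[j] is the distance between the previous prefix of a and b[:j];
    # before the first row that prefix is empty, so the distance is j.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("abcd", "abcxyz"))     # 3
```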

Change column/row-name

2011 Oct 21

..., 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 2), ncol = 5)) #My Matrix Iske<- Iske+33 #I want to see the letters (Iske.char<-apply(Iske, 1, function(x) rawToChar(as.raw(x)))) #Numbers to Char LD <- function(s1, s2){ require(vwr) s1 = as.character(s1) s2 = as.character(s2) t(sapply(s1, levenshtein.distance, s2)) } Iske.levens<-(LD(Iske.char,Iske.char)) #Calculate the Levenshtein distance The result: !"#$% !"#$% !"#$% "!#$% .... !"#$% 0 0 0 !"#$% 0 0 0 !"#$% 0 0 0 . . . It is all beautiful. But is there a simple way to...
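
The LD() function above builds the full distance matrix with sapply(). The same idea is sketched below in Python as an illustration only; it is not the vwr package's levenshtein.distance(), and `distance_matrix` is a hypothetical helper name. Because the Levenshtein distance is symmetric and every string is at distance 0 from itself, a square matrix only needs its upper triangle computed.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_matrix(words: list[str]) -> list[list[int]]:
    """Hypothetical helper: square matrix of pairwise Levenshtein distances
    (rows and columns both follow the order of words)."""
    n = len(words)
    d = [[0] * n for _ in range(n)]  # diagonal stays 0: d(x, x) == 0
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetry d(x, y) == d(y, x) lets us fill both triangles at once.
            d[i][j] = d[j][i] = levenshtein(words[i], words[j])
    return d


for row in distance_matrix(["abc", "abd", "xyz"]):
    print(row)
# [0, 1, 3]
# [1, 0, 3]
# [3, 3, 0]
```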

String search: Return "closest" match

2008 Aug 26

Hi, I have to match names that may be recorded with errors or additions. I am looking for a string search function that always returns the "closest" match. E.g. when searching for "Washington", it should return only Washington and not Washington, D.C. But it could also be that the list contains only "Hamburg" while the record I am searching for is
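
A minimal sketch of a "closest match" lookup, assuming whole-string Levenshtein distance is an acceptable measure of closeness (the post does not say which measure it needs). `closest_match` is a hypothetical helper name; it scores every candidate and keeps the smallest distance, so "Washington" wins over "Washington, D.C.", which is 6 edits away (the six extra characters ", D.C.").

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def closest_match(query: str, candidates: list[str]) -> tuple[str, int]:
    """Hypothetical helper: the candidate with the smallest Levenshtein
    distance to query, together with that distance. On ties, min() keeps
    the candidate that appears first. Raises ValueError if candidates is empty."""
    return min(((c, levenshtein(query, c)) for c in candidates), key=lambda pair: pair[1])


print(closest_match("Washington", ["Washington, D.C.", "Washington", "Hamburg"]))
# ('Washington', 0)
```

This always returns some candidate, even when nothing is genuinely close; checking the returned distance against a cut-off is one way to reject poor matches.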

MiscPsycho 1.3 posted to CRAN

2008 Oct 01

An updated version of the Miscellaneous Psychometrics package has been posted to CRAN. The following updates are included in the package: 1) An implementation of the Stocking-Lord procedure for linking test scales. 2) An implementation of the Levenshtein algorithm for comparing character strings 3) stringProbs, a function for computing the probability of a given Levenshtein Distance 4) Three sources of documentation on all functions. There is complete technical documentation on the functions used in MiscPsycho in the file MP.pdf, the stocking-lord...

text vector clustering

2009 Jan 22

Hi, I am a new R user, running R 2.8.1 on Windows 2003. I have a CSV file with a single column containing 30,000 student names. Typos were made while entering these names; the actual list of distinct names has fewer than 1,000 entries. However, we don't have that list to search against. I am interested in grouping/clustering these names by letter-by-letter similarity. Are there any
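
One simple approach, sketched in Python under a stated assumption: the most frequent spelling of each name is taken to be the correct one, so distinct spellings are visited from most to least frequent, and each joins the first existing group whose representative is within a hypothetical `max_dist` edits, or else starts a new group. `cluster_names` is a hypothetical helper, and this greedy heuristic is not a standard R clustering function.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cluster_names(names: list[str], max_dist: int = 2) -> dict[str, list[str]]:
    """Hypothetical greedy grouping of misspelled names.

    Assumes the most frequent spelling is the correct one: distinct
    spellings are visited in descending frequency, and each joins the
    first group whose representative is within max_dist edits, or else
    starts a new group with itself as representative.
    """
    groups: dict[str, list[str]] = {}
    for name, _count in Counter(names).most_common():
        for rep in groups:
            if levenshtein(name, rep) <= max_dist:
                groups[rep].append(name)
                break
        else:  # no representative was close enough
            groups[name] = [name]
    return groups


names = ["Smith", "Smith", "Smtih", "Smith", "Jones", "Jnoes", "Jones", "Smyth"]
print(cluster_names(names))
# {'Smith': ['Smith', 'Smtih', 'Smyth'], 'Jones': ['Jones', 'Jnoes']}
```

Note that "Smyth" lands with "Smith": a fixed threshold cannot tell a typo from a genuinely different name, so the groups still need review. The cost grows with (distinct spellings) x (groups), since each spelling is compared with every representative.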

Find by looping thru array

2010 Jan 07

...llowing... - projects_controller - def search # get list of search parameters from id sent. list = params[:id] # split by comma list = list.split(',') # assign each id_search_string = list[0] title_search_string = list[1] # create new instance of Levenshtein using 'amatch' m = Levenshtein.new(title_search_string) # retrieve all titles from projects @title = Project.find(:all, :select => 'DISTINCT id, title') # create projects array projects = Array.new # loop thru all titles @title.each...
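
The loop above can be sketched in Python without the amatch gem: score every title against the search string and keep those within a cut-off, closest first. `titles_within` and `max_dist` are hypothetical names, and lower-casing both strings before comparing is an added assumption, not something the Rails snippet shows.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def titles_within(query: str, titles: list[str], max_dist: int) -> list[str]:
    """Hypothetical helper: titles whose case-insensitive Levenshtein
    distance to query is at most max_dist, sorted closest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, t.lower()), t) for t in titles)
    return [t for d, t in scored if d <= max_dist]


titles = ["Water Project", "Waste Project", "Solar Farm"]
print(titles_within("water project", titles, max_dist=2))
# ['Water Project', 'Waste Project']
```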

fuzzy search

2006 Jun 19

...ails, but what are people doing to find records based on fuzzy string matches? For example, suppose you wanted to find a Person named "David Heinemeier Hansson" but searched using the string "Dave Hansson". Currently I am using find_by_sql to call the PostgreSQL function "levenshtein(string1, string2)", which returns a score indicating how close each match is. It is OK, but nowhere near as good as I would hope. Any better suggestions? thanks, Jeff -- Posted via http://www.ruby-forum.com/.
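
A whole-string edit distance charges for every character the query lacks, so "Dave Hansson" is at least 12 edits from "David Heinemeier Hansson" (the lengths differ by 12) even though both words are recognisable. One alternative, swapped in here as an assumption and not something PostgreSQL's levenshtein() does on its own, is to score word by word: for each query word, take the distance to the closest word of the stored name and sum. `token_score` is a hypothetical helper name; lower is better.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programme; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def token_score(query: str, name: str) -> int:
    """Hypothetical word-level score (lower is better): for each word of
    query, the distance to the closest word in name, summed. If name has
    no words, a query word costs its own length (distance to "")."""
    name_words = name.lower().split()
    return sum(
        min((levenshtein(q, w) for w in name_words), default=len(q))
        for q in query.lower().split()
    )


query, name = "Dave Hansson", "David Heinemeier Hansson"
print(levenshtein(query, name))  # at least 12, since the lengths differ by 12
print(token_score(query, name))  # 2: "dave" -> "david"; "hansson" matches exactly
```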

Pattern names matching

2011 Aug 20

Dear R magic guys, I have two tables (they will actually be data frames), both containing names to be matched. The names in the first data frame come from a study of antenatal visits at some health centers here. It turns out we need the delivery information, and slightly more than half of the women decided to deliver somewhere other than our health units. We managed to get the names from some of those other places, but now

Object problems with Generic and rematchDefinition

2007 Apr 02

...Speaks<-as.vector(object@env$MSMSmz[1:Pcount[i]]) rtMSpeaks<-as.vector(object@env$MSMSrt[i]) preMZ<-as.vector(object@env$MSMSpremz[i]) preZ<-as.vector(object@env$MSMSpreZ[i]) } decide<-decideBest(met.xml, MSMSpeaks, cost, ppm) if(decide$levenshtein<5){ print(preMZ) temp.values<-c(met.xml[decide$indexNum,1], decide$levenshtein, preMZ, preZ,rtMSpeaks, paste(MSMSpeaks, collapse=":")) values<-rbind(values,temp.values) } #else cat(".") ? } return(values) write.csv(va...

Similarity matching with probabilities

2008 Jun 27

Hello, by strange coincidence, someone posted a question about matching just recently. I know there are several matching functions in the base package (such as match, pmatch, charmatch, gsub, etc.), but I can't figure out how to use them to get what I need. Suppose I have the following strings: "tets" "estt" "rtes7"

Bug in agrep computing edit distance?

2010 Nov 16

The documentation for agrep says it uses the Levenshtein edit distance, but it seems to get this wrong in certain cases when there is a combination of deletions and substitutions. For example: > agrep("abcd", "abcxyz", max.distance=1) [1] 1 That should've been a no-match. The edit distance between those strings is 3 (1 sub...
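
A likely explanation, offered as an assumption rather than a diagnosis: agrep's documentation describes searching for approximate matches to the pattern *within* each element of x, so what gets compared with max.distance would be the distance to the best-matching substring, not the whole-string distance. The Python sketch below (not R's implementation; `edit_distance` and its `partial` flag are hypothetical names) computes both from the same table. The whole-string version starts row 0 at 0, 1, 2, ... and reads the last cell; the substring version (Sellers' variant) starts row 0 at all zeros, so a match may begin anywhere, and takes the minimum of the last row, so it may end anywhere.

```python
def edit_distance(pattern: str, text: str, partial: bool = False) -> int:
    """partial=False: Levenshtein distance between the two whole strings.
    partial=True: distance between pattern and the best-matching substring
    of text (Sellers' variant of the same dynamic programme)."""
    # Row 0: with partial=True a match may start at any position in text,
    # so aligning the empty pattern prefix costs nothing anywhere.
    prev = [0] * (len(text) + 1) if partial else list(range(len(text) + 1))
    for i, cp in enumerate(pattern, start=1):
        curr = [i]
        for j, ct in enumerate(text, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cp != ct)))
        prev = curr
    # With partial=True the match may also end anywhere, hence the minimum.
    return min(prev) if partial else prev[-1]


print(edit_distance("abcd", "abcxyz"))                # 3: d->x, insert y, insert z
print(edit_distance("abcd", "abcxyz", partial=True))  # 1: "abcd" vs the substring "abcx"
```

Under the same assumption, the "Acacia m1" query in a later post is also at substring distance 1 from "Acacia macradenia", since "Acacia ma" needs a single substitution.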

GSoC 2012: Learning To Rank

2012 Mar 18

...er reimplementation", "Dynamic Snippets", "Gmane Search improvements" or even "Replace socket code with ZeroMQ" project (I have experience developing distributed systems using the HornetQ messaging system). Additionally, I'm currently doing some research on Levenshtein automata, and it shows competitive results, so I could implement it as one of the spelling correction algorithms as a bonus. As you know, I'm familiar enough with the Xapian code base and have good skills, so you could offer me any other project if you think it has higher priority. I suggest we discuss i...

[LLVMdev] Spell Correction Efficiency

2011 Jan 15

Hello Doug, *putting llvmdev in copy since they are concerned too* I've finally got around to finishing a working implementation of the typical Levenshtein distance with the diagonal optimization. I've tested it against the original LLVM implementation on a set of ~18k words, by randomly generating a variation of each word and checking that both implementations returned the same result... it took me some time to patch the diagonal versio...
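
The post does not spell out what "the diagonal optimization" is. One common technique that fits the name, sketched here in Python as an assumption and not as the LLVM patch, is to bound the computation by a maximum distance k: since D[i][j] >= |i - j|, only cells within k of the main diagonal can hold a value <= k, so each row needs only O(k) cells, and the loop can stop as soon as a whole row exceeds k. `bounded_levenshtein` is a hypothetical name.

```python
def bounded_levenshtein(a: str, b: str, k: int) -> int:
    """Levenshtein distance of a and b if it is at most k, otherwise k + 1.

    Since D[i][j] >= |i - j|, only cells within k of the main diagonal can
    hold a value <= k; every other cell is treated as k + 1 ("too far").
    """
    big = k + 1
    if abs(len(a) - len(b)) > k:
        return big  # the length difference alone needs more than k edits
    # Row 0: D[0][j] = j inside the band. For simplicity each row is a
    # full-length list, but only the band's cells are ever computed.
    prev = [j if j <= k else big for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [big] * (len(b) + 1)
        if i <= k:
            curr[0] = i
        lo, hi = max(1, i - k), min(len(b), i + k)
        for j in range(lo, hi + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                curr[j - 1] + 1,                       # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute or match
                big,                                   # cap: anything > k is "too far"
            )
        # Every alignment passes through row i, so if the band's minimum
        # already exceeds k, the final distance must too.
        if min(curr[max(0, i - k): hi + 1]) > k:
            return big
        prev = curr
    return prev[len(b)]


print(bounded_levenshtein("kitten", "sitting", 3))  # 3
print(bounded_levenshtein("kitten", "sitting", 2))  # 3, i.e. k + 1: "more than 2"
```

For spell correction only candidates within a small distance matter, which is why a cut-off like this can skip most of the table.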

strange auth issue

2017 Feb 20

Does Windows apply some kind of "Levenshtein distance" to domain names and fix them??? ummm.... ----- Original message ----- From: "Ing. Luis Felipe Domínguez Vega" <luis.dominguez at mtz.desoft.cu> To: "Sonic" <sonicsmith at gmail.com> CC: "samba" <samba at lists.samba.org> Sent: Monday,...

bug in function 'agrep'

2012 Jan 19

Dear R-users: I am trying to use the 'agrep' function to search within text strings. The max.distance parameter lets you control how approximate the Levenshtein search is. However, when I run specific searches I don't always get the desired result, and I don't know whether it's a bug or whether I don't properly understand the search algorithm. For example: > agrep("Acacia m1", "Acacia macradenia", value=T, max.distance=list(all=1)) [1] "Acacia macr...

[LLVMdev] Spell Correction Efficiency

2011 Feb 03

On Jan 15, 2011, at 8:31 AM, Matthieu Monrocq wrote: > Hello Doug, > > *putting llvmdev in copy since they are concerned too* > > I've finally got around to finishing a working implementation of the typical Levenshtein distance with the diagonal optimization. > > I've tested it against the original LLVM implementation on a set of ~18k words, by randomly generating a variation of each word and checking that both implementations returned the same result... it took me some time to patch the diago...

Bug in agrep computing edit distance?

2010 Nov 17

I posted this yesterday to r-help and Ben Bolker suggested reposting it here... Dickison, Daniel <ddickison <at> carnegielearning.com> writes: > > The documentation for agrep says it uses the Levenshtein edit distance, > but it seems to get this wrong in certain cases when there is a > combination of deletions and substitutions. For example: > > > agrep("abcd", "abcxyz", max.distance=1) > [1] 1 > > That should've been a no-match. The edit distance b...

text processing with R

2015 Jul 07

Good morning, I would like to know whether there is any R package for text processing, similarity search and that sort of thing. I have been looking but haven't found anything on the subject. Thanks, regards

Metaphone analysis

2006 Nov 25

...but inadvertently type 'qwick' instead of 'quick', it would still match because 'qwick' metaphoned also becomes 'KK'. There is still a lot to do, such as testing it with AAF and seeing how it interacts with slop (which measures the Levenshtein distance, http://en.wikipedia.org/wiki/Levenshtein, between two terms), so that I can add a "Did you mean xxx" feature (where xxx is a list of terms within a certain distance of the original query). Plus many other ideas, such as thesaurus searching. Hopefully this has been in...

typosquatting and trojan horses in packages

2016 Jun 10

...e that the software that unpacks and installs a third-party package (pip or npm) does not allow the execution of code that originates from the package itself. The library code should be executed only when the user explicitly loads the package. Generate a List of Potential Typo Candidates: Generate Levenshtein distance candidates for the N most downloaded packages in the repository and alert administrators when such a candidate is registered. Analyze 404 logfiles and prevent registration of often shadow-installed packages: Whenever a user makes a typo when installing a package and the package is not regist...
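
The candidate-generation step can be sketched as follows, assuming "candidate" means every name exactly one edit (deletion, substitution or insertion) away from a popular package. The character set `ALPHABET`, the helpers `distance_one_candidates` and `build_watchlist`, and the example package list are hypothetical illustrations, not pip's or npm's actual rules.

```python
import string

# Hypothetical character set for package names; real registries have their own rules.
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def distance_one_candidates(name: str) -> set[str]:
    """Every string at Levenshtein distance exactly 1 from name."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {
        left + c + right[1:]
        for left, right in splits if right
        for c in ALPHABET if c != right[0]
    }
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | substitutions | inserts


def build_watchlist(popular: list[str]) -> dict[str, set[str]]:
    """Hypothetical helper: map each typo candidate to the popular
    package(s) it imitates. Candidates that are themselves popular
    packages are left out."""
    popular_set = set(popular)
    watch: dict[str, set[str]] = {}
    for pkg in popular:
        for cand in distance_one_candidates(pkg):
            if cand not in popular_set:
                watch.setdefault(cand, set()).add(pkg)
    return watch


watch = build_watchlist(["requests", "numpy"])
# On registration of a new name, a hit means "alert the administrators".
print(watch.get("reqests"))  # {'requests'}
print(watch.get("pandas"))   # None
```

For a name of length n and an alphabet of size s, this yields roughly 2 * s * n candidates per package, so precomputing them for the top N packages is cheap, and each registration check becomes a single dictionary lookup.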
