Displaying 20 results from an estimated 42 matches for "levenshtein".
2004 Feb 11
6
AGREP
...ns
1 - I have version 1.4.1 of R, and it doesn't have the 'agrep'
function in the base library. Is there a way to make this function
available in R 1.4.1? I mean, how to 'copy' it from R 1.8.1 and 'paste'
it into R 1.4.1?
2 - The AGREP function doesn't give me the Levenshtein distance (edit
distance). Is there a function in R that does it? Is there a way to use
AGREP to accomplish this task? I've written such a function, but it is so
slow (it has so many loops) that it is useless.
TIA
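For readers on a current R release, here is a minimal sketch of one way to get the edit distance without hand-written loops. It assumes `adist()` from the `utils` package, which is not part of the R 1.4.1 or 1.8.1 versions discussed in the question, and the example strings are made up for illustration.

```r
# Assumes a recent R release where utils::adist() is available;
# the word vectors below are hypothetical.
x <- c("kitten", "flaw")
y <- c("sitting", "lawn")

# Full distance matrix: rows follow x, columns follow y.
d <- utils::adist(x, y)

d[1, 1]  # edit distance between "kitten" and "sitting"
d[2, 2]  # edit distance between "flaw" and "lawn"
```

Because the whole matrix is computed in compiled code, this should avoid the slowdown of nested R loops, though that is an expectation rather than a measured result.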
2011 Oct 21
2
Change column/row-name
..., 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 2), ncol = 5)) #My Matrix
Iske <- Iske + 33 # I want to see the letters
(Iske.char <- apply(Iske, 1, function(x) rawToChar(as.raw(x)))) # numbers to characters
LD <- function(s1, s2) {
  require(vwr) # provides levenshtein.distance()
  s1 <- as.character(s1)
  s2 <- as.character(s2)
  t(sapply(s1, levenshtein.distance, s2))
}
Iske.levens <- LD(Iske.char, Iske.char) # calculate the Levenshtein distance
The result:
!"#$% !"#$% !"#$% "!#$% ....
!"#$% 0 0 0
!"#$% 0 0 0
!"#$% 0 0 0
.
.
.
It is all beautiful. But is there a simple way to...
2008 Aug 26
2
String search: Return "closest" match
Hi,
I have to match names where names can be recorded with errors or additions.
Now I am searching for a string search function which returns always the "closest" match. E.g. searching for "Washington" it should return only Washington but not Washington, D.C. But it also could be that the list contains only "Hamburg" but the record I am searching for is
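A minimal sketch of a "closest match" lookup, assuming a current R release with `utils::adist()`. The helper name `closest_match` and the place list are hypothetical, and ties are resolved by simply taking the first minimum.

```r
# Hypothetical helper: return the candidate with the smallest
# edit distance to the query (first one wins on ties).
closest_match <- function(query, candidates) {
  d <- utils::adist(query, candidates)[1, ]
  candidates[which.min(d)]
}

# Made-up candidate list for illustration.
places <- c("Washington, D.C.", "Washington", "Hamburg")

closest_match("Washington", places)  # exact entry has distance 0
closest_match("Hamburgg", places)    # one extra letter
```

Whether plain edit distance is the right notion of "closest" for names with additions is a judgment call; a normalized distance may suit long names better.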
2008 Oct 01
0
MiscPsycho 1.3 posted to CRAN
An updated version of the Miscellaneous Psychometrics package has been
uploaded to CRAN. The following updates are included in the package:
1) An implementation of the Stocking-Lord procedure for linking test
scales.
2) An implementation of the Levenshtein algorithm for comparing
character strings
3) stringProbs, a function for computing the probability of a given
Levenshtein Distance
4) Three sources of documentation on all functions. There is complete
technical documentation on the functions used in MiscPsycho in the file
MP.pdf, the stocking-lord...
2009 Jan 22
4
text vector clustering
Hi,
I am a new user of R, using R 2.8.1 on Windows 2003. I have a CSV file with
a single column which contains 30,000 student names. There were typos
when these student names were entered. The actual list of names is <
1000. However, we don't have that list for keyword search.
I am interested in grouping/clustering these names so that those which are
similar letter by letter end up together. Are there any
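One possible approach, sketched under assumptions: compute pairwise edit distances with `utils::adist()` (available in current R, not necessarily in R 2.8.1) and feed them to hierarchical clustering. The names and the cut height of 2.5 are made up for illustration, and a full distance matrix for 30,000 names would be large, so running this on unique names or in batches may be necessary.

```r
# Hypothetical sample of misspelled names.
student_names <- c("john smith", "jon smith", "john smyth",
                   "mary jones", "mary jonse")

# Pairwise Levenshtein distances between all names.
d <- utils::adist(student_names)

# Hierarchical clustering on the distance matrix
# (hclust uses complete linkage by default).
hc <- hclust(as.dist(d))

# Cut the tree so names within about two edits share a group;
# the threshold is an assumption to tune on real data.
groups <- cutree(hc, h = 2.5)

split(student_names, groups)
```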
2010 Jan 07
2
Find by looping thru array
...llowing...
- projects_controller -
def search
# get list of search parameters from id sent.
list = params[:id]
# split by comma
list = list.split(',')
# assign each
id_search_string = list[0]
title_search_string = list[1]
# create new instance of Levenshtein using 'amatch'
m = Levenshtein.new(title_search_string)
# retrieve all titles from projects
@title = Project.find(:all, :select => 'DISTINCT id, title')
# create projects array
projects = Array.new
# loop thru all titles
@title.each...
2006 Jun 19
2
fuzzy search
...ails, but what are people doing to find records
based on fuzzy string matches? For example, if you wanted to find a
Person with name "David Heinemeier Hansson" but searched using the
string "Dave Hansson".
Currently I am using find_by_sql to call the PostgreSQL function
"levenshtein(string1, string2)", which returns results with a score
indicating how close the matches are. It is OK, but nowhere near as good as I
would hope. Any better suggestions?
thanks,
Jeff
--
Posted via http://www.ruby-forum.com/.
2011 Aug 20
2
Pattern names matching
Dear R magic guys, I have two tables (actually they will be data frames), both
with names to be matched.
The names in the first data frame are from a study of antenatal visits at
some health centers here. It happens that we need the delivery info, and
a bit more than half of the women decided to deliver somewhere other than
our health units. We managed to get the names from some other places but now
2007 Apr 02
0
Object problems with Generic and rematchDefinition
...Speaks<-as.vector(object@env$MSMSmz[1:Pcount[i]])
rtMSpeaks<-as.vector(object@env$MSMSrt[i])
preMZ<-as.vector(object@env$MSMSpremz[i])
preZ<-as.vector(object@env$MSMSpreZ[i])
}
decide<-decideBest(met.xml, MSMSpeaks, cost, ppm)
if(decide$levenshtein<5){
print(preMZ)
temp.values<-c(met.xml[decide$indexNum,1], decide$levenshtein,
preMZ, preZ,rtMSpeaks,
paste(MSMSpeaks, collapse=":"))
values<-rbind(values,temp.values)
} #else cat(".") ?
}
return(values)
write.csv(va...
2008 Jun 27
1
Similarity matching with probabilities
Hello,
It's just a strange coincidence that someone very recently posted a
question about matching. I know there are several match functions in the base
package (such as match, pmatch, charmatch, gsub, etc.) but I can't
seem to use them wisely to get what I need.
Suppose I have the following strings:
"tets"
"estt"
"rtes7"
2010 Nov 16
1
Bug in agrep computing edit distance?
The documentation for agrep says it uses the Levenshtein edit distance,
but it seems to get this wrong in certain cases when there is a
combination of deletions and substitutions. For example:
> agrep("abcd", "abcxyz", max.distance=1)
[1] 1
That should've been a no-match. The edit distance between those strings
is 3 (1 sub...
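A short sketch that separates the two notions at play, assuming a current R release with `utils::adist()` and `agrepl()`. The agrep documentation describes the function as searching for approximate matches to the pattern within each element, so it can match a substring of the target, whereas `adist()` with default arguments compares the whole strings.

```r
# Whole-string generalized Levenshtein distance.
whole <- utils::adist("abcd", "abcxyz")[1, 1]
whole

# A whole-string test against the same threshold of one edit.
whole <= 1

# The call from the post, in its logical-valued form; agrep-style
# matching may succeed on a substring of the target.
agrepl("abcd", "abcxyz", max.distance = 1)
```

If a whole-string comparison is what is wanted, thresholding the `adist()` result as above is one option.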
2012 Mar 18
1
GSoC 2012: Learning To Rank
...er reimplementation", "Dynamic
Snippets", "Gmane Search improvements" or even "Replace socket code
with ZeroMQ" project (I have an experience of developing distributed
systems using HornetQ messaging system).
Additionally, currently I'm doing some research on Levenshtein
automata and it shows competitive results, so I can implement it as
one of spelling correction algorithms as a bonus.
As you know, I'm familiar enough with Xapian code base and have good
skills so you could offer me any other project if you think it has
higher priority.
I suggest we discuss i...
2011 Jan 15
2
[LLVMdev] Spell Correction Efficiency
Hello Doug,
*putting llvmdev in copy since they are concerned too*
I've finally got around to finishing a working implementation of the typical
Levenshtein Distance with the diagonal optimization.
I've tested it against the original llvm implementation and checked it on a
set of ~18k by randomly generating a variation of each word and checking
that both implementations would return the same result... took me some time
to patch the diagonal versio...
2017 Feb 20
1
strange auth issue
Windows make any "Levenshtein distance" into domains and fix them??? ummm....
----- Original message -----
From: "Ing. Luis Felipe Domínguez Vega" <luis.dominguez at mtz.desoft.cu>
To: "Sonic" <sonicsmith at gmail.com>
CC: "samba" <samba at lists.samba.org>
Sent: Monday,...
2012 Jan 19
1
Bug in the 'agrep' function
Dear R-users:
I am trying to use the 'agrep' function to search within
text strings. The max.distance parameter controls how
approximate the function's Levenshtein-based search is.
However, when I run specific searches I do not always get the
result I want, and I don't know whether it is a bug or whether I don't
properly understand the search algorithm. For example:
> agrep("Acacia m1", "Acacia macradenia", value=T, max.distance=list(all=1))
[1] "Acacia macr...
2011 Feb 03
0
[LLVMdev] Spell Correction Efficiency
On Jan 15, 2011, at 8:31 AM, Matthieu Monrocq wrote:
> Hello Doug,
>
> *putting llvmdev in copy since they are concerned too*
>
> I've finally got around to finishing a working implementation of the typical Levenshtein Distance with the diagonal optimization.
>
> I've tested it against the original llvm implementation and checked it on a set of ~18k by randomly generating a variation of each word and checking that both implementations would return the same result... took me some time to patch the diago...
2010 Nov 17
2
Bug in agrep computing edit distance?
I posted this yesterday to r-help and Ben Bolker suggested reposting it
here...
Dickison, Daniel <ddickison <at> carnegielearning.com> writes:
>
> The documentation for agrep says it uses the Levenshtein edit distance,
> but it seems to get this wrong in certain cases when there is a
> combination of deletions and substitutions. For example:
>
> > agrep("abcd", "abcxyz", max.distance=1)
> [1] 1
>
> That should've been a no-match. The edit distance b...
2015 Jul 07
4
Text processing with R
Good morning,
I would like to know whether there is an R package for text processing,
similarity search and that sort of thing. I have been looking but have
not found anything on the subject.
Thanks
Regards
2006 Nov 25
5
Metaphone analysis
...but
inadvertently type 'qwick' instead of 'quick' it would still match because
'qwick' metaphoned also becomes 'KK'.
Still a lot to do, such as test it with AAF, and see how it interacts with
using slop (which measures the Levenshtein distance,
http://en.wikipedia.org/wiki/Levenshtein, between two terms) so that I can
put in a "Did you mean xxx" feature (where xxx is a list of terms within a
certain distance of the original query). Plus many other ideas also, such as
thesaurus searching.
Hopefully this has been in...
2016 Jun 10
0
typosquatting and trojan horses in packages
...e that the software that unpacks and installs a third party package
(pip or npm) does not allow the execution of code that originates from
the package itself. Only when the user explicitly loads the package, the
library code should be executed.
Generate a list of potential typo candidates: generate Levenshtein
distance candidates for the most downloaded N packages of the repository
and alert administrators on registration of such a candidate.
Analyze 404 logfiles and prevent registration of often shadow-installed
packages: whenever a user makes a typo when installing a package and the
package is not regist...
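The Levenshtein candidate idea can be sketched as a check at registration time, which is a simplification of generating candidates up front. The function `is_typo_candidate`, the package names, and the threshold of one edit are all hypothetical, and `utils::adist()` from a current R release is assumed.

```r
# Hypothetical list standing in for the N most downloaded packages.
popular <- c("requests", "numpy", "express")

# TRUE when `new_name` is within `max_dist` edits of a popular name
# without being that name itself.
is_typo_candidate <- function(new_name, popular, max_dist = 1) {
  d <- utils::adist(new_name, popular)[1, ]
  any(d > 0 & d <= max_dist)
}

is_typo_candidate("requets", popular)  # one deletion away from "requests"
is_typo_candidate("numpy", popular)    # exact name, not a typo candidate
is_typo_candidate("lodash", popular)   # unrelated name
```

A registry could flag names for which this returns TRUE for administrator review rather than rejecting them outright.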