thr3ads.net - R help - [R] library/function to compare two phrases? [Nov 2012]

Brian Feeny

2012-Nov-17 23:00 UTC

[R] library/function to compare two phrases?

I am looking for a library/function in R that can compare two phrases and give
me a score, or otherwise classify them as accurately as possible.

The "phrases" are obfuscated/messy.  I am not concerned about which one is
"correct" (as in spell checking); I only want to group them so that I know
which ones are the closest match.

Example:

I have ROW1 and ROW2 like so:

ROW1                    ROW2
hamburger helper        bigmc heartkcatta
chicken nuggets         chicke, nuggets, jss
bigmac heartattack      some sombody somehwere
somebody somehwere      repleh regrubmah

I am looking for something that can tell me that the best match for hamburger
helper is repleh regrubmah, and the same for each other row.

So my goal is to write a program that, for each phrase in ROW1, runs this
function against ROW2 and gives me the phrase that scored best.

I have read over many of the NLP packages at
http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

I thought lsa might be a good fit, but I am not sure.  I have limited time, so I
am hoping someone can point me in a direction of what I am looking for.

I have been searching for "text classifiers", but perhaps this problem is
referred to as something else.

Brian

R. Michael Weylandt

2012-Nov-17 23:20 UTC

On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfeeny at mac.com> wrote:
> I am looking for a library/function in R that can compare two phrases and
> give me a score, or otherwise classify them as accurately as possible.
>
> The "phrases" are obfuscated/messy.  I am not concerned about which one is
> "correct" (as in spell checking); I only want to group them so that I know
> which ones are the closest match.
>
> Example:
>
> I have ROW1 and ROW2 like so:
>
> ROW1                    ROW2
> hamburger helper        bigmc heartkcatta
> chicken nuggets         chicke, nuggets, jss
> bigmac heartattack      some sombody somehwere
> somebody somehwere      repleh regrubmah
>
> I am looking for something that can tell me that the best match for
> hamburger helper is repleh regrubmah, and the same for each other row.
>
> So my goal is to write a program that, for each phrase in ROW1, runs this
> function against ROW2 and gives me the phrase that scored best.
>
> I have read over many of the NLP packages at
> http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> I thought lsa might be a good fit, but I am not sure.  I have limited time,
> so I am hoping someone can point me in a direction of what I am looking for.
>
> I have been searching for "text classifiers", but perhaps this problem is
> referred to as something else.
>
This is outside my expertise, but if memory serves, you might benefit
from googling the Levenshtein (spelling?) distance which allows this
sort of fuzzy matching of strings.

MW
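
For anyone following along, here is a minimal sketch of what the Levenshtein suggestion might look like in R. It assumes base R's adist() (in the utils package), which returns a matrix of edit distances. The sample strings come from the original post; the "similarity" rescaling is only an illustrative convention, not something adist() provides.

```r
# adist() ships with base R (package 'utils'), so nothing needs installing.
x <- "chicken nuggets"
y <- "chicke, nuggets, jss"

# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn x into y.
d <- adist(x, y)   # a 1 x 1 matrix here, because x and y each hold one string

# Illustrative assumption: rescale to a 0..1 similarity score by dividing
# by the length of the longer string (1 means identical).
similarity <- 1 - d[1, 1] / max(nchar(x), nchar(y))

print(d)
print(similarity)
```

A lower distance means a closer match, so picking the best candidate amounts to finding the smallest entry in a row of the distance matrix.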

Brian Feeny

2012-Nov-18 01:29 UTC

Thank you Michael and David.  I am onto agrep and adist and they look very
useful for what I want to do.  My initial results are promising!

Brian
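
A rough sketch of how the adist() approach could be applied to the example data, for readers with the same problem. This is not Brian's actual code, which was not posted. The names row1, row2, rev_string, and result are hypothetical, and comparing against reversed candidates is an added assumption, made because plain edit distance does not recognise a fully reversed phrase such as "repleh regrubmah".

```r
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Hypothetical helper: reverse the characters of each string.
rev_string <- function(x) {
  vapply(strsplit(x, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

# Edit distances from every phrase in row1 (rows) to every phrase in row2
# (columns), once as written and once with the candidates reversed.
d_forward  <- adist(row1, row2)
d_reversed <- adist(row1, rev_string(row2))

# Assumption: a candidate counts as close if either orientation is close.
d <- pmin(d_forward, d_reversed)

# For each row1 phrase, the index of the row2 phrase with the smallest distance.
best <- apply(d, 1, which.min)

result <- data.frame(
  phrase     = row1,
  best_match = row2[best],
  distance   = d[cbind(seq_along(row1), best)],
  stringsAsFactors = FALSE
)
print(result)
```

which.min() returns the first minimum, so ties are broken by position in row2. agrep() is an alternative when a yes/no approximate match within a tolerance is enough, rather than a ranking of all candidates.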

