thr3ads.net - R help - [R] text mining - text comparing [May 2011]

Matevž Pavlič

2011-May-25 20:49 UTC

[R] text mining - text comparing

Hi all, 

 

I'll try to explain what I would like to achieve.

I have two problems that I need help with, if someone has a clue.

 

 

1.)    I have a TXT file containing two fields: USCS and Description.

 

For each USCS value I have a Description field that contains a lot of words
that describe that particular USCS type. What I would like to do is to mine the
text using the tm package in order to find which words in the Description field
are the most frequent for each USCS value.

 

Now I don't think I will have problems with that part, but the problem is
importing the data. The thing is that there are around 300 different USCS -
Description combinations, which is of course too many to sort out by hand. I
would have to create a Corpus of around 300 texts which I could later analyze.
Here is where I get stuck. I cannot find a way to import the data into a Corpus
so that I would have a text named after the USCS value and containing the
strings (words) of the Description field.

 

Attached (temp.txt) is a small dataset.
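
One possible way to approach the import is sketched below. It is only a sketch under assumptions: the layout of the attached file is not shown in this post, so the code assumes a tab-separated file with a header row naming the columns USCS and Description, and it writes a small made-up sample to a temporary file to stand in for it. The names path, dat, texts and docs are illustrative. The tm step uses DataframeSource, which (in tm 0.7-2 and later) expects columns named doc_id and text; it is skipped if tm is not installed.

```r
# Hypothetical sample standing in for the attachment; the real layout is an
# assumption (tab-separated, header row with USCS and Description).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "USCS\tDescription",
  "GW\twell graded gravel with sand",
  "CL\tlean clay, stiff, brown",
  "GW\tgravel, sandy, dense"
), path)

dat <- read.delim(path, stringsAsFactors = FALSE, quote = "")

# Collapse all descriptions that share a USCS value into one text per value.
texts <- tapply(dat$Description, dat$USCS, paste, collapse = " ")

# One row per USCS value: doc_id carries the name, text carries the words.
docs <- data.frame(doc_id = names(texts), text = as.vector(texts),
                   stringsAsFactors = FALSE)

if (requireNamespace("tm", quietly = TRUE)) {
  # Each row of docs becomes one document named after its USCS value.
  corpus <- tm::VCorpus(tm::DataframeSource(docs))
  dtm <- tm::DocumentTermMatrix(corpus,
                                control = list(removePunctuation = TRUE))
  # Rows of the matrix are documents; sort one row to see its frequent words.
  m <- as.matrix(dtm)
  print(head(sort(m["GW", ], decreasing = TRUE), 5))
}
```

With the real file, the writeLines() step would be dropped and path pointed at the file itself; the separator and column names may need adjusting to match it.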

 

2.)    The second thing is about comparing text. I have some problems with
typos in a text, so what I would like is to find words that are similar (but
spelled incorrectly), similar to when you type into the Google search engine and
get proposed words. Has anyone had any experience with that?
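
As a rough illustration of the kind of matching meant here, base R offers adist() (edit distance, in the utils package) and agrep() (approximate pattern matching). The sketch below assumes a list of correctly spelled words is available to compare against; the vocab and typed vectors are made up for the example and are not from the attached data.

```r
# Hypothetical vocabulary of correctly spelled terms and some misspelled input.
vocab <- c("gravel", "clay", "sand", "silt")
typed <- c("gravle", "cley", "sandd")

# adist() gives the generalized Levenshtein distance for every typed/vocab pair
# (rows are typed words, columns are vocabulary words).
d <- adist(typed, vocab)

# For each typed word, pick the vocabulary word with the smallest distance.
nearest <- vocab[apply(d, 1, which.min)]
suggestions <- data.frame(typed = typed, suggestion = nearest,
                          distance = apply(d, 1, min),
                          stringsAsFactors = FALSE)
print(suggestions)

# agrep() approximately matches a single pattern against a character vector.
print(agrep("gravle", vocab, max.distance = 2, value = TRUE))
```

Picking the single nearest word is a simplification; in practice a distance threshold would probably be needed so that words with no close match are not forced onto a suggestion.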

 

I hope I explained it OK, otherwise I'll try again,

 

Tnx, m

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20110525/46b0af08/attachment.txt>
