Hello everybody,

I would like to perform an analysis, but I don't know how to proceed or whether R packages are available for my purpose, so I'm here to ask for your support.

*The idea is the following:* I noticed that the names of towns and villages in northern Italy usually sound different from the names of towns in southern Italy. Just to give you an idea, "Caronno Pertusella" is a village in northern Italy, while Frascati is a town in central Italy. Most of the time I can recognize where a town is located just by hearing its name, but I cannot say why; that is, I haven't found a "rule". What I would like to do is find a classification rule/engine that can "locate" a town starting from its name. *I think the classification method should be based on the sequence of letters in the town's name*. But this is just an intuition, not yet formalized!

I know this is a strange request and idea, but any advice is very welcome and appreciated! Many thanks in advance to all.

Steve

--
View this message in context: http://r.789695.n4.nabble.com/Text-mining-tp4656732.html
Sent from the R help mailing list archive at Nabble.com.
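Steve's intuition that the signal lies in "the sequence of letters" can be formalized as a character n-gram classifier, a standard text-classification technique (used, for example, for language identification). The following is only a minimal sketch under stated assumptions, not an established method for this particular problem: a multinomial naive Bayes model over letter trigrams, written in plain Python (the same steps translate directly to R). The class and function names are hypothetical, the labels `north`/`centre` are illustrative, and the eight training towns are a toy sample; a real test would need the complete list of municipalities with their areas.

```python
import math
from collections import Counter, defaultdict


def trigrams(name):
    """Letter trigrams of a name, padded with ^ and $ so that typical
    beginnings and endings of names are counted as separate features."""
    s = "^" + name.lower().replace(" ", "_") + "$"
    return [s[i:i + 3] for i in range(len(s) - 2)]


class TrigramNaiveBayes:
    """Hypothetical helper: multinomial naive Bayes over character
    trigrams with add-one (Laplace) smoothing."""

    def fit(self, names, areas):
        self.counts = defaultdict(Counter)  # area -> trigram counts
        self.n_names = Counter(areas)       # area -> number of training names
        for name, area in zip(names, areas):
            self.counts[area].update(trigrams(name))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {a: sum(c.values()) for a, c in self.counts.items()}
        return self

    def log_scores(self, name):
        """Log prior plus log likelihood of the name's trigrams, per area."""
        total_names = sum(self.n_names.values())
        v = len(self.vocab)
        scores = {}
        for area, n in self.n_names.items():
            score = math.log(n / total_names)
            for g in trigrams(name):
                score += math.log((self.counts[area][g] + 1) / (self.totals[area] + v))
            scores[area] = score
        return scores

    def predict(self, name):
        """Area with the highest log score."""
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy training data: real towns, but far too few to learn anything meaningful.
names = ["Caronno Pertusella", "Saronno", "Gallarate", "Legnano",
         "Frascati", "Velletri", "Marino", "Albano Laziale"]
areas = ["north"] * 4 + ["centre"] * 4
model = TrigramNaiveBayes().fit(names, areas)
print(model.predict("Caronno Varesino"))
```

Padding with `^` and `$` lets the model treat the beginnings and endings of names separately from the same letters in mid-word; whether such features actually separate north from south is exactly the empirical question raised above.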
Hi Steve,

IMO this problem does not need a classifier but rather a database and a simple query. I would just build a database with all the city names and their geographic information, and then you can say exactly whether each one is in the north or the south. If there were such a "rule" (which I doubt), I would expect it to have many exceptions and therefore a bunch of false positives on both sides. Why overcomplicate a simple problem?

HTH,
Ciao,
Giovanni

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Steve Stephenson
Sent: Saturday, January 26, 2013 10:08 PM
To: r-help at r-project.org
Subject: [R] Text mining
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
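Giovanni's suggestion, a database plus a simple query, can be sketched with an in-memory SQLite table using Python's built-in `sqlite3` module. The table name, columns, rows and the `locate` helper are all hypothetical; in practice the table would be filled from an official list of municipalities with their geographic areas.

```python
import sqlite3

# Hypothetical table: one row per town with its macro-area.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (name TEXT PRIMARY KEY, area TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO towns (name, area) VALUES (?, ?)",
    [("Caronno Pertusella", "north"), ("Saronno", "north"),
     ("Frascati", "centre"), ("Velletri", "centre")],
)


def locate(town):
    """Exact, case-insensitive lookup; returns None if the town is unknown."""
    row = conn.execute(
        "SELECT area FROM towns WHERE name = ? COLLATE NOCASE",
        (town.strip(),),
    ).fetchone()
    return row[0] if row else None


print(locate("frascati"))  # centre
```

This answers "where is it?" exactly for every listed town, but by construction it says nothing about whether the names themselves carry a north/south pattern, which is the question Steve clarifies in his reply.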
Hi Giovanni,

thanks a lot for your quick reply! Let me answer in a few points:

1 - A database containing all the towns and the area they belong to (North, South, ...) is already available on the ISTAT site (www.ISTAT.it);
2 - My goal was just to find a "method" supporting my idea, that is, that the names of northern towns "sound" different from "southern" names;
3 - To build this method I would use the ISTAT DB, partly as a training set and partly as a validation set;
4 - The idea was born just for fun, since I find data mining very interesting and also challenging;
5 - I absolutely agree with you: I will find a lot of exceptions and therefore false positives; if the exceptions outnumber the cases that follow the rule (this could happen), that would imply that my initial idea is wrong. In any case I would be satisfied, because it would mean that I have been able to show whether an intuition is right or wrong.

I hope this clarifies my previous post. Many thanks, and *sorry for the lack of clarity*.

Steve
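Points 3 and 5 can be made concrete with a holdout evaluation: split the labelled names into a training set and a validation set, then compare a name-based rule against a baseline that ignores the name entirely. This is a minimal sketch in plain Python; all helper names are hypothetical, and `suffix_rule` (a vote on the last two letters) is only a deliberately simple stand-in for whatever classifier is actually used.

```python
import random
from collections import Counter, defaultdict


def holdout_split(pairs, test_frac=0.3, seed=0):
    """Shuffle (name, area) pairs and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, pairs):
    """Fraction of (name, area) pairs for which predict(name) == area."""
    return sum(predict(name) == area for name, area in pairs) / len(pairs)


def majority_baseline(train):
    """Ignores the name: always predicts the most frequent training area."""
    top = Counter(area for _, area in train).most_common(1)[0][0]
    return lambda name: top


def suffix_rule(train, k=2):
    """Toy learned rule: most frequent area among training names that share
    the same last k letters; falls back to the majority area otherwise."""
    votes = defaultdict(Counter)
    for name, area in train:
        votes[name.lower()[-k:]][area] += 1
    fallback = majority_baseline(train)

    def predict(name):
        c = votes.get(name.lower()[-k:])
        return c.most_common(1)[0][0] if c else fallback(name)

    return predict


# Toy labelled sample; a real test would use the full list of municipalities.
pairs = [("Caronno Pertusella", "north"), ("Saronno", "north"),
         ("Gallarate", "north"), ("Legnano", "north"),
         ("Frascati", "centre"), ("Velletri", "centre"),
         ("Marino", "centre"), ("Albano Laziale", "centre")]
train, valid = holdout_split(pairs, test_frac=0.25, seed=1)
print("baseline:", accuracy(majority_baseline(train), valid))
print("suffix rule:", accuracy(suffix_rule(train), valid))
```

If the name-based rule does not clearly beat the baseline on the validation set, that is evidence against the intuition. With a real dataset one would also repeat the split (cross-validation) rather than trust a single random partition.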