Lucas Carlson
2005-Apr-25 17:00 UTC
[ANN] Classifier 1.2 with Bayesian and NEW LSI classification
You may remember that I announced the Bayesian classifier a couple of weeks ago. With the help of David Fayram, we have added LSI classification, so you can now use either:

    b = Classifier::Bayes.new
    lsi = Classifier::LSI.new

LSI stands for Latent Semantic Indexing. It can search, classify and cluster data based on underlying semantic relations. It uses more resources than the Bayesian classifier and even requires an external library, but it can still be serialized with Marshal for use with Madeleine or DRb. For more information on the algorithms used, please consult http://en.wikipedia.org/wiki/Latent_Semantic_Indexing

One really cool part about LSI is that it can provide automatic summarization (see the Subversion trunk for previews of this). This could be a big deal for anyone who wants to serve RSS feeds from Rails but is worried about bandwidth consumption, because it offers a programmatic way to automatically extract the most important parts of your articles.

I also added an #untrain method to reverse the effects of training the Bayesian classifier. LSI can also untrain itself.

To upgrade, try:

    gem update classifier

Or see this site: http://rubyforge.org/projects/classifier/

Again, all feedback is appreciated.

-Lucas Carlson
http://tech.rufy.com/
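For readers curious how untraining a Bayesian classifier can work in principle, here is a minimal sketch in plain Ruby. It is not the classifier gem's code. The class TinyBayes, its methods, and the add-one smoothing are illustrative assumptions. It shows a multinomial naive Bayes in which #untrain subtracts exactly the word and document counts that #train added. Because its state is plain hashes, the object also survives a Marshal round trip, which is the property that matters for Madeleine or DRb.

```ruby
# Hypothetical sketch, NOT the classifier gem's implementation: a tiny
# multinomial naive Bayes whose #untrain subtracts exactly what #train added.
class TinyBayes
  # Exposed only so the internal counts can be inspected.
  attr_reader :words, :docs

  def initialize(*categories)
    @words = {} # category => { word => count }
    @docs  = {} # category => number of training documents
    categories.each do |c|
      @words[c] = Hash.new(0)
      @docs[c]  = 0
    end
  end

  def train(category, text)
    update(category, text, 1)
  end

  # Reverses an earlier #train call made with the same arguments.
  def untrain(category, text)
    update(category, text, -1)
  end

  # Returns the category with the highest log-probability score.
  def classify(text)
    scores(text).max_by { |_, score| score }.first
  end

  # Hash of category => log P(category) + sum of log P(word | category).
  def scores(text)
    total_docs = @docs.values.sum
    vocab = @words.values.flat_map(&:keys).uniq.size
    @words.map do |c, counts|
      total_words = counts.values.sum
      # Add-one smoothed prior, so a category with no documents never yields log(0).
      score = Math.log((@docs[c] + 1.0) / (total_docs + @docs.size))
      tokenize(text).each do |w|
        # Add-one smoothing; the extra +1 keeps the denominator positive
        # and reserves probability mass for words never seen in training.
        score += Math.log((counts[w] + 1.0) / (total_words + vocab + 1))
      end
      [c, score]
    end.to_h
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end

  # Adds (delta = 1) or removes (delta = -1) one document's counts.
  def update(category, text, delta)
    raise ArgumentError, "unknown category: #{category}" unless @words.key?(category)
    @docs[category] += delta
    counts = @words[category]
    tokenize(text).each do |w|
      counts[w] += delta
      # Drop words whose count falls to zero so untraining restores the old state.
      counts.delete(w) if counts[w] <= 0
    end
  end
end

b = TinyBayes.new(:interesting, :boring)
b.train(:interesting, "ruby rails classifier bayes")
b.train(:boring, "weather traffic taxes")
puts b.classify("a new ruby classifier") # prints "interesting"

# Plain hashes inside, so Marshal can serialize it (e.g. for Madeleine or DRb).
copy = Marshal.load(Marshal.dump(b))
puts copy.classify("a new ruby classifier") # prints "interesting"

b.untrain(:interesting, "ruby rails classifier bayes")
p b.words[:interesting].empty? # prints true
```

Deleting zero counts is a deliberate design choice. After a matching train/untrain pair, the hashes are equal to those of a freshly created classifier, so stale words do not inflate the vocabulary size used in smoothing.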