Hello, new to the list, I am interested in Xapian.

While reading the site, I found that http://xapian.org/docs/stemming.html states:

"A stemming algorithm is a process of linguistic normalisation, in which the variant forms of a word are reduced to a common form... For many of the world's languages, Chinese and Japanese for example, this concept is irrelevant,"

Which I found very strange. Of course, stemming is very valuable in Japanese language. I think it is an even better example than the English example of connection/connective/connected/connecting. For example:

踊る              odoru           dance
踊らない          odoranai        doesn't dance
踊った            odotta          danced
踊らなかった      odoranakatta    didn't dance
踊れる            odoreru         can dance
踊れない          odorenai        can't dance
踊れた            odoreta         could dance
踊れなかった      odorenakatta    couldn't dance
踊っている        odotteiru       is dancing
踊っていない      odotteinai      isn't dancing

And so on. (Okay, this is rather obvious because only the stem is written in kanji. You can replace 踊 with おど then.)

Yes, as you can see, I started to learn Japanese recently. :-) I am not sure I may try to write Japanese stemmer myself... Can anyone help?

I visited the Snowball site and read the manual there. It was an interesting read.

Seo Sanghyeon
On Sun, Apr 17, 2005 at 10:07:41PM +0900, Seo Sanghyeon wrote:
> "A stemming algorithm is a process of linguistic normalisation, in which
> the variant forms of a word are reduced to a common form... For many of
> the world's languages, Chinese and Japanese for example, this concept is
> irrelevant,"
>
> Which I found very strange. Of course, stemming is very valuable in
> Japanese language.

Thanks for pointing this out. I've removed Japanese as an example here.

> Yes, as you can see, I started to learn Japanese recently. :-) I am
> not sure I may try to write Japanese stemmer myself... Can anyone
> help?
>
> I visited the Snowball site and read the manual there. It was an
> interesting read.

If you're interested in collaborating on a Japanese stemmer, I'd suggest asking on the snowball mailing list.

Cheers,
Olly
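
As a rough illustration of the point raised in the first message, the ten forms of odoru quoted there can all be reduced to one stem by stripping the longest matching ending. This is only a toy sketch in Python, not a Snowball or Xapian stemmer: the suffix table and the function name `toy_stem` are assumptions made up for this example, they cover nothing beyond those ten forms, and a real Japanese stemmer would also need word segmentation and a proper treatment of conjugation classes.

```python
# Hypothetical suffix table: covers only the ten forms of "odoru" from the thread.
SUFFIXES = [
    "らなかった", "れなかった", "っていない", "っている",
    "らない", "れない", "った", "れた", "れる", "る",
]

def toy_stem(word: str) -> str:
    """Strip the longest matching suffix; return the word unchanged otherwise."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require at least one character to remain as the stem.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word

FORMS = [
    "踊る", "踊らない", "踊った", "踊らなかった", "踊れる",
    "踊れない", "踊れた", "踊れなかった", "踊っている", "踊っていない",
]

if __name__ == "__main__":
    for form in FORMS:
        print(form, "->", toy_stem(form))
```

Trying the longest endings first matters here: with a shortest-first order, 踊れる would lose only its final る and end up with a different stem from 踊った.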