On Sat, May 12, 2007 at 09:46:54PM -0500, Yannick Warnier wrote:
> I was just having a quick look at Xapian's documentation again and
> wondering... Does Xapian offer some kind of thesaurus functionality?
No, there's no thesaurus feature at present.
> If not, would it be trivial to implement one considering the current
> API, or is that something that might take very long?
Implementing the code to handle a thesaurus probably isn't a major
project - it depends exactly what you're expecting it to do though.
For example, it shouldn't be hard to add an "and synonyms" query operator
so `~facts' might be roughly equivalent to `(facts OR information OR
data OR statistics)'.
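To illustrate the idea, here is a rough sketch in Python of that expansion done as a rewrite of the query string before it is parsed. This is not an existing Xapian feature: the `THESAURUS` dictionary and the `expand_synonyms` function are hypothetical names invented for this example, and the synonym list is just the one from the example above.

```python
import re

# Hypothetical thesaurus data; a real application would load this from
# a file or database and would need to keep it up to date.
THESAURUS = {
    "facts": ["information", "data", "statistics"],
}

def expand_synonyms(query, thesaurus=THESAURUS):
    """Rewrite each `~term` in the query as an OR group of the term and its synonyms."""
    def expand(match):
        term = match.group(1)
        synonyms = thesaurus.get(term.lower(), [])
        if not synonyms:
            # No entry for this term: drop the `~` and search for the term alone.
            return term
        return "(" + " OR ".join([term] + synonyms) + ")"
    return re.sub(r"~(\w+)", expand, query)

print(expand_synonyms("~facts"))
```

This prints `(facts OR information OR data OR statistics)', which could then be handed to the query parser as usual. Doing the expansion inside the query parser itself would be cleaner, but the string rewrite shows what the operator would need to do.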
The hard part of implementing a thesaurus is often generating the
thesaurus data and especially keeping it up to date. If your
application is something like a news website, the vocabulary changes on
an almost daily basis, but even in other fields it evolves over time.
> And this should be the topic of another e-mail, but did anybody discuss
> about implementing word stemming for East-Asian languages?
There's been some past discussion on this list and the snowball list,
at least for Japanese:
http://search.gmane.org/?query=japanese%20stemming
Cheers,
Olly