Hi,

Just a minute ago I installed Flax, and I'm very surprised. Even though it's very far from perfect, it's very usable with my language, Thai, out of the box.

I'm wondering how Xapian handles Thai: does it tokenize, or use n-grams? I've seen the list of languages Xapian supports (Thai is not included), but how does it handle other languages?

I'm looking for a way to perform full-text search for my language.

Thanks
Out of curiosity, what, if anything, are you using for segmentation? Are you doing character-based indexing? I understood that Thai has no standard for word segmentation.

2008/6/10 Sakesun Roykiattisak <sakesun at boonthavorn.com>:
> Hi,
>
> Just a minute ago I've installed Flax and I'm very surprised. Even though
> it's very far from perfect, it's very usable with my language, Thai,
> out-of-the-box.
> And I'm wondering how does Xapian handle Thai? Tokenize or N-Gram?
> I've seen the list of languages Xapian supports (in which Thai is not
> included).
> But how does it handle other languages?
>
> I'm looking for a way to perform full-text search for my language.
>
> Thanks
> Just to avoid confusion, I think the original poster installed the
> Windows build of Flax Basic, which is based on Xapian. We don't do any
> clever extra things with tokenisation in Flax so far, so we're pleased
> it works!

Exactly. Sorry for the incomplete information.

> These threads might be useful:
> http://thread.gmane.org/gmane.comp.search.xapian.general/4574/focus=4779
> http://thread.gmane.org/gmane.comp.search.xapian.general/3178/focus=3182

Thanks for the pointer.
On Tue, Jun 10, 2008 at 08:45:54PM +0700, Sakesun Roykiattisak wrote:
> Just a minute ago I've installed Flax and I'm very surprised.
> Even though it's very far from perfect, it's very usable with my
> language, Thai, out-of-the-box.

This probably isn't the best place to ask about Flax. It's not part of Xapian, but rather a framework which uses Xapian.

> And I'm wondering how does Xapian handle Thai? Tokenize or N-Gram?
> I've seen the list of languages Xapian supports (in which Thai is not
> included).
> But how does it handle other languages?

That "list of languages" is just the ones that we provide stemming algorithms for. You can index other languages, though how you do so is up to you. There's not currently any built-in support.

> I'm looking for a way to perform full-text search for my language.

I'm afraid I don't know much about Thai. I'm hoping we can add support for indexing and searching using n-grams for CJKV in 1.1. From what you say above, it sounds like that should help for Thai too.

Cheers,
Olly
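
For readers unfamiliar with the n-gram approach mentioned above, here is a minimal sketch of the general idea in plain Python. It is not Xapian's implementation and uses no Xapian API; the function names (`char_ngrams`, `build_index`, `search`) and the sample documents are hypothetical, chosen only for illustration. It assumes overlapping character bigrams over Unicode code points, which is a simplification for Thai because vowel and tone marks are separate code points.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams.

    Whitespace separates runs; each run is cut into n-grams of code
    points. Runs shorter than n are kept whole. No word segmentation
    is attempted, which is the point for scripts written without spaces.
    """
    grams = []
    for run in text.split():
        if len(run) < n:
            grams.append(run)
        else:
            grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def build_index(docs, n=2):
    """Build a toy inverted index: n-gram -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents containing every n-gram of the query.

    This is a plain AND over n-grams, not a phrase match, so it can
    return documents where the n-grams occur in different places.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical sample documents (Thai text, written without spaces).
docs = {
    1: "ค้นหาข้อความภาษาไทย",  # "search Thai-language text"
    2: "ภาษาอังกฤษ",           # "English language"
}
index = build_index(docs)
```

With this toy index, `search(index, "ภาษาไทย")` matches only document 1, while the shorter query `search(index, "ภาษา")` matches both documents. A real indexer would also record term positions so that queries can be matched as phrases rather than as an unordered set of n-grams.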