Hello All,

I want to use Xapian to index Chinese HTML pages.

I found the cjk-tokenizer library on the mailing list:
http://lists.tartarus.org/pipermail/xapian-discuss/2007-June/003921.html

However, I do not know how to add this library to a Xapian project.

Are there any examples or steps I can follow?

Thank you!
Li Yong
On Wed, Apr 08, 2009 at 05:08:31PM +0800, Li Yong wrote:
> I want to use xapian to index chinese html pages.
>
> I found the cjk-tokenizer lib in the maillist
> http://lists.tartarus.org/pipermail/xapian-discuss/2007-June/003921.html
>
> However, I do not know how to add this lib to the xapian project.

That's just a link to the one in Lucene. This one might be more useful:

http://thread.gmane.org/gmane.comp.search.xapian.general/4574/focus=4762

> Is there any example or steps?

I've not tried to use it myself. The longer term plan is to include this or something similar in Xapian itself, but nobody is currently working on it as far as I know.

For now, I think you'd have to just ignore Xapian::TermGenerator and Xapian::QueryParser and add the bigram terms with add_posting() when indexing and combine them into queries with OP_AND.

Cheers,
Olly
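Olly's suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions, not the cjk-tokenizer library's own code: the helper names is_cjk() and cjk_bigrams() are hypothetical, the Unicode ranges are a rough approximation of "CJK", and the Xapian part assumes the Python bindings are installed (it is skipped otherwise). Each run of CJK characters is split into overlapping two-character terms, each term is added with add_posting(), and the query terms are joined with OP_AND.

```python
def is_cjk(ch):
    """Rough check (assumption): CJK ideographs, kana, and Hangul syllables."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7A3)  # Hangul syllables


def cjk_bigrams(text):
    """Return (term, position) pairs for every run of CJK characters.

    A run of n >= 2 characters yields n - 1 overlapping bigrams;
    a lone CJK character is emitted as a single-character term.
    Non-CJK text is ignored here and would need its own handling.
    """
    postings = []
    position = 0
    run = []

    def flush():
        nonlocal position
        if len(run) == 1:
            terms = [run[0]]
        else:
            terms = [a + b for a, b in zip(run, run[1:])]
        for term in terms:
            position += 1
            postings.append((term, position))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        elif run:
            flush()
    if run:
        flush()
    return postings


try:
    import xapian  # optional: only needed for the indexing/query part
except ImportError:
    xapian = None

if xapian is not None:
    # Indexing: add each bigram with its position instead of using TermGenerator.
    doc = xapian.Document()
    for term, position in cjk_bigrams("中文搜索"):
        doc.add_posting(term, position)

    # Searching: AND the bigrams of the query text instead of using QueryParser.
    query_terms = [term for term, _ in cjk_bigrams("搜索")]
    query = xapian.Query(xapian.Query.OP_AND, query_terms)
```

Storing positions with add_posting() rather than add_term() keeps the option of phrase-style matching open later; whether that is needed depends on the application.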
On Wed, Apr 8, 2009 17:08:31 +0800, Li Yong <sdliyong at gmail.com> wrote:
> I want to use xapian to index chinese html pages.
>
> I found the cjk-tokenizer lib in the maillist
> http://lists.tartarus.org/pipermail/xapian-discuss/2007-June/003921.html
>
> However, I do not know how to add this lib to the xapian project.
>
> Is there any example or steps?

Pinot uses a slightly modified version of Yung-Chung Lin's cjk-tokenizer that can be found at
http://svn.berlios.de/wsvn/dijon/trunk/cjkv/CJKVTokenizer.cc

For an example, see the XapianIndex and TokensIndexer classes at
http://svn.berlios.de/wsvn/pinot/trunk/IndexSearch/Xapian/XapianIndex.cpp

I hope this helps.

Fabrice
> Yes - change the code which currently uses Xapian::TermGenerator (for
> indexing), and the code which currently uses Xapian::QueryParser (for
> searching).
>
> Cheers,
> Olly

Hello Olly,

Thank you for your mail.

I will run some tests, and if I have any questions I will write again.

Li Yong
LiYong,

Here is an example of a Xapian search engine searching Chinese, Japanese, or Korean text. You can search for, say, "hello" = "??" in Chinese, and it lists all the crawled sites that contain hello=??

http://pacificair.com/search?q=??

Cheers,
Kevin Duraj

On Wed, Apr 8, 2009 at 5:41 PM, LiYong <sdliyong at gmail.com> wrote:
> Hello Olly,
>
> Thank you for your mail.
>
> I will perform some tests and if there is any question, I will send mail
> again!
>
> Li Yong
Hello Kevin,

Thank you for the information. Can you share more details about this search engine?

Li Yong

2009/4/14 Kevin Duraj <kevin.softdev at gmail.com>:
> LiYong,
>
> Here is an example of Xapian search engine searching on Chinese,
> Japanese or Korean text.
> You can search let say for "hello" = "??" in Chinese and here are all
> the site crawled that
> contains hello=??
>
> http://pacificair.com/search?q=??
>
> Cheers,
> Kevin Duraj