Hello all,

I want to work out a solution for counting bigrams and creating a co-occurrence matrix with the Xapian Perl modules. From checking the archived emails, I see there have been some discussions about CJK tokens, but I am working only on English documents. My immediate questions are how Xapian handles bigrams and how it can do so with windowing, as NSP does with the --window option. Has anyone worked on this before? Do you have any suggestions?

Thank you,
Ying
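For reference, the windowed bigram counting described above can be sketched without Xapian at all. The following is a minimal plain-Python illustration, not Xapian or NSP code. It assumes that a window of size N pairs each token with the next N-1 tokens in order (so a window of 2 gives ordinary adjacent bigrams), which is my reading of NSP's --window behaviour. The function names `windowed_bigrams` and `cooccurrence_matrix` and the simple regex tokenizer are hypothetical choices made for this example.

```python
import re
from collections import Counter


def windowed_bigrams(text, window=2):
    """Count ordered word pairs whose positions fall within `window` tokens.

    Assumption: window=2 means adjacent words only; window=N pairs each
    token with the N-1 tokens that follow it.
    """
    # Naive English tokenizer (an assumption): lowercase runs of letters,
    # digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for i, left in enumerate(tokens):
        # Slice covers positions i+1 .. i+window-1 (clipped at the end).
        for right in tokens[i + 1 : i + window]:
            counts[(left, right)] += 1
    return counts


def cooccurrence_matrix(counts):
    """Turn pair counts into a dense matrix indexed by a sorted vocabulary."""
    vocab = sorted({word for pair in counts for word in pair})
    index = {word: k for k, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (left, right), n in counts.items():
        # Rows are the first word of the pair, columns the second.
        matrix[index[left]][index[right]] = n
    return vocab, matrix


if __name__ == "__main__":
    pairs = windowed_bigrams("the quick brown fox", window=3)
    vocab, matrix = cooccurrence_matrix(pairs)
    print(vocab)
    for row in matrix:
        print(row)
```

The matrix here is directed (row word precedes column word). For a symmetric co-occurrence matrix, one could add each count to both `matrix[i][j]` and `matrix[j][i]`. For a large vocabulary a sparse structure such as the `Counter` itself is likely a better fit than a dense list of lists.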
☼ 林永忠 ☼ (Yung-chung Lin)
2009-Oct-27 03:27 UTC
[Xapian-discuss] bigrams and co-occurrence matrix
Hi Ying,

You may want to check this: http://code.google.com/p/cjk-tokenizer/
A Perl binding is also included.

Best,
Yung-chung Lin

2009/10/26 Ying Liu <liux0395 at umn.edu>
> Hello all,
>
> I want to work out a solution to counting bigrams and creating a
> co-occurrence matrix with Xapian Perl modules. By check archived emails,
> there are some discussions about CJK tokens. I am just working on English
> documents. My immediate goals are how Xapian do bigrams and how can it do
> that with windowing, like NSP does with the --window option. Did anyone
> work on this before? Do you have some suggestions?
>
> Thank you,
> Ying