Emmanuel Engelhart
2009-Jul-17 17:30 UTC
[Xapian-discuss] Problem getting Xapian working with Burmese
Hi,

I use Xapian in my project with multiple Latin languages and it works well. I have also tried it with Parsi, and that seems to work too. With Burmese, however, things are a little different.

What I do:

mkdir html
cd html
wget -O doc.html http://my.wikipedia.org
cd ..
omindex --db=./xapdb ./html/

To run a simple search against the db I use the following Perl script (my own code is in C++, and it does not work either):

==================================================================
#!/usr/bin/perl
use Search::Xapian;
use utf8;

my $db = Search::Xapian::Database->new( './xapdb' );
my $enq = $db->enquire( $ARGV[0] );

printf "Running query '%s'\n", $enq->get_query()->get_description();

my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";

foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(), $match->get_percent(), $doc->get_data();
}
==================================================================

./search.pl problems

... returns the document, because the page begins with an English sentence that contains this word.

./search.pl ????

... returns a result too.

./search ?????????????????
./search ?????????????

... do not work. In fact, it fails most of the time. It seems to work only with Burmese words which are short and/or contain only certain characters.

Is that normal?

Regards
Emmanuel