emmanuel at engelhart.org
2009-Aug-21 12:44 UTC
[Xapian-discuss] Problem getting Xapian working with Burmese
Hi,

I want to update my request. Is my question badly formulated? Too trivial? Or maybe pretty complicated/unclear?

In fact I'm neither a Xapian expert nor a search engine expert, so I have no idea where to start my investigation. Even without an answer to my question, maybe someone can give me an idea of how to better understand the issue?

Regards
Emmanuel

On Fri 17/07/09 19:30, "Emmanuel Engelhart" emmanuel at engelhart.org wrote:
> Hi,
>
> I use Xapian in my project with multiple Latin languages and it works
> well. I have also tried it with Parsi, and it seems to work too.
>
> But with Burmese, things are a little different. What I do:
>
> mkdir html
> cd html
> wget -O doc.html http://my.wikipedia.org
> cd ..
> omindex --db=./xapdb ./html/
>
> To run a simple search against the db I use the following Perl script (my
> code is in C++ and it does not work either):
>
> ==================================================================
> #!/usr/bin/perl
>
> use Search::Xapian;
> use utf8;
>
> my $db = Search::Xapian::Database->new( './xapdb' );
> my $enq = $db->enquire( $ARGV[0] );
>
> printf "Running query '%s'\n",
>     $enq->get_query()->get_description();
> my @matches = $enq->matches(0, 10);
>
> print scalar(@matches) . " results found\n";
>
> foreach my $match ( @matches ) {
>     my $doc = $match->get_document();
>     printf "ID %d %d%% [ %s ]\n", $match->get_docid(),
>         $match->get_percent(), $doc->get_data();
> }
> ==================================================================
>
> ./search.pl problems
>
> ... returns the document, because at the beginning of the page there is
> a sentence in English containing this word.
>
> ./search.pl ????
>
> ... returns a result too.
>
> ./search.pl ?????????????????
> ./search.pl ?????????????
>
> ... do not work. In fact, it does not work most of the time. It seems
> to work only with Burmese words which are short and/or contain only
> certain characters.
>
> Is that normal?
>
> Regards
> Emmanuel
Olly Betts
2009-Aug-30 05:58 UTC
[Xapian-discuss] Problem getting Xapian working with Burmese
On Fri, Aug 21, 2009 at 02:44:44PM +0200, emmanuel at engelhart.org wrote:
> I want to update my request.
> Is my question badly formulated? too trivial? ... or maybe pretty
> complicated/unclear?

I think nobody answered because it was hard to follow your example: the Burmese characters seem to have been mangled (at least the message I received wasn't valid UTF-8).

But looking at the code, I see an issue:

> my $db = Search::Xapian::Database->new( './xapdb' );
> my $enq = $db->enquire( $ARGV[0] );

What this does is create an Enquire object and set Query($ARGV[0]) as the query. That works OK if $ARGV[0] is a single word which gets indexed as a single term, but you really want to parse the query string to get a Query object:

    my $db = Search::Xapian::Database->new( './xapdb' );
    my $queryparser = Search::Xapian::QueryParser->new();
    my $query = $queryparser->parse_query( $ARGV[0] );
    my $enq = $db->enquire( $query );

I'd guess that is probably your problem, but I can't tell for sure as I can't test your examples...

For further information on debugging this sort of problem, see:

http://trac.xapian.org/wiki/FAQ/NoMatches

Cheers,
    Olly
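
The difference between using the raw string as a single term and parsing it first can be illustrated with a toy model. This is only a sketch in plain Python, not Xapian's API: the function names (`index_terms`, `raw_term_search`, `parsed_search`), the sample documents, and the simple "split on word characters and lowercase" rule are all assumptions made for illustration, and they say nothing about how Xapian actually tokenizes Burmese text.

```python
import re


def index_terms(text):
    """Toy indexer: split text into lowercase word terms (hypothetical rule)."""
    return {word.lower() for word in re.findall(r"\w+", text)}


def raw_term_search(index, query_string):
    """Use the whole query string as one term, with no parsing at all."""
    return [doc_id for doc_id, terms in index.items() if query_string in terms]


def parsed_search(index, query_string):
    """Parse the query string into terms first, then match any of them."""
    wanted = index_terms(query_string)
    return [doc_id for doc_id, terms in index.items() if wanted & terms]


# Hypothetical sample documents, indexed term by term.
docs = {1: "Problems with search", 2: "Burmese text indexing"}
index = {doc_id: index_terms(text) for doc_id, text in docs.items()}

print(raw_term_search(index, "problems"))      # [1]: matches an indexed term
print(raw_term_search(index, "Burmese text"))  # []: never indexed as one term
print(parsed_search(index, "Burmese text"))    # [2]: parsed into two terms
```

In this toy model the raw lookup only succeeds when the query string happens to equal one indexed term exactly, which is consistent with the observation that only some short queries return results.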