On Mon, 2005-08-29 at 11:04 +0200, Marcus Ramberg wrote:
> Hey. I'm having some problems with the Xapian QueryParser using the
> Perl bindings. It turns all Scandinavian characters into the English
> alphabet. See the following example:
>
> $qp->set_stemmer($stemmer);
> print $qp->parse_query('bølle')."\n";
> print $stemmer->stem_word('bølle')."\n";
>
> Returns
>
> marcus@ds1:~/src/Horus-Indexer$ ./stemtest
> Xapian::Query(bolle:(pos=1))
> bølle
>
> So, I'm pretty sure it's not the stemmer. Any other ideas?
Lots of :-)
Yes, the queryparser itself modifies characters. The code that does this
is in 'xapian/xapian-core/queryparser/accentnormalisingitor.h'. IMHO
this is a rather "murky" and anglocentric part of the Xapian codebase.
Frankly, I just removed the offending parts of the code - but a cleaner
solution would be preferable. My current approach would be to make
the static tables in 'xapian/xapian-core/queryparser/symboltab.h'
configurable by language (sigh, not enough time right now).
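
A minimal sketch of what "configurable by language" could look like
follows. It is not Xapian code: FoldTable, default_folds,
preserved_letters and normalise are names made up for illustration, the
language codes are an assumption, and it works on UTF-32 strings only to
keep the example short. The idea is to keep one default folding table
and let each language list the letters it wants left alone.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical names throughout; none of this is part of the Xapian API.
using FoldTable = std::unordered_map<char32_t, char32_t>;

// Default folding, roughly what an English-centric parser would apply.
inline const FoldTable& default_folds() {
    static const FoldTable table = {
        {U'\u00F8', U'o'},  // o with stroke
        {U'\u00E5', U'a'},  // a with ring
        {U'\u00E4', U'a'},  // a with diaeresis
        {U'\u00F6', U'o'},  // o with diaeresis
        {U'\u00E9', U'e'},  // e with acute
    };
    return table;
}

// Letters a language treats as distinct and wants exempt from folding.
// The language codes used here are an assumption for this sketch.
inline std::unordered_set<char32_t> preserved_letters(const std::string& lang) {
    if (lang == "no" || lang == "da") return {U'\u00F8', U'\u00E5'};
    if (lang == "sv") return {U'\u00E5', U'\u00E4', U'\u00F6'};
    return {};
}

// Fold accented letters unless the chosen language preserves them.
inline std::u32string normalise(const std::u32string& term,
                                const std::string& lang) {
    const FoldTable& folds = default_folds();
    const std::unordered_set<char32_t> keep = preserved_letters(lang);
    std::u32string out;
    out.reserve(term.size());
    for (char32_t c : term) {
        auto it = folds.find(c);
        if (it != folds.end() && keep.count(c) == 0) {
            out.push_back(it->second);  // fold to the unaccented letter
        } else {
            out.push_back(c);           // leave the character untouched
        }
    }
    return out;
}
```

With this, normalise(U"b\u00F8lle", "en") gives "bolle", while
normalise(U"b\u00F8lle", "no") returns the word unchanged.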
HTH
Ralf Mattes
> Marcus
>
> Ps. (for your info, 'bølle' eq 'bully', and 'bolle' eq 'bowl')
> Pps. I've implemented the set_parser function in QueryParser. It
> should work, and I get the same results with set_stemming_options. :)
>
> Marcus