Matthias Zeichmann
2005-Dec-29 12:50 UTC
[Xapian-discuss] stemming problems with perl interface
hi list,

i am having trouble getting german stemming to work correctly; at least it
appears like the stemmers of Search::Xapian::QueryParser and
Search::Xapian::Stem yield different results for german.

example code:
---------->8---------------------------
#!/usr/bin/perl

use strict;
use warnings;
use Search::Xapian qw(:standard);

my $db = Search::Xapian::Database->new('test');
my $qp = new Search::Xapian::QueryParser( $db );
$qp->set_stemming_options("german",1);

my $srch = 'türen'; # iso-8859-1
my $q = $qp->parse_query($srch);
my $stem = Search::Xapian::Stem->new('german');

warn "VERSION:". $Search::Xapian::VERSION;
warn "DESC:". $q->get_description;
warn "STEM:". $stem->stem_word($srch);
---------->8---------------------------

gives this output:
---------->8---------------------------
VERSION:0.9.2.1 at search line 15.
DESC:Xapian::Query(tuer:(pos=1)) at search line 16.
STEM:tur at search line 17.
---------->8---------------------------

with english stemmer i get:
---------->8---------------------------
VERSION:0.9.2.1 at search line 15.
DESC:Xapian::Query(tueren:(pos=1)) at search line 16.
STEM:türen at search line 17.
---------->8---------------------------

thanks for consideration

cheers
matt

On Thu, Dec 29, 2005 at 12:38:17PM +0000, Matthias Zeichmann wrote:
> i am having trouble getting german stemming to work correctly; at least it
> appears like the stemmers of Search::Xapian::QueryParser and
> Search::Xapian::Stem yield different results for german.

Xapian::QueryParser currently normalises accents, so the u-umlaut is
normalised to "ue". As you've noticed, this is a bit unexpected - where
such normalisation is the appropriate thing to do, it should really be
done in the stemmer itself.

I'm currently tying up loose ends for 0.9.3, then my plan is to address
this along with merging the utf-8 patches and the latest snowball stemmers
in a new major release.

Here's some previous discussion:

http://thread.gmane.org/gmane.comp.search.xapian.general/1815

Cheers,
Olly
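
Until the normalisation moves into the stemmer, one possible workaround is to apply the same kind of folding yourself before calling Search::Xapian::Stem, so that both code paths see "tueren" rather than "türen". The following is only a rough sketch: fold_umlauts is a hypothetical helper, not part of Search::Xapian, and only the ü -> ue mapping is confirmed in this thread; the other mappings are assumptions about what the QueryParser does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this source file is UTF-8, so 'ü' below is a single character

# Hypothetical helper (not a Search::Xapian function): fold German umlauts
# to ASCII digraphs. Only ü -> ue is confirmed by the thread above; the
# remaining entries are assumptions.
my %FOLD = (
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ß' => 'ss',
);

sub fold_umlauts {
    my ($text) = @_;
    # Replace each listed character with its digraph; everything else is
    # left untouched.
    $text =~ s/([äöüÄÖÜß])/$FOLD{$1}/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_umlauts('türen'), "\n";    # prints "tueren"
```

The folded string could then be passed to $stem->stem_word(...) in place of the raw term. Whether that matches the QueryParser's result for every word is not guaranteed, since the exact normalisation table is assumed here.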