Ivar Bratberg
2005-Jun-09 13:13 UTC
[Xapian-discuss] Query parser and stemming of norwegian letters
Hello, can I get an explanation of the following.
Running the following code:
....
pqp=new QueryParser();
Stem stem("norwegian");
cout << "DEBUG " <<
stem.stem_word(_sXapian)<< endl;
pqp->set_stemmer(stem);
pqp->set_database(*_pdatabase);
pqp->set_default_op(Query::OP_AND);
//Set the enquire
Query p=pqp->parse_query(_sXapian);
cout << " Query " << string(bufSL) <<
p.get_description() << endl;
---
gives the following output:
DEBUG h?y
Query norwegianXapian::Query((ha:(pos=1) AND y:(pos=2)))
The ? is the UTF-8 byte sequence c3 b8 (the letter ø).
Why does the queryparser produce something different than a direct
stemmer call ?
Best regards,
IB
Olly Betts
2005-Jun-09 16:55 UTC
[Xapian-discuss] Query parser and stemming of norwegian letters
On Thu, Jun 09, 2005 at 02:11:16PM +0200, Ivar Bratberg wrote:
> Why does the queryparser produce something different than a direct
> stemmer call ?

Because QueryParser needs to tokenise before stemming. Currently it isn't
Unicode-aware, so it treats the Unicode character as a word break.

Hopefully I'll be fixing this next week.

Cheers,
Olly
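
The effect described above can be reproduced with a small standalone sketch. This is not Xapian's tokeniser; tokenise() and its high_bytes_are_word_chars flag are hypothetical names made up for illustration. It assumes a byte-oriented tokeniser that ends a word at every byte that is not an ASCII letter or digit, which is enough to split the UTF-8 input "h\xc3\xb8y" into "h" and "y" before any stemmer sees it.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical byte-oriented tokeniser (an illustration, not Xapian code).
// When high_bytes_are_word_chars is false, any byte outside ASCII
// letters/digits ends the current word, so the two bytes of a UTF-8
// encoded character act as a word break.
// When it is true, bytes >= 0x80 are kept inside the word; this is a crude
// assumption that keeps multi-byte UTF-8 sequences together.
std::vector<std::string> tokenise(const std::string& text,
                                  bool high_bytes_are_word_chars) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char byte : text) {
        bool is_word_byte =
            (byte < 0x80) ? (std::isalnum(byte) != 0)
                          : high_bytes_are_word_chars;
        if (is_word_byte) {
            current += static_cast<char>(byte);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

Calling tokenise("h\xc3\xb8y", false) yields the two words "h" and "y", which matches the two-term query in the output above, while tokenise("h\xc3\xb8y", true) keeps the word whole. Calling the stemmer directly skips tokenisation entirely, which is why it sees the full word.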