Ivar Bratberg
2005-Jun-09 13:13 UTC
[Xapian-discuss] Query parser and stemming of norwegian letters
Hello,

Can I get an explanation of the following? Running this code:

    ....
    pqp = new QueryParser();
    Stem stem("norwegian");
    cout << "DEBUG " << stem.stem_word(_sXapian) << endl;
    pqp->set_stemmer(stem);
    pqp->set_database(*_pdatabase);
    pqp->set_default_op(Query::OP_AND);
    // Set the enquire
    Query p = pqp->parse_query(_sXapian);
    cout << " Query " << string(bufSL) << p.get_description() << endl;

gives the following output:

    DEBUG h?y
     Query norwegianXapian::Query((ha:(pos=1) AND y:(pos=2)))

The ? is the UTF-8 byte sequence c3 b8 (the letter ø).

Why does the query parser produce something different from a direct stemmer call?

Best regards,
IB
Olly Betts
2005-Jun-09 16:55 UTC
[Xapian-discuss] Query parser and stemming of norwegian letters
On Thu, Jun 09, 2005 at 02:11:16PM +0200, Ivar Bratberg wrote:
> Why does the queryparser produce something different than a direct
> stemmer call ?

Because QueryParser needs to tokenise before stemming. Currently it isn't Unicode-aware, so it treats the Unicode character as a word break.

Hopefully I'll be fixing this next week.

Cheers,
Olly
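
To illustrate the tokenisation point, here is a minimal sketch that does not use Xapian at all. The `tokenise` function and its `keep_high_bytes` flag are hypothetical names invented for this example; this is not QueryParser's actual tokeniser, nor the fix that was later applied. It only shows how a splitter that accepts ASCII letters and digits alone breaks a word at a multi-byte UTF-8 character, so the stemmer never sees the whole word. The real output above shows "ha" rather than "h" as the first term, so the real parser evidently handles the first byte differently; the sketch reproduces the general mechanism only.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only (not part of Xapian).
// Splits text into terms at every byte that is not a word byte.
// ASCII letters and digits are always word bytes. Bytes >= 0x80, which make
// up every multi-byte UTF-8 sequence, are word bytes only when
// keep_high_bytes is true.
std::vector<std::string> tokenise(const std::string& text, bool keep_high_bytes) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char byte : text) {
        const bool is_ascii_word = byte < 0x80 && std::isalnum(byte) != 0;
        const bool is_high_word = keep_high_bytes && byte >= 0x80;
        if (is_ascii_word || is_high_word) {
            current.push_back(static_cast<char>(byte));
        } else if (!current.empty()) {
            // A non-word byte ends the current term.
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        terms.push_back(current);
    }
    return terms;
}
```

With the input "h\xC3\xB8y" (the bytes of "høy" in UTF-8), calling `tokenise(word, false)` yields two terms, "h" and "y", while `tokenise(word, true)` keeps the word in one piece so that it could be passed to the stemmer whole. Treating every high byte as a word byte is a deliberately crude assumption; a properly Unicode-aware tokeniser would decode the code points and classify them individually.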