Dear Olly,

I have encountered something unexpected.
Please look at the following commands and their results:

$ python search.py -v C++
Performing query 'Xapian::Query(c++:(pos=1))'
0 results found

$ python search.py -v c++
Performing query 'Xapian::Query(c:(pos=1))'
10 results found

As I understand it, QueryParser checks the term list in the database
in the following code:

    // If the suffixed term doesn't exist, check that the
    // non-suffixed term does.  This also takes care of
    // the case when QueryParser::set_database() hasn't
    // been called.
    if (db.term_exists(suff_term) || !db.term_exists(term)) {
        term = suff_term;
        it = p;
    }

In my database the term 'c' exists, but 'C' doesn't.
All the terms in my database are indexed in lowercase, because
I knew QueryParser always converts terms to lowercase.

Why doesn't QP convert the term to lowercase before it
calls db.term_exists() to look up the term list?

Is there a reason I am not aware of?

For better Xapian,
Sungsoo Kim
On Tue, Mar 07, 2006 at 02:58:47AM +0900, Sungsoo Kim wrote:
> Why QP does not convert the term to lowercase before it
> calls db.term_exists() to look up term list?
>
> Is there any reason I am not aware of?

QP treats capitalized terms as "raw" terms, i.e. terms that should not be
stemmed. "Test" will be parsed to "Rtest", "C" will be parsed to "Rc".

HTH

Ralf Mattes
On Tue, Mar 07, 2006 at 02:58:47AM +0900, Sungsoo Kim wrote:
> Why QP does not convert the term to lowercase before it
> calls db.term_exists() to look up term list?

That's a bug, thanks for noticing.  The attached patch should fix it.

Cheers,
    Olly
-------------- next part --------------
Index: queryparser/queryparser.lemony
==================================================================
--- queryparser/queryparser.lemony	(revision 6534)
+++ queryparser/queryparser.lemony	(working copy)
@@ -376,7 +376,15 @@
 		// non-suffixed term does.  This also takes care of
 		// the case when QueryParser::set_database() hasn't
 		// been called.
-		if (db.term_exists(suff_term) || !db.term_exists(term)) {
+		bool use_suff_term = false;
+		string lc = downcase_term(suff_term);
+		if (db.term_exists(lc)) {
+		    use_suff_term = true;
+		} else {
+		    lc = downcase_term(term);
+		    if (!db.term_exists(lc)) use_suff_term = true;
+		}
+		if (use_suff_term) {
 		    term = suff_term;
 		    it = p;
 		}