Dear Olly,
I have run into some unexpected behaviour.
Please look at the following test commands and their results:
$ python search.py -v C++
Performing query 'Xapian::Query(c++:(pos=1))'
0 results found
$ python search.py -v c++
Performing query 'Xapian::Query(c:(pos=1))'
10 results found
I know from the following code that QueryParser looks up
the term list in the database.
    // If the suffixed term doesn't exist, check that the
    // non-suffixed term does. This also takes care of
    // the case when QueryParser::set_database() hasn't
    // been called.
    if (db.term_exists(suff_term) || !db.term_exists(term)) {
        term = suff_term;
        it = p;
    }
In my database the term 'c' exists, but 'C' does not.
All the terms in my database are indexed in lowercase,
because I understood that QueryParser always converts
terms to lowercase.
Why does QP not convert the term to lowercase before it
calls db.term_exists() to look up the term list?
Is there any reason I am not aware of?
For better Xapian,
Sungsoo Kim

On Tue, Mar 07, 2006 at 02:58:47AM +0900, Sungsoo Kim wrote:
> Dear Olly,
>
> I have encountered an unexpected thing.
> Please look at the following commands and results for test!
>
> $ python search.py -v C++
> Performing query 'Xapian::Query(c++:(pos=1))'
> 0 results found
>
> $ python search.py -v c++
> Performing query 'Xapian::Query(c:(pos=1))'
> 10 results found
>
> I have known QueryParser looks up term list in the database
> from the following code.
>
> // If the suffixed term doesn't exist, check that the
> // non-suffixed term does. This also takes care of
> // the case when QueryParser::set_database() hasn't
> // been called.
> if (db.term_exists(suff_term) || !db.term_exists(term)) {
> term = suff_term;
> it = p;
> }
>
> In my database the term 'c' exists, but 'C' doesn't exist.
> All the terms are indexed in lowercase in my database,
> because I knew QueryParser always changes terms to
> lowercase.
>
> Why QP does not convert the term to lowercase before it
> calls db.term_exists() to look up term list?
>
> Is there any reason I am not aware of?
>

QP treats capitalized terms as "raw" terms, i.e. terms that should
not be stemmed. "Test" will be parsed to "Rtest", "C" will be parsed
to "Rc".

HTH Ralf Mattes

> For better Xapian,
>
>
> Sungsoo Kim

On Tue, Mar 07, 2006 at 02:58:47AM +0900, Sungsoo Kim wrote:
> Why QP does not convert the term to lowercase before it
> calls db.term_exists() to look up term list?

That's a bug, thanks for noticing. The attached patch should fix it.

Cheers,
    Olly
-------------- next part --------------
Index: queryparser/queryparser.lemony
===================================================================
--- queryparser/queryparser.lemony (revision 6534)
+++ queryparser/queryparser.lemony (working copy)
@@ -376,7 +376,15 @@
     // non-suffixed term does. This also takes care of
     // the case when QueryParser::set_database() hasn't
     // been called.
-    if (db.term_exists(suff_term) || !db.term_exists(term)) {
+    bool use_suff_term = false;
+    string lc = downcase_term(suff_term);
+    if (db.term_exists(lc)) {
+        use_suff_term = true;
+    } else {
+        lc = downcase_term(term);
+        if (!db.term_exists(lc)) use_suff_term = true;
+    }
+    if (use_suff_term) {
         term = suff_term;
         it = p;
     }
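
For readers following the thread, the difference between the original check and the patched one can be sketched outside Xapian. This is only an illustrative model, not the QueryParser code: the database is assumed to be a plain Python set of terms, `downcase_term()` is approximated by `str.lower()`, and the function names `pick_term_original` and `pick_term_patched` are hypothetical.

```python
def pick_term_original(term, suffix, db_terms):
    """Model of the original check: look terms up exactly as typed."""
    suff_term = term + suffix
    # Use the suffixed term if it exists, or if the bare term does not.
    if suff_term in db_terms or term not in db_terms:
        return suff_term
    return term


def pick_term_patched(term, suffix, db_terms):
    """Model of the patched check: lowercase each candidate before lookup.

    str.lower() stands in for downcase_term(); that is an assumption.
    """
    suff_term = term + suffix
    if suff_term.lower() in db_terms:
        return suff_term
    if term.lower() not in db_terms:
        return suff_term
    return term


# Toy database matching the report: 'c' is indexed, 'C' and 'c++' are not.
db_terms = {"c"}

for typed in ("C", "c"):
    print(typed + "++",
          "original:", pick_term_original(typed, "++", db_terms),
          "patched:", pick_term_patched(typed, "++", db_terms))
```

In this model the original check keeps the "++" suffix for "C++" because the exact term 'C' is not in the set, while the patched check finds 'c' after lowercasing and drops the suffix, so both spellings end up at the same term.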