Hi all,

I've been toying with xapian (mostly using the Python bindings) and I think I've hit a bug in the TermIterator::skip_to() method (or maybe in QuartzAllTermsList::skip_to()). I've attached a c++ source file that demonstrates the issue.

In short, if you have a WritableDatabase, ask for the all-terms TermIterator with db.allterms_begin(), and then skip_to() a word that is itself a term, the iterator sometimes stays at the beginning of the term list.

In the attached file this is demonstrated with skip_to("amsterdam"), which results in the iterator remaining at the first term ("!!!"). If you skip_to("amsterda"), the iterator does move to "amsterdam".

Any ideas on what is going on?

Cheers
Pascal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: skipto_bug.cc
Type: text/x-c
Size: 1629 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20050225/02fd1690/attachment.bin>
Hi again,

I'd forgotten to include the output of the demonstration program, and Mailman seems not to like .cc attachments, so here's the output and the code inlined:

---- output ----
First term after skipping to amsterdam: !!!
First term after skipping to amsterda: amsterdam
All terms:
!!!
aarvark
amsterdam
amsterdamse
beast
rotterdam

---- source ----
#include <vector>
#include <string>
#include <iostream>
#include <iterator>
#include <xapian.h>

using namespace std;

int main(int argc, char** argv)
{
    string data = "aarvark beast rotterdam amsterdamse amsterdam !!!";

    // Split the data on spaces into individual words.
    vector<string> words;
    int pos = 0;
    while ( pos < data.length() ) {
        int end = data.find( ' ', pos );
        if ( end == string::npos )
            end = data.length();
        words.push_back( data.substr( pos, end-pos ) );
        pos = end+1;
    }

    try {
        Xapian::WritableDatabase db = Xapian::WritableDatabase( "/tmp/xapian.db", Xapian::DB_CREATE_OR_OPEN );

        // Index every word as a term of a single document.
        Xapian::Document doc = Xapian::Document();
        doc.set_data( data );
        pos = 1;
        for ( vector<string>::iterator i = words.begin(); i != words.end(); i++ ) {
            doc.add_posting( *i, pos++ );
        }
        db.add_document( doc );

        // Skip to an existing term: the iterator wrongly reports "!!!".
        Xapian::TermIterator terms = db.allterms_begin();
        string skip = "amsterdam";
        terms.skip_to( skip );
        cout << "First term after skipping to " << skip << ": " << *terms << endl;

        // Skip to a prefix that is not itself a term: this works.
        terms = db.allterms_begin();
        skip = "amsterda";
        terms.skip_to( skip );
        cout << "First term after skipping to " << skip << ": " << *terms << endl;

        cout << "All terms:" << endl;
        for( Xapian::TermIterator i = db.allterms_begin(); i != db.allterms_end(); i++ ) {
            cout << *i << endl;
        }
    } catch ( const Xapian::Error& error ) {
        cout << "Exception: " << error.get_msg() << endl;
    }
}

Cheers
Pascal
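For comparison, here is a minimal sketch of the result the test program seems to expect, written without Xapian. It assumes that skip_to() should land on the first term sorting at or after the target, which is what the "amsterda" case does. The helper names expected_after_skip_to(), sample_terms() and check_expected() are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper (not a Xapian call): the term an all-terms iterator
// is assumed to point at after skip_to(target), i.e. the first term that
// sorts at or after target. Returns "" when no such term exists.
std::string expected_after_skip_to(const std::set<std::string>& terms,
                                   const std::string& target) {
    std::set<std::string>::const_iterator it = terms.lower_bound(target);
    return it == terms.end() ? std::string() : *it;
}

// The six words indexed by the test program; std::set keeps them sorted.
std::set<std::string> sample_terms() {
    const char* words[] = {"aarvark", "beast", "rotterdam",
                           "amsterdamse", "amsterdam", "!!!"};
    return std::set<std::string>(words, words + 6);
}

// Usage: both skips from the test program should end up on "amsterdam".
void check_expected() {
    std::set<std::string> terms = sample_terms();
    assert(expected_after_skip_to(terms, "amsterdam") == "amsterdam");
    assert(expected_after_skip_to(terms, "amsterda") == "amsterdam");
}
```

Under that assumption, the first line of the output above should read "amsterdam" rather than "!!!".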
On Fri, Feb 25, 2005 at 02:40:35PM +0100, Pascal Beis wrote:
> In the attached file this is demonstrated with skip_to("amsterdam"),
> which results in the iterator remaining at the first term ("!!!"). If
> you skip_to("amsterda"), the iterator does move to "amsterdam".
>
> Any ideas on what is going on?

I've found the problem. When iterating allterms, quartz mishandles the case when skip_to finds an exact match by not updating the value returned by the iterator. The position is changed, so in this case next will move you to the term after "amsterdam".

The attached patch should fix your test program. I'll check it into CVS once I've written a regression test.

Thanks for reporting this bug!

Cheers,
Olly

-------------- next part --------------
Index: backends/quartz/quartz_alltermslist.cc
===================================================================
RCS file: /usr/data/cvs/xapian/xapian-core/backends/quartz/quartz_alltermslist.cc,v
retrieving revision 1.20
diff -p -u -r1.20 quartz_alltermslist.cc
--- backends/quartz/quartz_alltermslist.cc	20 Aug 2004 16:38:55 -0000	1.20
+++ backends/quartz/quartz_alltermslist.cc	25 Feb 2005 16:28:40 -0000
@@ -130,8 +130,8 @@ QuartzAllTermsList::skip_to(const string
 	    next();
 	}
     } else {
-	DEBUGLINE(DB, "QuartzAllTermList[" << this << "]::skip_to(): key is " <<
-		  pl_cursor->current_key);
+	Assert(key == pl_cursor->current_key);
+	current_term = tname;
     }
     RETURN(NULL);
 }
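The failure mode described above (position moves, reported value does not) can be modelled in a few lines. This is a toy sketch, not the quartz code: the class ToyAllTermsList, its apply_fix flag and the helper sample_sorted_terms() are invented for illustration, and the only assumption taken from the patch is that the exact-match branch needs to assign the cached term.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: a cursor over sorted terms that caches the term it
// reports. With apply_fix == false it mimics the stale cache on an exact
// match; with apply_fix == true it refreshes the cache as the patch does.
class ToyAllTermsList {
  public:
    ToyAllTermsList(const std::vector<std::string>& sorted_terms, bool apply_fix)
        : terms(sorted_terms), pos(0), fixed(apply_fix) {
        assert(!terms.empty());
        current_term = terms[0];
    }

    // Advance the position to the first term >= tname.
    void skip_to(const std::string& tname) {
        while (pos < terms.size() && terms[pos] < tname) ++pos;
        if (pos == terms.size()) {
            current_term.clear();          // ran off the end
        } else if (terms[pos] != tname) {
            current_term = terms[pos];     // inexact match: cache refreshed
        } else if (fixed) {
            current_term = tname;          // exact match, patched behaviour
        }
        // Exact match without the fix: pos has moved, current_term is stale.
    }

    // Step to the following term and refresh the cache.
    void next() {
        if (pos < terms.size()) ++pos;
        if (pos < terms.size()) current_term = terms[pos];
        else current_term.clear();
    }

    const std::string& get_termname() const { return current_term; }

  private:
    std::vector<std::string> terms;
    std::size_t pos;
    bool fixed;
    std::string current_term;
};

// The terms from Pascal's output, already in sorted order.
std::vector<std::string> sample_sorted_terms() {
    const char* words[] = {"!!!", "aarvark", "amsterdam",
                           "amsterdamse", "beast", "rotterdam"};
    return std::vector<std::string>(words, words + 6);
}
```

With apply_fix set to false, skip_to("amsterdam") leaves get_termname() at "!!!" and a following next() reports "amsterdamse", matching the symptoms in this thread. With apply_fix set to true, the same skip_to() reports "amsterdam".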