Marco Tabini
2005-Jun-24 22:28 UTC
[Xapian-discuss] Iterating through all the documents of a db
Hi All--

Is there a way to iterate through all the documents in a database? I *can* just get the last doc id and work my way through sequentially, trapping any errors which indicate that a specific document doesn't exist... But that seems like such an inefficient way of doing things :)

I need this to do some maintenance on the documents.

Thanks,
Marco
Olly Betts
2005-Jun-24 23:28 UTC
[Xapian-discuss] Iterating through all the documents of a db
On Fri, Jun 24, 2005 at 05:27:33PM -0400, Marco Tabini wrote:
> Is there a way to iterate through all the documents in a database? I *can*
> just get the last doc id and work my way through sequentially, trapping any
> errors which indicate that a specific document doesn't exist... But that
> seems like such an inefficient way of doing things :)

That's the way to do it at present. See:

http://svn.xapian.org/trunk/xapian-core/examples/copydatabase.cc?rev=6097&view=markup

Should be ok efficiency-wise.

There's a plan to add a way to iterate directly over all documents:

http://xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=47

The idea is that you'll just be able to iterate over the postlist for an empty termname, and that'll actually iterate over all documents in the database. It'll be as if there's a magic term "" which indexes all documents. Currently postlist_begin("") throws an exception.

I got as far as a prototype patch, but there was some obstacle which made it easier to leave until something else got done. Sadly I can't remember what that was now! When I next have a spare moment, I'll try applying the patch and see. I'll attach the (non-usable) patch to the bugzilla entry to make sure it doesn't get lost.

Cheers,
Olly
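
For readers who want to see the shape of the approach described above (walk every id from 1 up to the last doc id and skip the ones that no longer exist), here is a minimal sketch. It is not taken from copydatabase.cc. The method names get_lastdocid() and get_document() mirror Xapian's Database API, but FakeDatabase, the local DocNotFoundError class, and iter_documents() are hypothetical stand-ins so that the example runs without Xapian installed. With the real bindings you would presumably pass an opened database and the library's own "document not found" exception instead.

```python
class DocNotFoundError(Exception):
    """Hypothetical stand-in for the library's "no such document" error."""


class FakeDatabase:
    """Hypothetical in-memory stand-in for a database with gaps in its ids."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def get_lastdocid(self):
        # Highest id in use; 0 means the database is empty.
        return max(self._docs, default=0)

    def get_document(self, docid):
        try:
            return self._docs[docid]
        except KeyError:
            raise DocNotFoundError(docid) from None


def iter_documents(db, missing_error=DocNotFoundError):
    """Yield (docid, document) for every document that still exists.

    Ids start at 1. Deleted documents leave gaps, so a lookup that
    raises `missing_error` is skipped rather than treated as fatal.
    """
    for docid in range(1, db.get_lastdocid() + 1):
        try:
            doc = db.get_document(docid)
        except missing_error:
            continue  # this id was deleted or never used
        yield docid, doc


if __name__ == "__main__":
    db = FakeDatabase({1: "first", 3: "third", 4: "fourth"})
    for docid, doc in iter_documents(db):
        print(docid, doc)
```

The cost is one lookup per id up to the last doc id, so the loop only wastes effort in proportion to how many ids have been deleted, which fits the "should be ok efficiency-wise" remark above.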