Hello,

I'm using the Python bindings to query a remote xapian-tcpsrv (xapian 1.1.0).

Once I have the result set I can get the SearchResult objects from it.
However, if I wait a little bit and then try to access the result object I
get the following error:

    DatabaseModifiedError: REMOTE:The revision being read has been discarded -
    you should call Xapian::Database::reopen() and retry the operation

I guess this is due to a change in the database (there is another process
adding documents).

Is there a way to tell xapian to cache locally the SearchResult objects so
I won't need to call db.reopen() all the time?

Thanks,
--
Miki Tebeka
miki at fattoc.com
On Mon, May 18, 2009 at 07:16:36PM -0700, Miki Tebeka wrote:
> I'm using the Python bindings to query a remote xapian-tcpsrv (xapian
> 1.1.0).
>
> Once I have the result set I can get the SearchResult objects from it.

Um, there isn't a class called "SearchResult" in Xapian.

> However if I wait a little bit and then try to access the result object I
> get the following error:
>
>     DatabaseModifiedError: REMOTE:The revision being read has been discarded -
>     you should call Xapian::Database::reopen() and retry the operation
>
> I guess this is due to a change in the database (there is another process
> adding documents).

Yes.

> Is there a way to tell xapian to cache locally the SearchResult objects so
> I won't need to call db.reopen() all the time?

Readers don't currently lock the revision they are using, so you have to
reopen() to move them on to a revision which actually still exists.

The main reason readers don't lock the revision is that nobody has
implemented it. There's also the issue that it makes it very easy to
inadvertently bloat the database with lots of old revisions. I'm not sure
there's a way to avoid that.

For now, I'd suggest just reading the results you want right away rather
than waiting a bit. It also helps to batch up writes more (which also
improves the efficiency of applying them) - see XAPIAN_FLUSH_THRESHOLD for
how to adjust the default flushing.

Cheers,
    Olly
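[Editorial note: the "read the results right away" advice above can be sketched as below. The `snapshot_results` helper and the dict field names are illustrative, not part of the Xapian API; it only assumes an MSet-like iterable of matches exposing `docid`, `rank`, and a `document` with `get_data()`, as the Python bindings do.]

```python
def snapshot_results(mset):
    """Copy everything we need out of the MSet immediately after the
    search, so later reads never touch a possibly-discarded revision."""
    snapshot = []
    for match in mset:
        snapshot.append({
            "docid": match.docid,
            "rank": match.rank,
            "data": match.document.get_data(),
        })
    return snapshot
```

With a real database you would call this on the MSet returned by `enquire.get_mset(...)` before doing any slow work, then use only the plain Python snapshot afterwards.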
On Wed, May 20, 2009 at 04:09:23PM -0700, Miki Tebeka wrote:
> Hello James,

Please keep replies on-list so everyone can help and learn from each other.

> > If you read all 10 (say) items in the MSet as soon as you grab it, you
> > should only rarely need to reopen the db in the process.
>
> My system gets about 100000 new documents a day, so this is happening all
> the time.

That's only 1-2 per second, and if you batch things up you should be able
to get through a search between batches almost all the time.

> > Or (better, and neater) stick the entire thing behind some sort of facade
> > interface that does the reopening for you.
>
> Currently I have something along the lines of:
>
>     def get_data(doc, db):
>         for i in range(10):
>             try:
>                 return doc.data.copy()
>             except xapian.DatabaseModifiedError, e:
>                 db.reopen()

You probably want to add:

    raise e

to the end (set e to a 'help unexplained!' error at the start of the
function). You may already have something equivalent, but thought it worth
pointing out.

> I'm still getting these errors when the load is heavy.

Definitely look at batching your writes, then. (I assume the error is
always from there and not elsewhere.) Note that by default the library
will batch writes anyway, so I'm guessing you're creating a new process
(or at least a new WritableDatabase instance) for each new document?

If you can't batch the writes cleanly, you could consider writing updates
to a copy of the database, duping the entire thing periodically (zero-cost
snapshots in your file system would help, else you want to exit the write
process during the copy), and using a stub database file to switch the
readers between the two databases -- one is being read from only, one
written to only. (I could have sworn we had a document on the latter
strategy, but I can't find it, so I'm probably wrong.)

J

--
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org
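[Editorial note: the facade interface suggested above, including the re-raise James recommends, might look like the sketch below. `with_reopen_retry` is a hypothetical helper; the exception class is passed in as a parameter so the pattern can be shown without importing the xapian module, and in real code you would pass `xapian.DatabaseModifiedError`. It uses modern `except ... as e` syntax rather than the Python 2 form in the quoted snippet.]

```python
def with_reopen_retry(db, func, exc_type, attempts=10):
    """Call func(), reopening db and retrying whenever the revision
    being read has been discarded; re-raise the last error if every
    attempt fails rather than silently returning None."""
    last_err = None
    for _ in range(attempts):
        try:
            return func()
        except exc_type as e:
            last_err = e
            db.reopen()
    raise last_err
```

A call site would then wrap each read, e.g. `with_reopen_retry(db, lambda: doc.get_data(), xapian.DatabaseModifiedError)`, keeping the reopen logic in one place instead of scattered through the reader.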