Hello,

I'm using the Python bindings to query a remote xapian-tcpsrv (xapian 1.1.0).

Once I have the result set I can get the SearchResult objects from it.
However, if I wait a little bit and then try to access the result object I
get the following error:

    DatabaseModifiedError: REMOTE:The revision being read has been discarded -
    you should call Xapian::Database::reopen() and retry the operation

I guess this is due to a change in the database (there is another process
adding documents).

Is there a way to tell xapian to cache locally the SearchResult objects so
I won't need to call db.reopen() all the time?

Thanks,
--
Miki Tebeka
miki at fattoc.com
On Mon, May 18, 2009 at 07:16:36PM -0700, Miki Tebeka wrote:
> I'm using the Python bindings to query a remote xapian-tcpsrv (xapian
> 1.1.0).
>
> Once I have the result set I can get the SearchResult objects from it.

Um, there isn't a class called "SearchResult" in Xapian.

> However if I wait a little bit and then try to access the result object I
> get the following error:
>
>     DatabaseModifiedError: REMOTE:The revision being read has been discarded -
>     you should call Xapian::Database::reopen() and retry the operation
>
> I guess this is due to a change in the database (there is another process
> adding documents).

Yes.

> Is there a way to tell xapian to cache locally the SearchResult objects so
> I won't need to call db.reopen() all the time?

Readers don't currently lock the revision they are using, so you have to
reopen() to move them on to a revision which actually still exists.

The main reason readers don't lock the revision is that nobody has
implemented it. There's also the issue that it makes it very easy to
inadvertently bloat the database with lots of old revisions. I'm not sure
there's a way to avoid that.

For now, I'd suggest just reading the results you want right away rather
than waiting a bit. It also helps to batch up writes more (which also
improves the efficiency of applying them) - see XAPIAN_FLUSH_THRESHOLD for
how to adjust the default flushing.

Cheers,
    Olly
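[Editorial note: the "read the results right away" advice above can be sketched as below. The `snapshot_results` helper and the dict field names are illustrative, not part of the Xapian API; it only assumes an MSet-like iterable of matches exposing `docid`, `rank`, and a `document` with `get_data()`, as the Python bindings do.]

```python
def snapshot_results(mset):
    """Copy everything we need out of the MSet immediately after the
    search, so later reads never touch a possibly-discarded revision."""
    snapshot = []
    for match in mset:
        snapshot.append({
            "docid": match.docid,
            "rank": match.rank,
            "data": match.document.get_data(),
        })
    return snapshot
```

With a real database you would call this on the MSet returned by `enquire.get_mset(...)` before doing any slow work, then use only the plain Python snapshot afterwards.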
On Wed, May 20, 2009 at 04:09:23PM -0700, Miki Tebeka wrote:
> Hello James,

Please keep replies on-list so everyone can help and learn from each other.

> > If you read all 10 (say) items in the MSet as soon as you grab it, you
> > should only rarely need to reopen the db in the process.
>
> My system gets about 100000 new documents a day, so this is happening all
> the time.

That's only 1-2 per second, and if you batch things up you should be able
to get through a search between batches almost all the time.

> > Or (better, and neater) stick the entire thing behind some sort of facade
> > interface that does the reopening for you.
>
> Currently I have something along the lines of:
>
>     def get_data(doc, db):
>         for i in range(10):
>             try:
>                 return doc.data.copy()
>             except xapian.DatabaseModifiedError, e:
>                 db.reopen()

You probably want to add:

    raise e

to the end (set e to a 'help unexplained!' error at the start of the
function). You may already have something equivalent, but thought it worth
pointing out.

> I'm still getting these errors when the load is heavy.

Definitely look at batching your writes, then. (I assume the error is
always from there and not elsewhere.) Note that by default the library
will batch writes anyway, so I'm guessing you're creating a new process
(or at least a new WritableDatabase instance) for each new document?

If you can't batch the writes cleanly, you could consider writing updates
to a copy of the database, duping the entire thing periodically (zero-cost
snapshots in your file system would help, else you want to exit the write
process during the copy), and using a stub database file to switch the
readers between the two databases -- one is being read from only, one
written to only. (I could have sworn we had a document on the latter
strategy, but I can't find it, so I'm probably wrong.)

J

--
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org
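[Editorial note: the facade interface suggested above, including the re-raise James recommends, might look like the sketch below. `with_reopen_retry` is a hypothetical helper; the exception class is passed in as a parameter so the pattern can be shown without importing the xapian module, and in real code you would pass `xapian.DatabaseModifiedError`. It uses modern `except ... as e` syntax rather than the Python 2 form in the quoted snippet.]

```python
def with_reopen_retry(db, func, exc_type, attempts=10):
    """Call func(), reopening db and retrying whenever the revision
    being read has been discarded; re-raise the last error if every
    attempt fails rather than silently returning None."""
    last_err = None
    for _ in range(attempts):
        try:
            return func()
        except exc_type as e:
            last_err = e
            db.reopen()
    raise last_err
```

A call site would then wrap each read, e.g. `with_reopen_retry(db, lambda: doc.get_data(), xapian.DatabaseModifiedError)`, keeping the reopen logic in one place instead of scattered through the reader.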