Jean-Francois Dockes
2006-Oct-23 08:45 UTC
[Xapian-discuss] Read accesses through WritableDatabase are slow
Unless I'm mistaken, read accesses (like postlist_begin(), get_document()) through a Xapian::WritableDatabase seem to trigger writes on the database files, which makes them slow (because of the fsync() calls).

Actually, it is apparently enough for the Xapian::WritableDatabase to exist for the write calls to be triggered, even if the actual accesses go through a Xapian::Database.

A program that does a variable mix of read and write accesses has no good choice between using a WritableDatabase for all accesses (slow for reads) and creating the WritableDatabase on demand (slow for writes).

The indexing program in Recoll uses the Xapian index to store the modification time of each indexed file (on a later pass, only modified files are actually indexed). This means that on a normal run the program mostly does read queries (to check the mtimes), with occasional indexing for modified files. The many write-on-read operations on the index make this slower and more disk-intensive than would appear necessary.

I could use an auxiliary database to store the update times. Would someone have a better idea of how to handle this issue?

Regards,
J.F. Dockes
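For illustration, the auxiliary store of update times mentioned above could look roughly like the following. This is only a minimal in-memory sketch: the class name MtimeStore and its methods are hypothetical, belong to neither Xapian nor Recoll, and persistence to disk is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of Xapian or Recoll: keeps the last indexed
// modification time of each file outside the Xapian index, so that the
// "has this file changed?" check never touches the WritableDatabase.
class MtimeStore {
public:
    // True if the file was never recorded or its mtime differs from the
    // recorded one.
    bool needs_indexing(const std::string& path, std::int64_t mtime) const {
        auto it = mtimes_.find(path);
        return it == mtimes_.end() || it->second != mtime;
    }

    // Remember the mtime of a file that has just been (re)indexed.
    void record(const std::string& path, std::int64_t mtime) {
        assert(!path.empty());
        mtimes_[path] = mtime;
    }

    std::size_t size() const { return mtimes_.size(); }

private:
    std::unordered_map<std::string, std::int64_t> mtimes_;
};
```

The indexer would call needs_indexing() for every file it visits and record() only after a file has actually been indexed, so an unchanged tree results in no index access at all.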
Olly Betts
2006-Oct-23 13:26 UTC
[Xapian-discuss] Read accesses through WritableDatabase are slow
On Mon, Oct 23, 2006 at 09:45:14AM +0200, Jean-Francois Dockes wrote:
> Except if I'm mistaken, read accesses (like postlist_begin(),
> get_document()) through a Xapian::WritableDatabase seem to trigger writes
> on the database files, which makes them slow (because of the fsync()
> calls).

It depends on the backend, but in general this shouldn't be true for most methods of Database when called on a WritableDatabase.

Flint should only force a flush if you call allterms_begin(). It could be handled, but it's not been implemented so far since it doesn't seem likely you'd call this method a lot during update. It would be nice to fix this though, as it would also remove the restriction that allterms_begin() can't be called during a transaction.

Quartz flushes for allterms_begin() too, and also for postlist_begin(), but only if the term whose postlist you're asking for has had any changes to its postlist since the last flush (implicit or explicit).

You shouldn't get a forced flush on get_document() that I can see.

> Actually, it is apparently enough for the Xapian::WritableDatabase to
> exist, for the write calls to be triggered even if actual accesses go
> through a Xapian::Database.

If the WritableDatabase object is assigned to a Database object this will be the case (because that's how OO works - it's still a WritableDatabase underneath).

You could create a separate Database object for the same database, but you'd have to call reopen() frequently which would defeat the point rather.

> Would someone have a better idea of how to handle this issue ?

Use flint if you aren't already. Try to avoid calling allterms_begin() during update, at least until the forced flush is eliminated.

If you're really seeing forced flushes on other methods, report this! If you can read C++, backends/flint/flint_database.cc is where the action happens (and in a similarly named file for Quartz). The relevant methods are those of FlintWritableDatabase, and the forced flush is done by a call to "do_flush_const".

Cheers,
Olly
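To make the Quartz postlist_begin() behaviour described above concrete, here is a toy model of it. It is not Xapian code and does not use the Xapian API: ToyWritableIndex and all of its methods are invented for illustration, and the real logic lives in the backend sources mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Toy model only, NOT Xapian code. It mimics the behaviour described for
// Quartz: reading a postlist forces a flush only if that term has postlist
// changes buffered since the last flush.
class ToyWritableIndex {
public:
    // Buffer a change to a term's postlist (stands in for indexing a document).
    void add_posting(const std::string& term) { dirty_terms_.insert(term); }

    // Read a term's postlist; flush first only if that term is dirty.
    void read_postlist(const std::string& term) {
        if (dirty_terms_.count(term) != 0) flush();
    }

    // Write out all buffered changes (the expensive, fsync()-like step).
    void flush() {
        dirty_terms_.clear();
        ++flush_count_;
    }

    std::size_t flush_count() const { return flush_count_; }

private:
    std::set<std::string> dirty_terms_;
    std::size_t flush_count_ = 0;
};
```

In this model a read of a term with no pending changes costs no flush, while a read of a term that was just modified costs exactly one. An update loop that keeps modifying and re-reading the same terms would therefore flush repeatedly, whereas reads of untouched terms stay cheap.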