Attached is a patch which turns off the B-tree versioning mechanism
during updates. You'll lose atomic updates, so if the indexer or machine
crashes, you've probably got a useless database. But it reduces the
number of blocks written, which can speed up updating quite a bit if
it is I/O bound; it's good for reindexing from scratch, for example.
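To see where the saving in writes comes from, here's a rough model. It's
my simplification, not Quartz code, and the function name and parameters
are made up for illustration. Take a B-tree with a fixed fanout and depth,
with the leaves at level 0. With versioning, a modified block is written
to a new block number, so its parent has to be updated to point at it,
and so on up to the root, unless that block has already been moved in the
current revision. Without versioning, only the leaf is rewritten, in place.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model, not Quartz code: counts the distinct blocks dirtied by
// updating the given leaves of a B-tree with a fixed fanout and depth.
// Level 0 holds the leaves; the parent of block i at one level is block
// i / fanout at the next level up, and level depth - 1 is the root.
std::size_t blocks_written(const std::vector<long>& leaves, long fanout,
                           int depth, bool versioning) {
    std::set<std::pair<int, long>> dirty;  // (level, block index in level)
    // With versioning the whole leaf-to-root path may need rewriting;
    // without it, only the leaf itself is rewritten (in place).
    const int top = versioning ? depth : 1;
    for (long leaf : leaves) {
        long index = leaf;
        for (int level = 0; level < top; ++level) {
            // A block already moved in this revision keeps its new block
            // number and its ancestors already point at it: stop climbing.
            if (!dirty.insert({level, index}).second) break;
            index /= fanout;
        }
    }
    return dirty.size();
}
```

For example, updating leaves 0, 1 and 55 of a 3-level tree with fanout 10
dirties 6 blocks with versioning, but only 3 without.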
Here are a couple of graphs showing performance on a fairly low-spec
machine. The first is without the patch, the second with it. The X axis
is database size (in documents), the Y axis is documents added per second:
http://www.survex.com/~olly/dangerous.png
These graphs don't show off the gain especially well, but they're what
I have to hand. Notice how the second graph only drops below 20
docs/sec at around 800,000 documents, while the first drops below it at
around 350,000.
Building with this patch also usually results in smaller databases,
because blocks aren't cycled by the versioning. Running quartzcompact
on the results with and without this patch would produce databases of
very similar size, but using the patch means you'll need less disk space
during the build itself. That's helpful if you're reindexing to replace
an existing database which you need to keep around while the replacement
is building.
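Here's a similarly rough model of why the versioned build needs more
room (again my simplification, not Quartz's actual allocator; the
function and its parameters are hypothetical). With versioning, each
block modified in a revision gets a new slot, while the slot holding its
old version can't be reused until the revision is committed, so the file
has to hold both copies at once. Updating in place never needs more slots
than there are live blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model, not Quartz's real allocator: returns the peak number of
// block slots the file needs while `revisions` commits each modify the live
// blocks listed in `dirty` (indices 0 .. live_blocks - 1).
std::size_t peak_file_blocks(std::size_t live_blocks,
                             const std::vector<std::size_t>& dirty,
                             int revisions, bool versioning) {
    // Rewriting in place never needs more slots than there are live blocks.
    if (!versioning) return live_blocks;

    std::vector<std::size_t> location(live_blocks);  // live block -> slot
    for (std::size_t i = 0; i < live_blocks; ++i) location[i] = i;
    std::vector<bool> in_use(live_blocks, true);     // slot -> occupied?
    std::size_t peak = live_blocks;

    for (int r = 0; r < revisions; ++r) {
        std::vector<bool> moved(live_blocks, false);
        std::vector<std::size_t> pending_free;
        for (std::size_t b : dirty) {
            if (moved[b]) continue;  // already copied in this revision
            moved[b] = true;
            // Copy-on-write: the new version goes in the lowest free slot,
            // extending the file if none is free.
            auto it = std::find(in_use.begin(), in_use.end(), false);
            std::size_t slot = static_cast<std::size_t>(it - in_use.begin());
            if (slot == in_use.size()) in_use.push_back(true);
            else in_use[slot] = true;
            // The old version must survive until the commit.
            pending_free.push_back(location[b]);
            location[b] = slot;
        }
        peak = std::max(peak, in_use.size());
        // Commit: the old versions' slots become reusable.
        for (std::size_t s : pending_free) in_use[s] = false;
    }
    return peak;
}
```

With 4 live blocks all modified in each of 2 revisions, the versioned
model peaks at 8 slots against 4 in place; modifying just one block per
revision peaks at 5.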
Anyway, try it out on your own datasets and let me know how it does.
It's likely to get folded into a future release, but it needs hooking
into the API (probably as a flag passed when opening a database).
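As for the API, one way it might look is to record the flag when the
table is opened and test it at runtime in alter(), instead of using the
compile-time #define. This is purely a sketch: the OPEN_DANGEROUS flag
and the ToyBtree class are made up here, and the toy only roughly follows
the shape of the real Btree::alter().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flag value for illustration; not an existing Quartz constant.
constexpr unsigned OPEN_DANGEROUS = 1u;

// Toy stand-in for one level of the B-tree cursor.
struct CursorLevel {
    bool rewrite = false;  // block must be written out at commit
};

class ToyBtree {
public:
    ToyBtree(std::size_t levels, unsigned flags)
        : C(levels), dangerous((flags & OPEN_DANGEROUS) != 0) {}

    // Runtime check in place of the DANGEROUS preprocessor symbol.
    void alter() {
        if (dangerous) {
            C[0].rewrite = true;  // leaf rewritten in place; parents unchanged
            return;
        }
        for (CursorLevel& level : C) {
            if (level.rewrite) return;  // already moved in this revision
            level.rewrite = true;       // moving it means updating the parent too
        }
    }

    // How many cursor levels are marked for rewriting.
    std::size_t levels_marked() const {
        std::size_t n = 0;
        for (const CursorLevel& level : C) n += level.rewrite ? 1 : 0;
        return n;
    }

private:
    std::vector<CursorLevel> C;  // C[0] is the leaf, C.back() the root
    bool dangerous;
};
```

Keeping it as a per-database flag means the same build can do safe,
atomic updates normally and only skip versioning when explicitly asked.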
Cheers,
Olly
-------------- next part --------------
--- backends/quartz/btree.cc.orig Mon Jun 21 04:37:11 2004
+++ backends/quartz/btree.cc Mon Jun 28 14:43:36 2004
@@ -23,7 +23,7 @@
*/
#include <config.h>
-
+#define DANGEROUS
#ifdef HAVE_GLIBC
#if !defined _XOPEN_SOURCE
// Need this to get pread and pwrite with GNU libc
@@ -448,6 +448,7 @@
* the block number of another block in the B-tree structure.
*/
+#ifndef DANGEROUS
static void set_block_given_by(byte * p, int c, uint4 n)
{
c = GETD(p, c); /* c is an offset to an item */
@@ -455,6 +456,7 @@
/* c is an offset to a block number */
set_int4(p, c, n);
}
+#endif
/** block_given_by(p, c) finds the item at block address p, directory offset c,
* and returns its tag value as an integer.
@@ -494,6 +496,9 @@
{
DEBUGCALL(DB, void, "Btree::alter", "");
Assert(writable);
+#ifdef DANGEROUS
+ C[0].rewrite = true;
+#else
int j = 0;
byte * p = C[j].p;
while (true) {
@@ -512,6 +517,7 @@
p = C[j].p;
set_block_given_by(p, C[j].c, n);
}
+#endif
}
/** compare_keys(k1, k2) compares two keys pointed to by k1 and k2.