On Tue, Jun 26, 2007 at 05:23:13AM +0000, David wrote:
> But, every week we get new data, and most documents will have to be
> Xapian::WritableDatabase::replace_document()'d. What type of effect would
> this have?
Note that if most documents have changed, it'll probably be
significantly faster to just rebuild the database if you have a copy of
the current data rather than just the delta. The case of appending
lots of documents to a database is particularly well optimised, and
probably inherently faster anyway.
But if you're happy with the update speed, there's no problem with
replacing lots of documents.
> Since the majority of the database will, in effect, be "replaced" on a
> weekly basis, how does the database re-organize itself?
Blocks are kept between 50% and 100% full, except we don't currently
coalesce blocks when deleting. In reality, that doesn't seem to matter
- the next big update will fill most of them up again, and totally empty
blocks are released for reuse.
> Would I have to do some sort of compacting?
You can run xapian-compact to eliminate any currently unused blocks and
fill blocks fuller (typically 95-100% full with the default options).
Until the next update the database is also especially fast to search.
If you plan to update further, 95-100% full blocks mean the next few
updates will cause a lot of block-splitting so "xapian-compact -n" might
be a better option, as this stops it trying to cram all the blocks so
full. I've not profiled whether this actually helps, however. It would be
interesting to know.
Cheers,
Olly
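
The block-splitting concern above can be illustrated with a toy model. This
is only a sketch under loud assumptions: a "block" here is just an item
count with a fixed capacity, inserts are spread evenly, and a full block
splits into two halves. The function name count_splits and all the numbers
are hypothetical; this is not Xapian's B-tree code and it is not a
measurement of real update speed.

```cpp
#include <cstddef>
#include <vector>

// Toy model only (NOT Xapian's B-tree): each block is an item count with
// a fixed capacity. Returns how many block splits a run of inserts causes.
std::size_t count_splits(std::size_t num_blocks, int capacity,
                         int initial_fill, std::size_t inserts) {
    std::vector<int> blocks(num_blocks, initial_fill);
    std::size_t splits = 0;
    for (std::size_t i = 0; i < inserts; ++i) {
        // Assumption: spread the inserts across blocks deterministically.
        std::size_t target = (i * 7) % blocks.size();
        if (blocks[target] == capacity) {
            // A full block splits: half of its items move to a new
            // block placed directly after it.
            int moved = capacity / 2;
            blocks[target] -= moved;
            blocks.insert(
                blocks.begin() + static_cast<std::ptrdiff_t>(target) + 1,
                moved);
            ++splits;
        }
        ++blocks[target];
    }
    return splits;
}
```

In this model, 100 blocks of capacity 20 that start completely full split
as soon as the first insert arrives, whereas the same blocks started at 15
items absorb 50 evenly spread inserts with no splits at all. Whether the
saved splits outweigh the larger database in practice is exactly the
question that would need profiling.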