Marinos Yannikos
2011-Aug-29 12:19 UTC
[Xapian-discuss] Incremental updates and disk space ...
Hi,

we've been using Xapian in production for several months now and update our (chert) databases continuously. A freshly generated index occupies only ~35% of the disk space that the same index takes up after a few days of updates. This is not a huge concern (we use SSDs), but I've been wondering whether there is a way to fine-tune this (other than recreating the index frequently), so that less disk space is wasted or the index degrades a little more slowly.

Regards,
Marinos
On Mon, Aug 29, 2011 at 02:19:34PM +0200, Marinos Yannikos wrote:
> we've been using Xapian in production for several months now and update
> our (chert) databases continuously. A freshly generated index occupies
> only around ~35% of the disk space compared to what it becomes after a
> few days. This is not a huge concern (we use SSDs), but I've been
> wondering whether there is a way to fine-tune this (other than
> recreating the index frequently), so that less disk space is wasted or
> it degrades a little slower.

You can use xapian-compact to make a copy of a database with free space reclaimed.

But your size difference sounds unusual. In normal use, you should get ~75% block utilisation for random insertions, and close to 100% utilisation for linear updates. That doesn't take into account blocks which were used in the previous revision and are now awaiting reuse, but unless your updates between (automatic or explicit) commits are changing most of the database, that shouldn't lead to only ~35% of the space actually being used.

Are you deleting a lot of documents? Or is there something else which might be unusual about your update patterns?

Cheers,
Olly
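One way to put a number on the utilisation discussed above is to compact a copy of the live database and compare directory sizes. The following is only a rough sketch: the helper names (dir_size, utilisation, compact_and_measure) and any paths passed to them are hypothetical, and the compacted size is used as an approximation of the space actually in use. It assumes the xapian-compact tool is on PATH and is invoked as `xapian-compact SOURCE DESTINATION`.

```python
import os
import subprocess


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def utilisation(compacted_bytes, live_bytes):
    """Approximate fraction of the live database's disk space holding data.

    Assumption: the compacted copy's size stands in for "space in use".
    """
    if live_bytes <= 0:
        raise ValueError("live database size must be positive")
    return compacted_bytes / live_bytes


def compact_and_measure(live_db, compacted_db):
    """Write a compacted copy of live_db to compacted_db and compare sizes.

    live_db and compacted_db are hypothetical directory paths; the source
    database is only read, the compacted copy is written to compacted_db.
    """
    subprocess.run(["xapian-compact", live_db, compacted_db], check=True)
    return utilisation(dir_size(compacted_db), dir_size(live_db))
```

A result near 0.35 would match the figure reported in the original message, while a result near 0.75 would be in line with the utilisation Olly describes for random insertions. Because compaction packs blocks more tightly than ordinary updates do, this ratio probably understates how full the live blocks are relative to a freshly built (uncompacted) index.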