On Mon, Mar 10, 2014 at 01:55:58PM -0700, Ryan Cross wrote:
> I am running xapian-core 1.2.12. I have just begun experimenting with
> replication and noticed that the change set files being created are
> quite large. For example, updating the index with a small document,
> ~50 terms, results in an 11MB change set file. Is this correct? What
> is in these files? The total index size is 34GB.

The changeset file contains any blocks which changed, plus the new base
files, so a changeset containing one small document update will be
disproportionately large. If you look at the changeset for a larger
update, it should be a more reasonable size.
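
As a rough illustration of why a tiny update looks so expensive, here is
a back-of-envelope model in Python. The block size, the block counts and
the base file size are all made-up numbers for illustration, not
measurements from a real database, and the function name is hypothetical:

```python
# Rough model: changeset = changed blocks (sent whole) + new base files.
BLOCK_SIZE = 8192  # assumed block size in bytes


def estimate_changeset_bytes(changed_blocks, base_file_bytes,
                             block_size=BLOCK_SIZE):
    """Estimate changeset size from whole changed blocks plus base files."""
    return changed_blocks * block_size + base_file_bytes


# Hypothetical figures: base files of about 1MB in total.
BASE_FILE_BYTES = 1_000_000

# One small document which happens to touch 200 blocks.
small = estimate_changeset_bytes(200, BASE_FILE_BYTES)

# A batch of 10,000 documents which touches 20,000 blocks in total.
large = estimate_changeset_bytes(20_000, BASE_FILE_BYTES)

print("small update: %d bytes for 1 document" % small)
print("large update: %d bytes, %.0f bytes per document"
      % (large, large / 10_000))
```

The fixed cost (base files, and whole blocks for small changes) is paid
once per changeset, so it is spread much more thinly over a large batch.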

But the current replication protocol is built assuming that there's
a fast network between the master and slaves, and aims to make updating
the database on the slaves efficient (we send whole blocks, so the slave
can hopefully write them to disk without having to first read the
existing data from disk - obviously this assumes the blocks are suitably
aligned, and if RAID is in use, that an appropriate RAID setup has been
chosen).
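
To show the idea of writing whole blocks without reading first, here is
a minimal Python sketch. It is not the actual replication code; the
function name, the dict of block numbers and the block size are
assumptions made for this example:

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size in bytes


def apply_blocks(path, changed):
    """Overwrite whole blocks in place.

    `changed` maps block number -> replacement bytes. Each write covers
    exactly one block at a block-aligned offset, so this code never needs
    to read the old contents in order to merge them.
    """
    with open(path, "r+b") as f:
        for number, data in sorted(changed.items()):
            if len(data) != BLOCK_SIZE:
                raise ValueError("block %d is not a whole block" % number)
            f.seek(number * BLOCK_SIZE)
            f.write(data)


# Demo on a throwaway file of four zero-filled blocks.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * (4 * BLOCK_SIZE))

apply_blocks(demo_path, {1: b"\x01" * BLOCK_SIZE, 3: b"\x03" * BLOCK_SIZE})

with open(demo_path, "rb") as f:
    demo_contents = f.read()
os.remove(demo_path)

print("first byte of each block:", list(demo_contents[::BLOCK_SIZE]))
```

Whether the operating system and disks can really avoid a read depends on
the alignment and RAID caveats mentioned above.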

There's code around to compress blocks in the changesets with zlib
(which reduces disk usage and disk reads and writes on the master, and
network bandwidth used, at the expense of some extra CPU on both sides).
This was in brass on trunk, but I've been working on some changes there
and right now changesets aren't compressed. But if you're interested in
backporting this to get compressed changesets for chert in 1.2, I can
point you at a suitable version to look at.

Looking forward, the bitmaps to track free blocks which make the base
files relatively large have been replaced by free lists stored in unused
blocks, so we don't need to send the base files each time, which will
help the case of a small update. And when I put compression back in, my
plan is to compress the changeset file as a whole rather than
block-by-block, which should give better compression.
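
To give a feel for the difference, here is a small Python experiment
comparing the two approaches with the standard zlib module. The blocks
are synthetic term-like data invented for this example (not real chert or
brass blocks), and the block size is an assumption, so treat the numbers
as indicative only:

```python
import zlib

BLOCK_SIZE = 8192  # assumed block size in bytes


def make_block(i):
    """Build one synthetic block of 10-byte term-like entries."""
    entries = (b"term%05d\x00" % ((i * 37 + j) % 500)
               for j in range(BLOCK_SIZE // 10 + 1))
    return b"".join(entries)[:BLOCK_SIZE]


blocks = [make_block(i) for i in range(16)]
joined = b"".join(blocks)

# Block-by-block: every block is its own zlib stream, so each one pays the
# stream overhead and cannot refer back to data in earlier blocks.
per_block_total = sum(len(zlib.compress(b)) for b in blocks)

# As a whole: one zlib stream, so later blocks can reuse matches from
# earlier ones (within zlib's 32KB window).
whole_stream = zlib.compress(joined)
whole_total = len(whole_stream)

print("uncompressed:   %d bytes" % len(joined))
print("block-by-block: %d bytes" % per_block_total)
print("as a whole:     %d bytes" % whole_total)
```

The trade-off is that a single stream has to be decompressed from the
start, whereas separately compressed blocks can be read individually.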

Cheers,
Olly