On Mon, Dec 01, 2003 at 07:33:43PM +0100, Andrzej Filip wrote:
> Is there any chance for rdiff ?
>
> I need to frequently synchronize big text file (60MB+) undertaking small
> changes and I am interested in differences between the subsequent versions
> [DNS RBL data in dnsbl format, 1E6+ lines of text, new version every 20m,
> on average 50 new entries (lines) in every synchronization]
>
> I would like to get (small) diff file as result of rsync session and apply
> it to the file myself.

rdiff sounds like a good solution for what you want.

However, there might be a better solution if your data follows certain
patterns. For example, if the changes are just lines appended, then a
simple 'tail -f' type approach might achieve what you want. For text
data, things like 'diff oldfile newfile | gzip -9' can be surprisingly
effective, and will have a much lower CPU overhead than rdiff.
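
As a rough sketch of the diff-based approach, assuming both ends start from an identical 'oldfile': the file names and the tiny stand-in data below are hypothetical, and 'patch' is simply one way to apply the result on the receiving side.

```shell
# Stand-in data (hypothetical); the real files would be the dnsbl lists.
printf '%s\n' 'a.example' 'b.example' > oldfile
printf '%s\n' 'a.example' 'b.example' 'c.example' > newfile

# Sender: produce a compressed diff between the two versions.
# diff exits with status 1 when the files differ; that is expected here.
diff oldfile newfile | gzip -9 > changes.diff.gz

# Receiver: apply the diff to its own copy of the old version.
cp oldfile mirror
gunzip -c changes.diff.gz | patch mirror

# Check that the patched copy now matches the new version.
cmp newfile mirror && echo "in sync"
```

Note that, unlike rdiff, this needs a full copy of 'oldfile' on the side that computes the diff.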

However, one of the cool things about rdiff is that it calculates deltas
from a signature, so you don't have to keep a whole copy of 'oldfile',
just a signature of it. For example:

    $ rdiff signature bigfile bigfile.sig
    [wait 20 mins while bigfile gets modified]
    $ rdiff delta bigfile.sig bigfile bigfile.delta

Note, however, that rdiff will not be happy (and neither will just
about anything except perhaps 'tail -f') if the file is changing
underneath it while it operates on it. If these are live files, you
will need to either suspend whatever process modifies them while rdiff
is working, or use a filesystem 'snapshot' (see LVM).

For synchronising live files, you are probably better off looking at
some sort of block-device level synchronisation tool (kind of like RAID
mirroring with a remote network block device).

--
----------------------------------------------------------------
Donovan Baarda http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------