Hi,

Is there a way in which rsync's -z compression (zlib) can be benchmarked? I'm trying to compare the compression ratio between rsync and external compression tools like gzip and bzip2.

Are there any advantages to using rsync's internal compression mechanism, specified with the -z option, compared to applying external compression (e.g. bzip2) to the files and invoking rsync to transfer them without the -z option?

Thanks
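Python's standard zlib, gzip and bz2 modules can give the raw whole-file ratios the question asks about. The sketch below assumes the file fits in memory. The levels are illustrative: zlib 6 is zlib's default, and gzip 6 and bzip2 9 are those tools' command-line defaults. None of them is claimed to match rsync's own setting. rsync -z compresses only the data it actually sends over the wire, so these whole-file numbers do not predict rsync's on-the-wire savings.

```python
import bz2
import gzip
import sys
import time
import zlib


def benchmark(data):
    """Return {codec name: (compression ratio, wall-clock seconds)}."""
    codecs = {
        "zlib-6": lambda d: zlib.compress(d, 6),                # zlib's default level
        "gzip-6": lambda d: gzip.compress(d, compresslevel=6),  # gzip CLI default
        "bzip2-9": lambda d: bz2.compress(d, 9),                # bzip2 CLI default
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (len(data) / len(out), elapsed)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bench.py FILE   (bench.py is a hypothetical script name)
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    for name, (ratio, secs) in benchmark(payload).items():
        print(f"{name:8s} ratio {ratio:6.2f}:1  time {secs:.3f}s")
```

A single timed call is noisy. Repeating each measurement, for example with the timeit module, gives steadier CPU figures. For network load, counting the bytes actually transferred is still the more relevant measure.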
On Sat, May 10, 2003 at 05:58:57PM +0800, Leaw, Chern Jian wrote:
> Hi,
> Is there a way in which rsync's -z compression (zlib) utility can be
> benchmarked?

In a way. It depends on what you are benchmarking. There are many things you can measure: disk load, CPU load, memory usage and network load. Obviously compression will cost you CPU, but how much, and for what gains? The most important measure would be the network load; a packet sniffer or similar tool would be the best method. The second most important would be wall-clock time.

> I'm trying to compare the compression ratio between rsync and external
> compression tools like gzip and bzip2.

Perhaps when you are done you might compare apples to baboons. If you want to compare the effectiveness of rsync's internal compression with an external compression, you would compare against the compression in ssh or in a VPN tunnel.

> Are there any advantages to using rsync's internal compression mechanism
> specified with the -z option compared to solely applying external
> compression i.e. bzip2 to the files and invoking rsync to transfer these
> files without the -z option?

The point of rsync is to synchronise. It can synchronise compressed files, but the compression tends to defeat the efficiency of rsync and, depending on the actual compression method and file content, may result in increased network load. There is an rsyncable patch available for gzip that reduces the adverse effect compression has on rsync.

If your question is whether rsync -z or another method is faster, or produces less network load, when copying to an empty destination: scp -C or a tar/cpio pipeline with compression will almost certainly perform better.

--
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

                Remember Cernan and Schmitt
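The sketch below illustrates why compressing files before running rsync tends to defeat its block matching. It inserts six bytes near the start of a synthetic text file. It then counts how many fixed-size blocks of the old version can still be found at any offset in the new version, once uncompressed and once after zlib compression. This is only a rough approximation. A plain substring search stands in for rsync's rolling checksum, and the 512-byte block size and the generated data are hypothetical.

```python
import random
import zlib

BLOCK = 512  # hypothetical block size; rsync chooses its own block length


def make_text(seed=7, n_words=40000):
    """Hypothetical compressible test data: random words from a small vocabulary."""
    rng = random.Random(seed)
    words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
             "golf", "hotel", "india", "juliet", "kilo", "lima"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode()


def matched_fraction(old, new, block=BLOCK):
    """Fraction of fixed-size blocks of `old` (the receiver's basis file) that
    occur at any offset in `new`. A plain substring search stands in for
    rsync's rolling checksum, so this only approximates what rsync can reuse."""
    blocks = [old[i:i + block] for i in range(0, len(old), block)]
    return sum(1 for b in blocks if b in new) / len(blocks)


if __name__ == "__main__":
    old = make_text()
    new = old[:1000] + b"HELLO " + old[1000:]  # small edit near the start
    print("uncompressed:", matched_fraction(old, new))
    print("zlib -9:     ", matched_fraction(zlib.compress(old, 9), zlib.compress(new, 9)))
```

On the uncompressed text, only the block containing the edit should be lost. On the compressed streams, the bytes after the edit generally no longer line up with the old ones, so few blocks, if any, survive. This is the adverse effect that the rsyncable gzip patch mentioned above aims to reduce.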
On Sat, 2003-05-10 at 19:58, Leaw, Chern Jian wrote:
> Hi,
> Is there a way in which rsync's -z compression (zlib) utility can be
> benchmarked?
>
> I'm trying to compare the compression ratio between rsync and external
> compression tools like gzip and bzip2.
>
> Are there any advantages to using rsync's internal compression mechanism
> specified with the -z option compared to solely applying external
> compression i.e. bzip2 to the files and invoking rsync to transfer these
> files without the -z option?

I'm assuming here you are talking about using librsync's -z vs running librsync without it through a compressed pipe, and are aware that rsync does delta-compression to update a basis file in both cases.

rsync _should_ be able to do better with -z because it uses "context-compression" by "priming" the compressor with hits and discarding the compressed output. This means the compressor and de-compressor see the whole file, even though only the compressed miss data is transmitted. My experiments with pysync confirmed that this does make a measurable difference (see the comments with pysync) on real-world compressible data.

A similar benefit could be achieved with self-referencing deltas, as supported by the vcdiff format (soon to be) used by xdelta.

--
Donovan Baarda                http://minkirri.apana.org.au/~abo/
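The "priming" described above can be sketched with Python's zlib module. This is a simplified model of the idea, not rsync's or librsync's actual code. The data is split into hypothetical "hit" segments, which the receiver already has, and "miss" segments, which are sent as literal data. Each miss is an edited copy of the preceding hit.

- **Sender:** runs every segment through one raw-deflate compressor and flushes with Z_SYNC_FLUSH after each, but transmits only the output for misses.
- **Receiver:** keeps its decompressor's window in step by feeding it each hit wrapped in uncompressed deflate "stored" blocks (RFC 1951, section 3.2.4).
- **Baseline:** a second compressor that sees only the misses stands in for a compressed pipe.

The helper names are hypothetical.

```python
import random
import struct
import zlib


def make_segments(seed=1, n_lines=200):
    """Hypothetical data: each 'hit' line is followed by a 'miss' line that
    is the same line with one word replaced."""
    rng = random.Random(seed)
    words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]
    segments = []
    for _ in range(n_lines):
        toks = [rng.choice(words) for _ in range(12)]
        segments.append(("hit", (" ".join(toks) + "\n").encode()))
        toks[rng.randrange(len(toks))] = "CHANGED"
        segments.append(("miss", (" ".join(toks) + "\n").encode()))
    return segments


def send(segments, primed):
    """Compress with one raw-deflate stream and return only the miss chunks.
    primed=True also feeds hits to the compressor, discarding their output."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate: no zlib header
    sent = []
    for kind, data in segments:
        if kind == "hit" and not primed:
            continue  # a compressed pipe never sees matched data
        # Z_SYNC_FLUSH ends each chunk with complete blocks on a byte boundary
        chunk = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
        if kind == "miss":
            sent.append(chunk)
        # for hits the chunk is dropped, but the compressor's window keeps the data
    return sent


def stored_blocks(data):
    """Wrap data in non-final deflate stored blocks (at most 65535 bytes each)."""
    out = bytearray()
    for i in range(0, len(data), 0xFFFF):
        piece = data[i:i + 0xFFFF]
        # header byte: BFINAL=0, BTYPE=00, padded to a byte; then LEN and NLEN
        out += b"\x00" + struct.pack("<HH", len(piece), len(piece) ^ 0xFFFF) + piece
    return bytes(out)


def receive_primed(segments, sent):
    """Rebuild the file: hits come from the local basis, misses from `sent`.
    The miss data inside `segments` is ignored; only the ordering is used."""
    decomp = zlib.decompressobj(-15)
    chunks = iter(sent)
    out = bytearray()
    for kind, data in segments:
        if kind == "hit":
            decomp.decompress(stored_blocks(data))  # primes the window; output unused
            out += data
        else:
            out += decomp.decompress(next(chunks))
    return bytes(out)


if __name__ == "__main__":
    segs = make_segments()
    primed = send(segs, primed=True)
    plain = send(segs, primed=False)
    assert receive_primed(segs, primed) == b"".join(d for _, d in segs)
    print("miss bytes, uncompressed:", sum(len(d) for k, d in segs if k == "miss"))
    print("primed (context compression):", sum(map(len, primed)))
    print("unprimed (compressed pipe):  ", sum(map(len, plain)))
```

The round-trip assertion checks that the receiver can rebuild the file from its own hit data plus the transmitted misses. In this data, new material closely resembles what the receiver already has. The primed stream should therefore come out smaller, because the compressor can refer back to hit data that was never sent. When misses share little with the hits, the gain shrinks.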
Donovan,

Yes, I'm referring to librsync's -z compression vs running an external compression tool on the files to be transferred.

You mentioned the following:

"rsync _should_ be able to do better with -z because it uses "context-compression" by "priming" the compressor with hits and discarding the compressed output. This means the compressor and de-compressor see the whole file, even though only the compressed miss data is transmitted."

Could you kindly elaborate further on context-compression? What do you mean by "priming the compressor with hits and discarding the compressed output"? I'd like to gain a further understanding of this concept.

Thanks

-----Original Message-----
From: Donovan Baarda [mailto:abo@minkirri.apana.org.au]
Sent: Sunday, May 11, 2003 1:45 PM
To: Leaw, Chern Jian
Cc: rsync-request@lists.samba.org; rsync@lists.samba.org
Subject: Re: benchmarking rsync's -z compression utility