samba-bugs at samba.org
2014-Apr-14 20:10 UTC
[Bug 10552] New: Sender checksum calculation significantly slower with compression enabled
https://bugzilla.samba.org/show_bug.cgi?id=10552
Summary: Sender checksum calculation significantly slower with
compression enabled
Product: rsync
Version: 3.1.1
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P5
Component: core
AssignedTo: wayned at samba.org
ReportedBy: dougmiles at cox.net
QAContact: rsync-qa at samba.org
I've noticed that with the -z option enabled, comparing two files that are
identical (or nearly so) but have different modification times takes
significantly longer than without the -z option. It looks like the entire file
is compressed as the checksum is calculated even when no data needs to be
transmitted to the receiver.
To replicate:
Create test folders/files (using incompressible data to maximize effect):
mkdir a b
dd if=/dev/urandom of=a/a.tst bs=1M count=250
cp a/a.tst b/
Run rsync without compression:
touch a/a.tst
time rsync -av a/ b
Run rsync with compression:
touch a/a.tst
time rsync -avz a/ b
The second run, with the -z option, takes significantly longer, even though
the source and destination files are identical apart from their modification
times. I've also found that the -z run uses much more CPU, but only in one of
the two processes (the sender process, I believe).
Adding --skip-compress=tst made rsync much faster than -z alone, but it still
took about 30% longer than omitting -z entirely.
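The reproduction steps above can be wrapped in one script that runs the three variants the report mentions (-av, -avz, and -avz --skip-compress=tst) against the same scratch copy. This is a minimal sketch, assuming bash and a local rsync; the helper name repro_10552, the mktemp scratch directory, and the size argument (in MiB, defaulting to the report's 250) are illustrative choices, not part of the report.

```bash
#!/usr/bin/env bash
# Sketch: repeat the bug 10552 reproduction for several rsync option sets.
# Assumptions (not from the report): the helper name repro_10552, the
# mktemp scratch directory, and the size argument in MiB.
set -eu

repro_10552() {
    local size_mb=${1:-250}
    local dir status=0
    dir=$(mktemp -d)
    mkdir "$dir/a" "$dir/b"

    # Incompressible source data, copied so both sides start identical.
    dd if=/dev/urandom of="$dir/a/a.tst" bs=1M count="$size_mb" 2>/dev/null
    cp "$dir/a/a.tst" "$dir/b/"

    # For each variant: bump the source mtime, then time the transfer.
    for opts in "-av" "-avz" "-avz --skip-compress=tst"; do
        touch "$dir/a/a.tst"
        echo "== rsync $opts"
        # $opts is left unquoted on purpose so it splits into separate options.
        time rsync $opts "$dir/a/" "$dir/b"
    done

    # The contents should still match after every run.
    cmp -s "$dir/a/a.tst" "$dir/b/a.tst" || status=1
    rm -rf "$dir"
    return "$status"
}
```

For example, `repro_10552` repeats the original 250 MiB test, while `repro_10552 50` gives a quicker, smaller run.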
--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
samba-bugs at samba.org
2014-Apr-17 12:34 UTC
[Bug 10552] Sender checksum calculation significantly slower with compression enabled
https://bugzilla.samba.org/show_bug.cgi?id=10552
--- Comment #1 from John Pierman <haqthat at gmail.com> 2014-04-17 12:34:03 UTC ---
Confirmed:
Without -z, avg over 3 runs: real 0m5.998s
With -z, avg over 3 runs: real 0m8.490s
samba-bugs at samba.org
2014-Apr-17 15:26 UTC
[Bug 10552] Sender checksum calculation significantly slower with compression enabled
https://bugzilla.samba.org/show_bug.cgi?id=10552
--- Comment #2 from dougmiles at cox.net 2014-04-17 15:26:35 UTC ---
Just adding that when using fast storage such as an SSD and/or a slow processor
such as those in some NAS boxes, the gap can widen quite a lot. Here are my
times on my 2.6GHz Core i7 Ivy Bridge Mac Mini with an SSD, for example:
Without -z:
real 0m1.789s
user 0m0.968s
sys 0m0.713s
With -z:
real 0m10.549s
user 0m10.039s
sys 0m0.967s
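The real times in comments #1 and #2 can be compared as ratios to see how much the gap widens. A minimal sketch, assuming bash `time` values in the XmY.YYYs form shown in the comments; the helper names to_seconds and slowdown are hypothetical. It gives a ratio of about 1.42 for comment #1 and about 5.90 for comment #2.

```bash
#!/usr/bin/env bash
# Sketch: convert bash `time` values such as "0m5.998s" into seconds and
# report how many times slower the second run was.
# The helper names to_seconds and slowdown are hypothetical.
set -eu

to_seconds() {
    # "XmY.YYYs" -> X*60 + Y.YYY, printed with three decimals.
    echo "$1" | awk -Fm '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

slowdown() {
    # Ratio of the second timing to the first, rounded to two decimals.
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", b / a }'
}

slowdown 0m5.998s 0m8.490s    # comment #1, without/with -z: prints 1.42
slowdown 0m1.789s 0m10.549s   # comment #2, without/with -z: prints 5.90
```

Most of the extra time in comment #2 is user CPU time (0.968s versus 10.039s), which fits the report's observation that one process becomes CPU-bound.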
samba-bugs at samba.org
2014-Apr-19 16:57 UTC
[Bug 10552] Sender checksum calculation significantly slower with compression enabled
https://bugzilla.samba.org/show_bug.cgi?id=10552
Wayne Davison <wayned at samba.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Severity|normal |enhancement
--- Comment #3 from Wayne Davison <wayned at samba.org> 2014-04-19 16:57:45 UTC ---
Yes, this is part of the way that rsync trades CPU (and disk I/O) to reduce
transfer I/O. When compressing, both sides of the connection "prime the pump"
for a compressed file's transfer by including matching data in the compression
stream. This ensures that, by the time a difference is found, the data will
compress better. It can't know in advance that there will be no differences in
the whole file, since by the time it finds that out the transfer is done. If
this is causing you problems, you might try --checksum, but that can be slower
too if there are a lot of unchanged files in the transfer that match in size &
mtime, though you can improve the speed of --checksum through one of the rsync
patches that caches checksum data.
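The "prime the pump" effect can be illustrated with gzip as a rough analogy. This is not rsync's code: rsync runs its own deflate stream on both ends, and here simply concatenating the inputs into one gzip stream stands in for priming. The file names and the 16 KiB size are arbitrary choices for the sketch. matched.bin plays the data rsync found unchanged, and literal.bin is the same data with one byte altered, i.e. something rsync would have to send.

```bash
#!/usr/bin/env bash
# Analogy only, not rsync's code: gzip stands in for rsync's deflate stream.
# matched.bin plays the data rsync found unchanged; literal.bin is the same
# data with one byte altered, i.e. a block rsync would have to send.
set -eu

tmp=$(mktemp -d)
head -c 16384 /dev/urandom > "$tmp/matched.bin"
cp "$tmp/matched.bin" "$tmp/literal.bin"
# Overwrite one byte in place so the copy no longer matches exactly.
printf 'X' | dd of="$tmp/literal.bin" bs=1 seek=100 conv=notrunc 2>/dev/null

# Compressed size of the literal data on its own (an unprimed stream).
unprimed=$(( $(gzip -c < "$tmp/literal.bin" | wc -c) ))

# Extra bytes the literal data costs when the matched data went through the
# same stream first (a primed stream). Both inputs share one gzip stream, so
# deflate can refer back into matched.bin within its 32 KiB window.
both=$(( $(cat "$tmp/matched.bin" "$tmp/literal.bin" | gzip -c | wc -c) ))
matched_only=$(( $(gzip -c < "$tmp/matched.bin" | wc -c) ))
primed=$(( both - matched_only ))

echo "literal data, unprimed stream: $unprimed bytes"
echo "literal data, primed stream:   about $primed bytes"
rm -rf "$tmp"
```

The primed figure comes out far smaller, which is the benefit described above. The cost shows up too: gzip still had to process all of matched.bin even though, in the rsync case, none of it would need to be sent, which roughly corresponds to the extra sender CPU time the report measured.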
There has been some thought given to disabling the shared-data part of the
compression, since it complicates the compression-lib usage and (as you saw) is
sometimes wasteful. I'm marking this as an enhancement request to add such a
compression mode.
samba-bugs at samba.org
2014-Apr-19 19:29 UTC
[Bug 10552] Add optional compress mode that skips including unchanged-data in the compression stream
https://bugzilla.samba.org/show_bug.cgi?id=10552
Wayne Davison <wayned at samba.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #4 from Wayne Davison <wayned at samba.org> 2014-04-19 19:29:35 UTC ---
I've committed new-style compression that avoids compressing matching file
data. If both sides are at least 3.1.1, then using -zz will give you the new
compress method.
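Because -zz only selects the new method when both sides are at least 3.1.1, a script may want to check the version before relying on it. A minimal sketch, assuming the first line of `rsync --version` looks like "rsync  version 3.1.1  protocol version 31" (some later releases may print a leading "v", which the sketch strips). The helper name rsync_at_least_311 is hypothetical, and it only checks the local rsync, so the remote side still has to be verified separately.

```bash
#!/usr/bin/env bash
# Sketch: succeed if an `rsync --version` first line reports 3.1.1 or newer.
# The helper name rsync_at_least_311 is hypothetical; the expected line
# format is an assumption based on rsync's usual --version output.
rsync_at_least_311() {
    printf '%s\n' "$1" | awk '
        $1 == "rsync" {
            # Take the field that follows the first "version" word.
            for (i = 2; i < NF; i++)
                if ($i == "version") { v = $(i + 1); break }
        }
        END {
            sub(/^v/, "", v)   # tolerate a leading "v", e.g. "v3.2.7"
            split(v, p, ".")
            # Exit 0 (success) when major.minor.patch >= 3.1.1.
            exit !(p[1] * 10000 + p[2] * 100 + p[3] >= 30101)
        }'
}

# Example use (assumes a/ and b/ from the report already exist):
#   if rsync_at_least_311 "$(rsync --version | head -n 1)"; then
#       touch a/a.tst
#       time rsync -avzz a/ b
#   fi
```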