samba-bugs at samba.org
2021-Aug-17 09:22 UTC
[Bug 14798] New: Metadata traffic --- uncompressed with -z, interaction with --bwlimit and ssh compression
https://bugzilla.samba.org/show_bug.cgi?id=14798

Bug ID: 14798
Summary: Metadata traffic --- uncompressed with -z, interaction with --bwlimit and ssh compression
Product: rsync
Version: 3.1.3
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P5
Component: core
Assignee: wayne at opencoder.net
Reporter: zero at smallinteger.com
QA Contact: rsync-qa at samba.org
Target Milestone: ---

Consider the case where rsync is tasked to synchronize a large file set in which there are few changes. Anecdotal evidence (a DuckDuckGo search) suggests that most of the network traffic will then be spent exchanging file metadata rather than file content, which is by design. The same anecdotal evidence suggests this "file list" is not exchanged in compressed form between rsync's endpoints, even when using the -z switch. This seems accurate: a suitable experiment shows that ssh compression reduces overall bandwidth usage by roughly 2x in these cases. This seems an opportunity for improvement.

The benefits would be compounded when using --bwlimit. In this case, disabling ssh compression results in traffic that respects the requested shape. However, rsync measures this traffic at its own endpoints. Consequently, rsync will not use the available bandwidth effectively, precisely because in this use case there are very few file changes in the file set (which is the point of using rsync).

Note that because ssh compression is unpredictable, --bwlimit cannot be adjusted adequately for maximum efficiency. Thus, with --bwlimit, either the link is fully used but carries redundant traffic (neither ssh nor rsync compression is in effect on the metadata), or it is used suboptimally, with or without -z (ssh compresses the file metadata without rsync realizing). In either case, the time required for rsync to complete the task remains unchanged regardless of the form of compression.
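The compressibility claim above can be checked informally with a rough sketch like the following. It builds a made-up, tab-separated listing (path, size, mtime) and measures how well zlib compresses it. The listing format and the helper names `synthetic_file_list` and `compression_ratio` are assumptions for illustration only. This is not rsync's wire encoding of the file list, so the resulting ratio says nothing definitive about rsync itself.

```python
import zlib


def synthetic_file_list(n_files: int) -> bytes:
    """Build a made-up listing: one 'path<TAB>size<TAB>mtime' line per file.

    Hypothetical stand-in for file metadata, NOT rsync's wire format.
    """
    lines = []
    for i in range(n_files):
        path = f"projects/archive/2021/batch-{i // 100:04d}/file-{i:07d}.dat"
        size = 4096 + (i % 17) * 512
        mtime = 1629190000 + i * 3
        lines.append(f"{path}\t{size}\t{mtime}\n")
    return "".join(lines).encode("utf-8")


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return raw size divided by zlib-compressed size."""
    return len(payload) / len(zlib.compress(payload, level))


if __name__ == "__main__":
    listing = synthetic_file_list(50_000)
    print(f"raw bytes: {len(listing)}")
    print(f"zlib ratio: {compression_ratio(listing):.2f}x")
```

Real metadata will compress differently depending on path names and on how the sender already encodes the list, so a measurement on an actual transfer is the better guide.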
Would it be possible for rsync's -z switch to set up the equivalent of two compressed streams, one for file data and another for file metadata, which are then multiplexed over the wire? That way, ssh compression would be entirely unnecessary, and --bwlimit would still result in maximum efficiency even when most traffic is file metadata. Having rsync compress the file list is also likely to result in better compression than ssh could achieve, because rsync knows the shape of the file metadata.

I could not find previous bug reports on this specific issue in the bug database. I searched for bugs related to -z and --bwlimit, and I also searched through the release notes in case this (or an equivalent) enhancement had been applied recently.
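As a minimal sketch of the idea of two compressed streams multiplexed over one connection, assuming a made-up framing of a 1-byte channel tag plus a 4-byte length: each channel keeps its own zlib state, and every frame ends with a sync flush so the receiver can decode it immediately. The names `META`, `DATA`, `Muxer` and `Demuxer` are hypothetical. This is not rsync's protocol and not a proposed patch.

```python
import struct
import zlib

META, DATA = 0, 1               # hypothetical channel tags
HEADER = struct.Struct("!BI")   # tag (1 byte) + payload length (4 bytes)


class Muxer:
    """Compress each channel with its own zlib stream and frame the output."""

    def __init__(self):
        self._z = {META: zlib.compressobj(), DATA: zlib.compressobj()}

    def frame(self, tag: int, chunk: bytes) -> bytes:
        z = self._z[tag]
        # Z_SYNC_FLUSH emits everything buffered so far while keeping the
        # compression history, so later frames on this channel still benefit.
        payload = z.compress(chunk) + z.flush(zlib.Z_SYNC_FLUSH)
        return HEADER.pack(tag, len(payload)) + payload


class Demuxer:
    """Split framed bytes back into (tag, decompressed chunk) pairs."""

    def __init__(self):
        self._z = {META: zlib.decompressobj(), DATA: zlib.decompressobj()}

    def split(self, wire: bytes):
        offset = 0
        while offset < len(wire):
            tag, length = HEADER.unpack_from(wire, offset)
            offset += HEADER.size
            yield tag, self._z[tag].decompress(wire[offset:offset + length])
            offset += length


if __name__ == "__main__":
    mux = Muxer()
    wire = b"".join([
        mux.frame(META, b"dir/a.txt\t120\t1629190000\n"),
        mux.frame(DATA, b"changed block contents"),
        mux.frame(META, b"dir/b.txt\t98\t1629190003\n"),
    ])
    for tag, chunk in Demuxer().split(wire):
        print(tag, chunk)
```

In a design like this, a bandwidth limiter placed after the muxer would count `len(wire)`, that is, the bytes actually sent, which is the property the request is after for --bwlimit.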