samba-bugs at samba.org
2013-Apr-18 17:25 UTC
[Bug 9812] New: Lookahead file-list loading and comparison
https://bugzilla.samba.org/show_bug.cgi?id=9812

           Summary: Lookahead file-list loading and comparison
           Product: rsync
           Version: 3.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: me at haravikk.com
         QAContact: rsync-qa at samba.org

I've been using rsync for various things for some time now, but only recently have I properly begun using it with a remote server, in my particular case to create redundant copies of very large backup structures (almost a million files, ~3 TB in total), which is of course taxing for most software to manage.

However, the main problem that I've noticed with rsync is that it takes a *very* long time to detect changes that can start being synced to the server, even with incremental file lists, presumably a result of having to build a list of the current files, send it to the other server and then await a response.

I think the best way to resolve this is to provide more look-ahead on the file-list exchanges. Basically what would happen is that once the client has sent the parameters to the receiver, both will start loading all matching files in order to get timestamps/checksums ready for comparison. As soon as the first file-list segment is ready the client will send it. Hopefully by the time it does, the server already has a full set of file-data in the same basic order to compare against, allowing it to rapidly detect changed, deleted or new files.

This process can also be optimised, such that if the file data for an entire directory is loaded before the next segment/comparison is required, then it will be condensed into a timestamp/checksum for the directory only. In this way the client can send any available directory times/checksums to the receiver for rapid comparison; if the receiver's directory doesn't match then it will request the file-data from the client, which should still have it cached.
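As a rough illustration of the directory condensation described above, here is a minimal Python sketch. It is not how rsync works today; the `FileMeta` record, the function names `directory_digest` and `directories_to_expand`, and the use of SHA-256 over path/size/mtime are all assumptions made for the example. Each end hashes a directory's metadata into one digest, and only directories whose digests differ need their per-file data exchanged.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical metadata record: (relative path, size in bytes, mtime in seconds).
FileMeta = Tuple[str, int, int]


def directory_digest(entries: List[FileMeta]) -> str:
    """Condense one directory's file metadata into a single hex digest."""
    h = hashlib.sha256()
    # Sort so that both ends hash the entries in the same order.
    for path, size, mtime in sorted(entries):
        h.update(f"{path}\0{size}\0{mtime}\n".encode("utf-8"))
    return h.hexdigest()


def directories_to_expand(sender: Dict[str, List[FileMeta]],
                          receiver: Dict[str, List[FileMeta]]) -> List[str]:
    """Return the sender's directories that need a per-file comparison."""
    mismatched = []
    for directory, entries in sender.items():
        theirs = receiver.get(directory)
        # A directory missing on the receiver, or with a different digest,
        # cannot be skipped and must be compared file by file.
        if theirs is None or directory_digest(entries) != directory_digest(theirs):
            mismatched.append(directory)
    return sorted(mismatched)


sender = {"a": [("a/x", 10, 100)], "b": [("b/y", 20, 200)]}
receiver = {"a": [("a/x", 10, 100)], "b": [("b/y", 20, 250)]}
print(directories_to_expand(sender, receiver))  # prints ['b']
```

The sketch only walks the sender's directories, so directories that exist solely on the receiver (deletions) would need a second pass in the other direction.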

The whole mechanism would operate within a reasonable buffer, conserving memory while still holding onto enough file-data at each end for quick sending/comparison as required. Basically the idea is to get each end of the connection doing as much work as it can without actually having to communicate with the other, so that when communication does occur it is as optimised as possible.

-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
samba-bugs at samba.org
2013-Apr-22 11:19 UTC
[Bug 9812] Lookahead file-list loading and comparison
https://bugzilla.samba.org/show_bug.cgi?id=9812

--- Comment #1 from Paul Slootman <paul at debian.org> 2013-04-22 11:19:23 UTC ---
> Basically what would happen is that once the client has sent the parameters to the receiver, both will start loading all matching files in order to get timestamps/checksums ready for comparison. As soon as the first file-list segment is ready the client will send it. Hopefully by the time it does the server already has a full set of file-data in the same basic order to compare against, allowing it to rapidly detect changed, deleted or new files.

Didn't you just basically describe the incremental file transfer already implemented by rsync?
samba-bugs at samba.org
2013-Apr-22 11:34 UTC
[Bug 9812] Lookahead file-list loading and comparison
https://bugzilla.samba.org/show_bug.cgi?id=9812

--- Comment #2 from Haravikk <me at haravikk.com> 2013-04-22 11:34:40 UTC ---
If it is working that way already, then it doesn't seem to be very effective; what about ahead-of-time comparisons? For example, if the first file in the transfer is huge, then rsync should be looking at all the files after it in order to find out what needs to be transferred next, possibly generating checksums in advance.

Basically, while the sender is busy sending a file, does the receiver continue sending file data in return for the sender to process in advance?
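
To make the ahead-of-time idea above concrete, here is a small Python sketch of checksumming upcoming files in the background while the current one is still being handled. It is an illustration only, not rsync's implementation: the function name `lookahead_checksums`, the `depth` parameter and the choice of SHA-256 are assumptions for the example. A bounded queue keeps the look-ahead within a fixed buffer, in the spirit of the "reasonable buffer" from the original report.

```python
import hashlib
import queue
import threading
from typing import Iterable, Iterator, Tuple


def lookahead_checksums(paths: Iterable[str], depth: int = 4) -> Iterator[Tuple[str, str]]:
    """Yield (path, hex digest) pairs, hashing roughly `depth` files ahead."""
    buf: queue.Queue = queue.Queue(maxsize=depth)  # bounded, so memory stays capped
    done = object()  # sentinel marking the end of the list

    def worker() -> None:
        try:
            for path in paths:
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    # Read in 64 KiB chunks so large files are never held in memory.
                    for chunk in iter(lambda: f.read(1 << 16), b""):
                        h.update(chunk)
                buf.put((path, h.hexdigest()))  # blocks while the buffer is full
        finally:
            buf.put(done)  # always wake the consumer, even after an error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item
```

A caller would iterate over `lookahead_checksums(file_list)` and transfer each file as its checksum arrives; while a large transfer is in progress, the worker keeps filling the buffer with checksums for the files that follow.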