samba-bugs at samba.org
2013-Apr-18 18:19 UTC
[Bug 9814] New: --cache parameter for storing recent file data
https://bugzilla.samba.org/show_bug.cgi?id=9814

           Summary: --cache parameter for storing recent file data
           Product: rsync
           Version: 3.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: me at haravikk.com
         QAContact: rsync-qa at samba.org

I know rsync is generally stateless, but caching of recent data is something that could significantly speed it up by skipping checksumming entirely.

The idea is that a file's absolute path will be checksummed (not its contents) and then looked up in a folder structure of cached details, or maybe even a database. A filesystem solution can optimise by using lines in a file as final indices to the cache (so checksums for multiple files can be grouped into a single file until it gets too large), since checksums are a fixed size, and timestamps can be as well.

Ideally we'd get support for at least the file-system method. Necessary options would include a threshold for discarding cache entries, based on when the entry was last modified and/or on how many times it has been accessed. The latter would allow rsync to only recheck files on every second pass, for example.

For continuous incremental updates, a cache that does a good job of balancing speed and size should allow comparisons to be performed extremely quickly. It could also be used to skip files entirely on the client side if some kind of cache comparison can be performed (so that the client can quickly decide whether a file has probably already been backed up).
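A rough Python sketch of the file-system cache proposed in the report above, to make the idea concrete. Nothing here exists in rsync: the function names, the SHA-1 path key, the 256-bucket layout, the line format, and the `max_age`/`max_hits` thresholds are all assumptions made for illustration only.

```python
import hashlib
import os
import time


def path_key(abs_path):
    """Checksum the absolute path (not the file contents) to get a cache key."""
    return hashlib.sha1(abs_path.encode("utf-8")).hexdigest()


def bucket_path(cache_dir, key):
    """Group entries into one file per two-hex-digit prefix (256 buckets)."""
    return os.path.join(cache_dir, key[:2] + ".cache")


def load_bucket(path):
    """Read one bucket file into {key: (checksum, mtime, stored_at, hits)}."""
    entries = {}
    if os.path.exists(path):
        with open(path, "r", encoding="ascii") as fh:
            for line in fh:
                key, checksum, mtime, stored_at, hits = line.split()
                entries[key] = (checksum, int(mtime), int(stored_at), int(hits))
    return entries


def save_bucket(path, entries):
    """Write entries back, one zero-padded line per cached file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as fh:
        for key, (checksum, mtime, stored_at, hits) in sorted(entries.items()):
            fh.write(f"{key} {checksum} {mtime:012d} {stored_at:012d} {hits:06d}\n")


def store(cache_dir, abs_path, checksum, mtime, now=None):
    """Record a file's content checksum (hex, no spaces) and mtime."""
    now = int(time.time() if now is None else now)
    key = path_key(abs_path)
    path = bucket_path(cache_dir, key)
    entries = load_bucket(path)
    entries[key] = (checksum, int(mtime), now, 0)
    save_bucket(path, entries)


def lookup(cache_dir, abs_path, mtime, max_age, max_hits, now=None):
    """Return the cached checksum, or None if the file must be rechecked."""
    now = int(time.time() if now is None else now)
    key = path_key(abs_path)
    path = bucket_path(cache_dir, key)
    entries = load_bucket(path)
    if key not in entries:
        return None
    checksum, cached_mtime, stored_at, hits = entries[key]
    # Discard entries that are too old or have been served too many times.
    expired = now - stored_at > max_age or hits >= max_hits
    if expired or cached_mtime != int(mtime):
        del entries[key]
        save_bucket(path, entries)
        return None
    entries[key] = (checksum, cached_mtime, stored_at, hits + 1)
    save_bucket(path, entries)
    return checksum
```

With `max_hits=1`, an entry is served once and then dropped, which gives the "recheck on every second pass" behaviour mentioned in the report. Rewriting a whole bucket on every lookup is only for brevity; because each line has a fixed width, a real implementation could presumably update entries in place.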
samba-bugs at samba.org
2013-Apr-18 19:42 UTC
[Bug 9814] --cache parameter for storing recent file data
https://bugzilla.samba.org/show_bug.cgi?id=9814

--- Comment #1 from Haravikk <me at haravikk.com> 2013-04-18 19:42:14 UTC ---

I just wanted to add that I mean caching in a fairly broad sense, rather than simply caching of checksums alone.

My main concern is the seemingly vast difference in speed between OS X's Time Machine and rsync; I think Time Machine uses a Spotlight database to compare the source and destination, and while I know rsync is fundamentally stateless, so probably shouldn't try to hook into anything quite like that, it should really be able to use some kind of cache to accelerate its comparison.

A possibility would be to allow creation of an SQLite database containing details of all files copied to a destination (and conversely another in the source to record everything that was copied from that location). These can then be used for accelerated lookups.

If rsync could then hook into common systems such as Spotlight and equivalents then it could avoid having to manage a database of its own at one or both ends.
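As an illustration of the SQLite idea in the comment above, here is a minimal Python sketch using the standard `sqlite3` module. The table name `copied_files`, its columns, and the helper functions are hypothetical; rsync has no such database, and the size-plus-mtime comparison is just one plausible "probably already backed up" test.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database of files copied to one destination."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS copied_files ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " mtime INTEGER NOT NULL,"
        " checksum TEXT)"
    )
    return conn


def record_copy(conn, path, size, mtime, checksum=None):
    """Remember the details of a file after it has been transferred."""
    conn.execute(
        "INSERT OR REPLACE INTO copied_files (path, size, mtime, checksum)"
        " VALUES (?, ?, ?, ?)",
        (path, size, mtime, checksum),
    )
    conn.commit()


def probably_unchanged(conn, path, size, mtime):
    """True if the recorded size and mtime match, so the file can be skipped."""
    row = conn.execute(
        "SELECT size, mtime FROM copied_files WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row == (size, mtime)
```

A sender could call `probably_unchanged` for each file before doing any further comparison, and `record_copy` after each successful transfer. A second database of the same shape at the source would cover the "copied from" direction described in the comment.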