samba-bugs at samba.org
2013-Apr-18  18:19 UTC
[Bug 9814] New: --cache parameter for storing recent file data
https://bugzilla.samba.org/show_bug.cgi?id=9814
           Summary: --cache parameter for storing recent file data
           Product: rsync
           Version: 3.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: me at haravikk.com
         QAContact: rsync-qa at samba.org
I know rsync is generally stateless, but caching of recent data is something
that could significantly speed it up by skipping checksumming entirely.
The idea is that a file's absolute path will be checksummed (not its contents)
and then looked up in a folder structure of cached details, or maybe even a
database. A filesystem solution can optimise by using lines in a file as final
indices to the cache (so checksums for multiple files can be grouped into a
single file until it gets too large), since checksums are a fixed size, and
timestamps can be as well.
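
A minimal sketch of the filesystem layout described above, not anything rsync actually implements. The record layout, the two-level bucket directories, the use of BLAKE2 for hashing the path, and the names `path_key`, `bucket_file`, `store` and `lookup` are all assumptions made for illustration. It uses fixed-size binary records rather than text lines, which the fixed-size checksums and timestamps make possible.

```python
import hashlib
import os
import struct
import tempfile

# Hypothetical fixed-size record: 16-byte path digest, mtime in ns,
# file size, 16-byte content checksum (48 bytes in total).
RECORD = struct.Struct("<16sqq16s")


def path_key(abs_path: str) -> bytes:
    """Hash the absolute path (not the file contents) to a 16-byte key."""
    return hashlib.blake2b(abs_path.encode("utf-8"), digest_size=16).digest()


def bucket_file(cache_dir: str, key: bytes) -> str:
    """Map a key to a bucket file shared by many paths, e.g. <cache>/ab/cd."""
    hexkey = key.hex()
    return os.path.join(cache_dir, hexkey[:2], hexkey[2:4])


def read_records(bucket: str) -> list:
    """Return all (key, mtime_ns, size, checksum) tuples in a bucket file."""
    try:
        with open(bucket, "rb") as f:
            return list(RECORD.iter_unpack(f.read()))
    except FileNotFoundError:
        return []


def store(cache_dir, abs_path, mtime_ns, size, checksum):
    """Insert or replace the cache record for abs_path."""
    key = path_key(abs_path)
    bucket = bucket_file(cache_dir, key)
    os.makedirs(os.path.dirname(bucket), exist_ok=True)
    records = [r for r in read_records(bucket) if r[0] != key]
    records.append((key, mtime_ns, size, checksum))
    with open(bucket, "wb") as f:
        f.write(b"".join(RECORD.pack(*r) for r in records))


def lookup(cache_dir, abs_path, mtime_ns, size):
    """Return the cached checksum if mtime and size still match, else None."""
    key = path_key(abs_path)
    for k, m, s, checksum in read_records(bucket_file(cache_dir, key)):
        if k == key:
            return checksum if (m, s) == (mtime_ns, size) else None
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        store(cache_dir, "/home/user/file.txt", 1000, 42, b"\x01" * 16)
        print(lookup(cache_dir, "/home/user/file.txt", 1000, 42))
```

A real implementation would also need locking and some policy for splitting a bucket once it "gets too large"; neither is attempted here.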
Ideally we'd get support for at least the file-system method.
Necessary options would include a threshold for discarding cache entries that
are too old, judged by when the entry was last modified and/or by how many times
it has been accessed. The latter would allow rsync to only recheck files on
every second pass, for example.
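
The discard thresholds could look roughly like the sketch below. This is an illustration only: `CacheEntry`, `is_fresh`, `get_checksum` and the `max_age` / `max_hits` parameters are hypothetical names, not rsync options, and the cache is just an in-memory dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    checksum: bytes
    modified_at: float  # time the entry was last written (seconds)
    hits: int = 0       # how many times the entry has been served


def is_fresh(entry: CacheEntry, now: float,
             max_age: Optional[float], max_hits: Optional[int]) -> bool:
    """An entry is stale if it is too old or has been served too often."""
    if max_age is not None and now - entry.modified_at > max_age:
        return False
    if max_hits is not None and entry.hits >= max_hits:
        return False
    return True


def get_checksum(cache: Dict[str, CacheEntry], path: str, now: float,
                 max_age: Optional[float] = None,
                 max_hits: Optional[int] = None) -> Optional[bytes]:
    """Return a cached checksum, or None if the file must be rechecked."""
    entry = cache.get(path)
    if entry is None:
        return None
    if not is_fresh(entry, now, max_age, max_hits):
        del cache[path]  # discard; the caller recomputes and re-stores
        return None
    entry.hits += 1
    return entry.checksum
```

With `max_hits=1`, an entry written on one pass is served on the next pass and discarded on the one after, which gives the "recheck on every second pass" behaviour.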
For continuous incremental updates, a cache that does a good job of balancing
speed and size should allow comparisons to be performed extremely quickly, and
could also be used to skip files entirely on the client side if some kind of
cache comparison can be performed (so that the client can quickly decide if a
file has probably already been backed up).
-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
samba-bugs at samba.org
2013-Apr-18  19:42 UTC
[Bug 9814] --cache parameter for storing recent file data
https://bugzilla.samba.org/show_bug.cgi?id=9814
--- Comment #1 from Haravikk <me at haravikk.com> 2013-04-18 19:42:14 UTC ---
I just wanted to add that I mean caching in a fairly broad sense, rather than
simply caching of checksums alone.
My main concern is the seemingly vast difference in speed between OS X's Time
Machine and rsync; I think Time Machine uses a Spotlight database to compare
the source and destination, and while I know rsync is fundamentally stateless,
so probably shouldn't try to hook into anything quite like that, it should
really be able to use some kind of cache to accelerate its comparison.
A possibility would be to allow creation of an SQLite database containing
details of all files copied to a destination (and conversely another in the
source to record everything that was copied from that location). These can then
be used for accelerated lookups.
If rsync could then hook into common systems such as Spotlight and equivalents
then it could avoid having to manage a database of its own at one or both ends.
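
As a rough illustration of the SQLite idea in this comment, the sketch below records each copied file and answers "has this probably been copied already?" from the database alone. The table name `transferred`, its columns, and the function names are assumptions for illustration; nothing here reflects an existing rsync feature.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical per-location transfer database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transferred (
               path     TEXT PRIMARY KEY,
               size     INTEGER NOT NULL,
               mtime_ns INTEGER NOT NULL
           )"""
    )
    return conn


def record_transfer(conn, path: str, size: int, mtime_ns: int) -> None:
    """Remember the size and mtime a file had when it was last copied."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transferred (path, size, mtime_ns) "
            "VALUES (?, ?, ?)",
            (path, size, mtime_ns),
        )


def probably_up_to_date(conn, path: str, size: int, mtime_ns: int) -> bool:
    """True if the file matches what was recorded at the last copy."""
    row = conn.execute(
        "SELECT size, mtime_ns FROM transferred WHERE path = ?", (path,)
    ).fetchone()
    return row == (size, mtime_ns)


if __name__ == "__main__":
    db = open_db()
    record_transfer(db, "/src/report.txt", 2048, 1366309334000000000)
    print(probably_up_to_date(db, "/src/report.txt", 2048, 1366309334000000000))
```

The lookup is only as trustworthy as the database: if the destination is changed by anything other than the tool that maintains it, the answer can be wrong, hence "probably".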