samba-bugs at samba.org
2017-Feb-07 12:19 UTC
[Bug 12570] New: Problems with --checksum --existing
https://bugzilla.samba.org/show_bug.cgi?id=12570

Bug ID: 12570
Summary: Problems with --checksum --existing
Product: rsync
Version: 3.1.1
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: wayned at samba.org
Reporter: atom at smasher.org
QA Contact: rsync-qa at samba.org

Problem:
I've got an SD card with some movies, a few of which are corrupted. I want to copy only the files that don't match the good files.

Command:
rsync --checksum --existing -vhriP /movies/ /media/128-SD/Movies/

The problem here is that *all* files in "/movies/" are hashed before anything else happens. This can be verified with lsof: "lsof +D /movies".

I've got <100GB in "/media/128-SD/Movies/". I've got >1.5TB in "/movies/", and hashing all of those files is just a huge waste of time and system resources.

When "--existing" and "--checksum" are both used, the algorithm should first make a list of candidate files, then start hashing. It should *not* start hashing everything on the send side and then figure out which files might be needed.

Workaround for me:
diff -r /movies/ /media/128-SD/Movies/ | grep differ | awk '{print "pv " $3" > "$5}' | sh

NB: that workaround requires "pv" and only works with file names that do not contain spaces, but for me it's a quick and easy way to see progress while files are being copied. "cp" would work fine in place of "pv".

On my system, that workaround saved me about 1-2 days of hashing, and completed in less than an hour.

--
You are receiving this mail because:
You are the QA Contact for the bug.
samba-bugs at samba.org
2017-Feb-07 14:32 UTC
[Bug 12570] Problems with --checksum --existing
https://bugzilla.samba.org/show_bug.cgi?id=12570

--- Comment #1 from Kevin Korb <rsync at sanitarium.net> ---
Unfortunately rsync's --checksum is just that dumb. It checksums *EVERYTHING* on the source and the target before it does anything else. Since --checksum is almost always the wrong thing to do, nobody seems to be willing to add basic intelligence to it.

Unfortunately, what you are trying to do is one of those few instances when --checksum is the right thing to use. So, that is just the way it works.
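For context on why --checksum is usually unnecessary: by default rsync uses a "quick check" that treats a file as unchanged when its size and last-modified time match, while --checksum (per the man page) instead compares a 128-bit checksum for each file that has a matching size. The toy model below shows that decision only. `FileInfo`, `needs_transfer`, and the SHA-256 digest are assumptions for illustration, and real rsync options such as --size-only, --ignore-times, and --modify-window are ignored.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical per-file record; `data` stands in for the file contents."""
    size: int
    mtime: int
    data: bytes
    checksum: Optional[str] = None  # computed lazily, at most once


def digest(info: FileInfo) -> str:
    if info.checksum is None:
        info.checksum = hashlib.sha256(info.data).hexdigest()  # stand-in checksum
    return info.checksum


def needs_transfer(src: FileInfo, dst: Optional[FileInfo], use_checksum: bool) -> bool:
    """Simplified model of rsync's skip decision, not its actual code."""
    if dst is None:
        return True  # missing on the receiver
    if src.size != dst.size:
        return True  # both modes treat a size change as a change
    if use_checksum:
        return digest(src) != digest(dst)  # --checksum: compare contents
    return src.mtime != dst.mtime  # default quick check: size + mtime
```

A corrupted copy that kept its size and timestamp slips past the quick check but not the checksum comparison, which is the situation where, as the comment says, --checksum is the right tool.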
samba-bugs at samba.org
2019-Nov-03 12:37 UTC
[Bug 12570] Problems with --checksum --existing
https://bugzilla.samba.org/show_bug.cgi?id=12570

--- Comment #2 from Haravikk <samba at haravikk.com> ---
I was about to post on basically the same issue, but found this. I use rsync to do a lot of incremental backups where ZFS or similar isn't an option (not that common, but it still comes up now and then). To guarantee correctness, I like to run a periodic consistency check with --checksum to be certain that none of the files have changed at rest on the receiver, just like how I scrub a ZFS pool from time to time.

The problem is that rsync's --checksum mode is insanely slow when run over a large number of files, much slower than it should be, even allowing for a slow sender or receiver. I had always assumed that rsync at each end just gathered metadata in the background while communicating ("I have X with checksum Y" -> "I don't, send it" or such), but this doesn't appear to be the case with --checksum, as it can take hours before anything even *begins* sending, let alone finishes.

It seems a lot like the incremental file list behaviour of modern rsync is being disabled when --checksum is enabled, but is there any good reason why that should be the case? I can't think of any reason why it should be different: a checksum is ultimately just a value to be compared, just like a file size and/or timestamp; it just takes a bit longer to generate each one.