Robert Bell
2015-Apr-07 03:14 UTC
Patch for rsync --link-dest won't link even if existing file is out of date (fwd)
Folks,

We faced a situation similar to the one Ken described - we recycle backup directories, for good reason. There is a patch to solve the problem. Our systems administrator provided the following description of the patches we use:

===========================================================================
1. rsync_link_dest improvement, by Bryant Hansen

Normally, existing files in the destination are never updated from link-dest but are transferred over the wire. This patch changes that behaviour to use link-dest instead, which is a major performance enhancement in our environment.

2. Warnings for --max-size ignored files are displayed if -w/--warning is specified, by Rowan McKenzie (CSIRO SC)

Normally, --max-size causes files to be silently ignored!

3. Only output '=>' notifications when -v/--verbose specified, by Rowan McKenzie (CSIRO SC)

Only output '#' notifications when -v/--verbose specified (it's a patch to the rsync_link_dest_from_bryant patch). This reduces clutter by suppressing a large class of false positives.
===========================================================================

Hope you can find these.

(All we need now for rsync perfection for our backups is a solution to the problem of metadata changes being propagated across all directories for hard-linked files - we would rather new copies be made than lose the old metadata.)

Regards
Rob.

Dr Robert C. Bell
HPC National Partnerships | Scientific Computing
Information Management and Technology
CSIRO
T +61 3 9669 8102 Alt +61 3 8601 3810 Mob +61 428 108 333
Robert.Bell at csiro.au | www.csiro.au | wiki.csiro.au/display/ASC/
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

PLEASE NOTE The information contained in this email may be confidential or privileged.
Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email.

---------- Forwarded message ----------
Date: Mon, 6 Apr 2015 01:51:21 -0400
From: Ken Chase <rsync-list-m829 at sizone.org>
To: rsync at lists.samba.org
Subject: rsync --link-dest won't link even if existing file is out of date

Feature request: allow the --link-dest dir to be linked to even if the file exists in the target.

This statement from the man page is adhered to too strongly IMHO:

"This option works best when copying into an empty destination hierarchy, as rsync treats existing files as definitive (so it never looks in the link-dest dirs when a destination file already exists)".

I was surprised by this behaviour, as generally the scheme is to be efficient/save space with rsync. When the file is out of date but exists in the --link-dest target, it would be great if it could be removed and linked. If an option was supplied to request this behaviour, I'd actually throw some money at making it happen. (And a further option to retain a copy if inode permissions/ownership would otherwise be changed.)

Reasoning: I back up many servers with --link-dest that have filesystems of 10+M files on them. I do not delete old backups - deleting takes 60 min or more per tree, just so rsync can recreate them all in an empty target dir when <1% of files change per day (takes 3-5 hrs per backup!). Instead, I cycle them in with mv $olddate $today, then rsync --del --link-dest over them - takes 30-60 min depending. (Yes, there is some risk of permissions being changed there; I'm mostly interested in contents, though.)
Problem is, if a file exists AT ALL, even out of date, a new copy is put over top of it per the above man page decree. Thus much more disk space is used. Running this scheme, with old backups moved into place to be written over, accumulates many copies of the exact same file over time.

Running pax -rpl over the copies before rsyncing to them works (and saves much space!), but takes a very long time as it traverses and compares 2 large backup trees, thrashing the same device (on the order of 3-5x the rsync's time, 3-5 hrs for pax - hardlink(1) is far worse, I suspect some non-linear algorithm therein - it ran 3-5x slower than pax again).

I have detailed an example of this scenario at

http://unix.stackexchange.com/questions/193308/rsyncs-link-dest-option-does-not-link-identical-files-if-an-old-file-exists

which also indicates --delete-before and --whole-file do not help at all.

/kc
--
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
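
To illustrate the behaviour requested in the forwarded message, here is a minimal Python sketch of a local pre-pass that replaces an out-of-date file in the recycled target with a hard link to the --link-dest copy, when that copy already matches the source. It is not the patch described above and not rsync's own code. The names relink_stale and same_quick_check are hypothetical, the size-plus-mtime comparison only approximates rsync's default quick check, and all three trees are assumed to be local, with the target and link-dest trees on the same filesystem.

```python
import os


def same_quick_check(a, b):
    """Compare size and whole-second mtime (an approximation of rsync's
    default quick check; assumed good enough for this illustration)."""
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def is_plain_file(path):
    """True for a regular file that is not a symlink."""
    return os.path.isfile(path) and not os.path.islink(path)


def relink_stale(src_root, dest_root, link_root):
    """Hypothetical helper: for every regular file under src_root, if the
    copy in dest_root is out of date but the copy in link_root matches the
    source, replace the dest_root copy with a hard link to the link_root
    copy. Returns the relative paths that were relinked."""
    relinked = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dest = os.path.join(dest_root, rel)
            ref = os.path.join(link_root, rel)
            if not (is_plain_file(src) and is_plain_file(dest)
                    and is_plain_file(ref)):
                continue
            s, d, r = os.stat(src), os.stat(dest), os.stat(ref)
            if same_quick_check(s, d):
                continue  # target copy is already current
            if not same_quick_check(s, r):
                continue  # link-dest copy is stale too; a transfer is needed
            # Link under a temporary name, then rename over the stale file,
            # so the target path never disappears part-way through.
            tmp = dest + ".relink-tmp"
            os.link(ref, tmp)
            os.replace(tmp, dest)
            relinked.append(rel)
    return relinked
```

Something like relink_stale("/live/data", "/backups/today", "/backups/yesterday") (paths made up for illustration) would run after the mv and before the rsync. Files that are stale in both backup trees are left alone for rsync to transfer as usual, and the permissions/ownership concern raised above is not addressed by this sketch.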