Folks,

The following explanatory text is by me and the patches are by Rowan McKenzie for use by the Advanced Scientific Computing group at CSIRO. This patch builds upon the --link-dest patch by Bryant Hansen (Thanks heaps!).

1. The original patch provided an alternate behaviour for rsync when using the --link-dest option. When there are identical files in the source and link-dest areas, but a different file in the destination area, standard rsync will update the destination by copying the file from source to destination: the patch updates the destination by hard-linking from the link-dest area.

Why do we want this behaviour? In our backup system, which we have been running since 2007 using rsync, we recycle old backup destinations to be the target for new backups. This is an efficiency gain - we don't want to create a new area with, in several cases, millions of files, when we have an older area which is nearly up to date. (With mature backups of user areas, we typically find that our daily backups have a churn of 0.5% of files and 1% of data.) The original patch ensures we get the maximum amount of hard-linking available.

However, the original patch unconditionally outputs messages for every file hard-linked under the scenario outlined above. Our modified patch makes output of the diagnostic message controlled by the -v option.

2. In addition, the patch adds one more feature and option. In our backups some time ago, we wished to avoid repeated daily backups of large files that were being appended to each day - they were the outputs of computational models. We used the --max-size parameter to skip these files: however, we did not like the lack of warning about skipped files. This patch adds another parameter, --warn, to select the output of a warning message when files bigger than the selected --max-size are skipped. The message is of the form:

    big_file is over max-size

3. For information: our backups are controlled from the destination of the backups (pull rather than push as Kevin Korb recently advised). We use the rsync daemon capability. The destinations of our backups are file systems subject to HSM (Hierarchical Storage Management), using SGI's Data Migration Facility (DMF).

A typical command we use is the following (but I have shortened the paths and addresses).

    rsync --password-file=not_for_your_eyes --numeric-ids -a --stats \
        --one-file-system --max-size=8.0GB --warn --whole-file \
        --link-dest=previous --delete \
        root@source_host::backups/source_dir current

    --password-file=not_for_your_eyes   for the daemon
    --numeric-ids                       since the userids on the source are not
                                        always available on the destination
    -a                                  archive mode
    --stats                             statistics
    --one-file-system                   stops the backup of everything when
                                        backing up /
    --max-size=8.0GB                    to skip large files
    --warn                              NEW parameter - warn of skipped files
                                        because of --max-size
    --whole-file                        essential when the destination is subject
                                        to HSM: otherwise, files will be recalled
                                        to use the rsync comparison algorithm
    --link-dest=previous                pointer to previous backup: to provide a
                                        source of files for hard-linking
    --delete                            essential when the destination is a
                                        recycled directory, to ensure superseded
                                        files are deleted
    root@source_host::backups/source_dir
                                        the source specification: username
                                        @source_host, module specification, and
                                        source directory
    current                             the destination directory

We use an extended Tower of Hanoi scheme to manage the keeping of backups: highly recommended for its ability to provide sensible keeping of backups matched to the likelihood of restores, and because it avoids messy management using dates and times.

Regards
Rob. Bell    e-mail: Robert.Bell at csiro.au

-- 
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T
Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
  Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
  Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsync_link_dest_max_size_CSIRO-ASC.patch
Type: text/x-diff
Size: 4060 bytes
Desc: rsync_link_dest_max_size_CSIRO-ASC.patch
URL: <http://lists.samba.org/pipermail/rsync/attachments/20120824/8af71eed/attachment.patch>
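
For anyone wanting to try the command from the message above, a minimal sketch follows. It only assembles the argument list and prints it, so it can be inspected before anything is run. The function name backup_cmd and its 0/1 argument are made up for this illustration; the host, module, and directory names are the shortened placeholders from the mail. Note that --warn exists only in an rsync built with the CSIRO patch, so the sketch leaves it out unless asked.

```shell
#!/bin/sh
# Sketch only: build the pull-style backup command described in the mail
# and print it. Nothing here invokes rsync.
#
# backup_cmd is a hypothetical helper name.
#   $1 = 1  include --warn (assumes an rsync built with the CSIRO patch;
#           an unpatched rsync does not know this option)
#   $1 = 0  leave --warn out
backup_cmd() {
    with_patch=$1

    # Start the argument list: daemon password file, numeric ids,
    # archive mode, statistics, stay on one file system, skip large files.
    set -- rsync \
        --password-file=not_for_your_eyes \
        --numeric-ids -a --stats --one-file-system \
        --max-size=8.0GB

    # Patched builds only: warn about files skipped because of --max-size.
    if [ "$with_patch" = 1 ]; then
        set -- "$@" --warn
    fi

    # --whole-file avoids HSM recalls, --link-dest points at the previous
    # backup, --delete cleans up the recycled destination directory.
    set -- "$@" --whole-file --link-dest=previous --delete \
        root@source_host::backups/source_dir current

    # Print the assembled command on one line.
    echo "$*"
}

backup_cmd 0
backup_cmd 1
```

Printing rather than executing is a deliberate choice here: with --delete aimed at a recycled destination, it seems prudent to read the final command (or add rsync's -n dry-run option) before letting it loose.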
Wayne Davison
2012-Aug-25 06:10 UTC
[patch] link-dest messages and max-size warnings (fwd)
On Thu, Aug 23, 2012 at 6:29 PM, Robert Bell <Robert.Bell at csiro.au> wrote:

> This patch adds another parameter, --warn, to select the output of a
> warning message when files bigger than the selected --max-size are skipped.

Note that 3.1.0dev has the --info option for finer-grained verbosity. If you specify --info=skip then it will tell you about any skipped files, such as those skipped via --max-size.

..wayne..
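
Putting the two messages together, a script that must run against both patched older builds and 3.1.0 or later could pick the reporting option from the version number. The sketch below is one possible way to do that, not something either poster proposed: skip_report_flag is a hypothetical helper, it takes the version as a plain string rather than parsing rsync's own output, and it assumes that any build older than 3.1 carries the CSIRO patch (otherwise --warn is not available).

```shell
#!/bin/sh
# Sketch only: choose the option that reports files skipped by --max-size.
# skip_report_flag is a hypothetical helper name.
#   $1 = rsync version string, e.g. "3.0.9" or "3.1.0dev"
skip_report_flag() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot

    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 1 ]; }; then
        # 3.1.0 and later: the --info option mentioned in the reply.
        echo "--info=skip"
    else
        # Older builds: assumes the CSIRO patch is applied.
        echo "--warn"
    fi
}

skip_report_flag 3.0.9
skip_report_flag 3.1.0dev
```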