Robert Bell
2013-Jan-22 03:35 UTC
rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
Paul, Wayne, Kevin, Teodor and others,

Thanks for your contributions in response to my postings.

Paul: I was very imprecise, if not plain wrong, in my description. :-( Thanks for explaining what really happens:

> "Rsync will not update an existing file in-place unless you use the
> --inplace option. So --whole-file is irrelevant for this.
> Rsync (without --inplace) will always create a new (temporary) file,
> using the existing data (without --whole-file) to enable the delta diff
> speedup algorithm. Once the temp file is successfully created, it's
> renamed to the original name, deleting the existing link. So any
> hardlinked data will remain untouched."

Since my posting to the rsync digest last week, I've needed to think a lot about rsync behaviour with hard-links, and have been doing some tests. It's been good to have an outbreak of postings on these issues, with a re-visiting of Bug 5644, and Wayne's postings about features in the upcoming version 3.1.0.

(At our site, we use a patched version of rsync which links a file from the link-dest directory rather than copying it from the source when a file is identical in the source and link-dest directory, but exists and is different in the destination.)

I was not aware of the issue in the case where the unchanged_file() test is passed but the unchanged_attrs() test is not, and of the potential for overwriting the attributes not just of the file in the destination, but of all files hard-linked to it.

This means that recycling directories, which Teodor Milkov described as follows:

> "Such a behaviour (unlink changed files and then hard link to dest dir)
> would be very handy, because rotating large directory trees (e.g. 10
> milion files, 10k files changed) is sooo much more efficient than
> deleting them and then repopulating from scratch."

is an issue, as Wayne noted:

> "A pre-existing hard-linked copy of the files causes rsync to
> just change the attributes on the file in-place (without breaking the
> hard-link). This can be a minor point for some people (if historical
> permissions/ACLs/xattrs don't need to be accurate), but could be a deal
> breaker for some."

I can see the need for another rsync option here, to allow users to select the making of a fresh copy of the file in this case. That would restore the behaviour I implicitly assumed we had, but didn't.

I've updated the documentation for our backups, and prepared a note for users.

I'm also thinking about ways around this issue, none of which are particularly appealing:

- drop the recycling of old directories (parameterised in our set-up)
- break the linking at regular intervals (parameterised in our set-up)
- do a dry run to identify changed files, delete those on the destination, and then do a non-dry run (there are timing issues here, but there always will be for a non-quiet filesystem).

Thanks again

Regards
Rob. Bell
e-mail: Robert.Bell at csiro.au

--
Dr Robert C. Bell, BSc (Hons) PhD
Technical Services Manager
Advanced Scientific Computing
CSIRO IM&T
Phone: +61 3 9669 8102 | Mobile: +61 428 108 333 | CSIRO 93 3810
Robert.Bell at csiro.au | http://www.csiro.au/ | http://www.hpsc.csiro.au/
Addresses:
Street: CSIRO ASC Level 11, 700 Collins Street, Docklands Vic 3008, Australia
Postal: CSIRO ASC Level 11, GPO Box 1289, Melbourne Vic 3001, Australia

Please see earlier postings for the disclaimer.
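
The two behaviours discussed above (an attribute change made in place versus a temp-file-and-rename replacement) can be reproduced without rsync at all, since they follow from ordinary hard-link semantics. The following is only a minimal sketch, assuming a POSIX filesystem; it does not call rsync, and the function name and the file names backup.0_file and backup.1_file are hypothetical, chosen to stand in for an older backup and a recycled directory.

```python
import os
import stat
import tempfile


def demo_hardlink_behaviour(workdir):
    """Contrast an in-place attribute change with a rename-based replacement
    on a hard-linked file. Returns a dict of observations."""
    # Hypothetical names: "older" stands in for a file in a previous backup,
    # "newer" for the same file hard-linked into a recycled directory.
    older = os.path.join(workdir, "backup.0_file")
    newer = os.path.join(workdir, "backup.1_file")

    with open(older, "w") as fh:
        fh.write("version 1\n")
    os.chmod(older, 0o644)
    os.link(older, newer)  # both names now refer to one inode

    # Case 1: attributes changed in place. The mode lives in the shared
    # inode, so the older backup's permissions change as well.
    os.chmod(newer, 0o600)
    older_mode_after_chmod = stat.S_IMODE(os.stat(older).st_mode)

    # Case 2: new data written to a temporary file, then renamed over the
    # name. The rename replaces the directory entry, so the link is broken
    # and the older backup keeps its original data.
    tmp = os.path.join(workdir, ".backup.1_file.tmp")
    with open(tmp, "w") as fh:
        fh.write("version 2\n")
    os.replace(tmp, newer)

    with open(older) as fh:
        older_content = fh.read()

    return {
        "older_mode_after_chmod": older_mode_after_chmod,
        "older_content_after_rename": older_content,
        "still_linked": os.path.samefile(older, newer),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in demo_hardlink_behaviour(d).items():
            print(key, repr(value))
```

In case 1 the historical copy's permissions are altered, which is the attribute-propagation problem; in case 2 the historical copy is untouched because the link has been broken. That is why forcing a fresh copy when only attributes differ would avoid the issue, at the cost of extra space.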
Maybe Matching Threads
- Changed attributes for a file in destination that is hard linked get propagated to --link-dest directories
- rsync - using a --files-from list to cut out scanning. How to handle deletions? (fwd)
- rsync - using a --files-from list to cut out scanning. How to handle deletions?
- rsync Digest, Vol 116, Issue 19
- [Bug 11523] New: Request: Add option to unlink hard links when permissions change