Tripp Lilley
2002-Oct-24 08:41 UTC
Feature Request: break hardlinks before metadata changes
[This email is either empty or too large to be displayed at this time]
Tripp Lilley
2002-Oct-24 09:20 UTC
Feature Request: break hardlinks before metadata changes
On Thu, 24 Oct 2002, Tripp Lilley wrote:

> I'd personally like to see an option to force rsync to break-and-copy any
> hardlink that pointed outside of the destination tree before doing -any-
> changes, even metadata.

Nevermind. I've gotten the latest CVS and found J.W. Schultz' --link-dest
option, which does in one nice package all of the hardlink creation from
Rubel's scripts, and -does- honor metadata changes by not hardlinking.

Very nice. Sorry to have bothered the list.

- t.
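For readers wondering why a metadata-only change matters for hard-linked
snapshots: ownership and permissions belong to the inode, not to the name, so
a chmod through one name is visible through every other hard link to the same
file. The following is a minimal POSIX sketch of that behaviour, not rsync
code; the helper name mode_seen_through_link is hypothetical and exists only
for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of rsync): create `path`, hard-link it
 * as `link_path`, chmod the first name, and return the permission bits
 * observed through the second name.  Returns -1 on any error.
 * `link_path` must not already exist.
 */
static int mode_seen_through_link(const char *path, const char *link_path,
                                  mode_t new_mode)
{
    struct stat st;
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    fclose(f);

    if (link(path, link_path) != 0)   /* second name, same inode */
        return -1;
    if (chmod(path, new_mode) != 0)   /* change metadata via the first name */
        return -1;
    if (stat(link_path, &st) != 0)    /* read it back via the second name */
        return -1;

    return (int)(st.st_mode & 07777);
}
```

Calling mode_seen_through_link("a", "b", 0600) on a filesystem that supports
hard links should report the new mode through "b" as well, which is the effect
on older snapshots that the thread is trying to avoid.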
On Thu, Oct 24, 2002 at 04:40:48AM -0400, Tripp Lilley wrote:
>
> From TODO:
>
>   We can also have the case where there are links to a file that are
>   not in the tree being transferred.  There's nothing we can do about
>   that.  Because we rename the destination into place after writing,
>   any hardlinks to the old file are always going to be orphaned.  In
>   fact that is almost necessary because otherwise we'd get really
>   confused if we were generating checksums for one name of a file and
>   modifying another.
>
>
> From Mike Rubel's excellent incremental backups HOWTO:
> <http://www.mikerubel.org/computers/rsync_snapshots/#Bugs>
>
>   As-written, the snapshot system above does not properly maintain old
>   ownerships/permissions; if a file's ownership or permissions are
>   changed in place, then the new ownership/permissions will apply to older
>   snapshots as well. This is because rsync does not unlink files prior to
>   changing them if the only changes are ownership/permission. Thanks to
>   J.W. Schultz for pointing this out. This is not a problem for me, but
>   slightly more complicated workarounds are possible
>
>
> I'd personally like to see an option to force rsync to break-and-copy any
> hardlink that pointed outside of the destination tree before doing -any-
> changes, even metadata.
>
> I know that this "breaks" standard hardlink semantics, but it's a
> desirable breakage for building these nice incremental backup systems :)
> I'm willing to eat the space taken by duplicating the entire file, since
> the obvious alternative (using LVM snapshots to preserve -everything-
> about the previous versions) has its own critical drawback (the danger of
> running out of space on the snapshot limits how long a given snapshot can
> stick around on the system).

Don't forget the performance penalty of LVM snapshots.
Every changed block has to be copied.

> It looks like I should make the change in generator.c : recv_generator,
> but I'm not quite sure of the repercussions of "copying" the various sorts
> of files the link target might be. Actually, I guess I'm unfamiliar enough
> with hardlinks to not even know what I can hardlink to :)

You might be able to piggy-back on --link-dest.

I dealt with this issue in the --link-dest patch by causing
skip_file() to treat files with meta-data change as though
having content change.  This does result in a bit more
network load and breaking the links will cause the snapshot
images to inflate if someone does a chmod|chown|chgrp -R
but i consider the issue of changing earlier images to be
an overriding concern.

There aren't too many circumstances where this is going to
be an issue outside of linked backup images.  If that is what
you are doing, take a look at using --link-dest or even try
dirvish (http://www.pegasys.ws/dirvish) which uses
--link-dest to get this right.

It has been a while since i seriously looked at this
particular bit of code and i'm not sure i like the idea of
adding this feature but you could try a modification in
generator.c something like:

     /* choose whether to skip a particular file */
     static int skip_file(char *fname,
                          struct file_struct *file, STRUCT_STAT *st)
     {
         if (st->st_size != file->length) {
             return 0;
         }
    -    if (link_dest) {
    +    if (link_dest || preserve_outer_links) {
             if((st->st_mode & ~_S_IFMT) != (file->mode & ~_S_IFMT)) {
                 return 0;
             }
             if (st->st_uid != file->uid || st->st_gid != file->gid) {
                 return 0;
             }
         }

Where preserve_outer_links (example name) is a boolean set
from the command-line.  There of course is no distinction
made here whether the link has any references outside of the
synchronized tree.
Identifying if there were a link outside the tree would
require deferring all meta-data changes until all scanning
had been completed and that would constitute a significant
change from present code where set_perms() is called from
recv_generator() if a file hasn't changed.

--
________________________________________________________________
    J.W. Schultz            Pegasystems Technologies
    email address:          jw@pegasys.ws

        Remember Cernan and Schmitt