Chris Green
2020-Dec-11 11:53 UTC
Is there any way to restore/create hardlinks lost in incremental backups?
Paul Slootman via rsync <rsync at lists.samba.org> wrote:
> On Thu 10 Dec 2020, Chris Green via rsync wrote:
> >
> > Occasionally, because I've moved things around or because I've done
> > something else that breaks things, the hard links aren't created as
> > they should be and I get a very space consuming backup increment.
> >
> > Is there any easy way that one can restore hard links in the *middle*
> > of a series? For example say I have:-
> >
> > day1/pictures
> > day2/pictures
> > day3/pictures
> > day4/pictures
> > day5/pictures
> >
> > and I notice that day4/pictures is using as much space as
> > day1/pictures but all the others are relatively small, i.e.
> > day2, day3 and day5 have correctly hard linked to the previous day but
> > day4 hasn't.
> >
> > It needs a tool that can scan day4, check a file is identical with the
> > one in day3, then hardlink it without losing the link from day5.
>
> If you have these files that are hardlinked:
>
> day1/pictures/1.jpg
> day2/pictures/1.jpg
> day3/pictures/1.jpg
>
> And these are hardlinked, but to a different inode:
>
> day4/pictures/1.jpg
> day5/pictures/1.jpg
>
> then there is no way of linking the second group to the first in one
> step; you will have to individually link day3/pictures/1.jpg to
> day4/pictures/1.jpg and then day3/pictures/1.jpg (or
> day4/pictures/1.jpg) to day5/pictures/1.jpg.
>
> It's not like a group of directory entries that are hardlinked to one
> inode are some sort of actual group; they just happen to be directory
> entries that point to the same inode number. There is no other relation
> between those directory entries.
>
> So you will have to incrementally process each next day against the
> previous day.

Yes, that's what I have done: I wrote a trivial[ish] script that copied
all the backups to a new destination sequentially (using --link-dest)
and then removed the original tree, having checked the new backups were
OK of course (roughly the approach sketched below).

Fortunately I have lots of spare space on the backup system at the
moment, having just upgraded it with a new 8TB drive, so duplicating the
whole backup wasn't an issue (though rather slow because it was from and
to the same drive).

> If I make a significant change in such a directory structure (e.g.
> renaming a directory) I try to remember to do the same thing on the
> backup, which some say is wrong, but it saves a lot of space, like you
> discovered :)

Yes, I've sometimes done that.

--
Chris Green
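
A minimal sketch of that kind of sequential --link-dest rebuild, not
Chris's actual script: the /backup/old and /backup/new locations and the
dayN names are assumptions for illustration only.

    #!/bin/sh
    # Rebuild a series of daily snapshots, hard-linking unchanged files
    # against the previously rebuilt day via --link-dest.
    # NOTE: /backup/old, /backup/new and the dayN names are assumptions.
    OLD=/backup/old
    NEW=/backup/new
    prev=""
    for day in day1 day2 day3 day4 day5; do
        if [ -n "$prev" ]; then
            # Files identical to the previous rebuilt day become hard links.
            rsync -aH --link-dest="$NEW/$prev" "$OLD/$day/" "$NEW/$day/"
        else
            rsync -aH "$OLD/$day/" "$NEW/$day/"
        fi
        prev="$day"
    done
    # Only after verifying $NEW should the original tree under $OLD be removed.

Because each day is re-copied with --link-dest pointing at the
previously rebuilt day, this is the "incrementally process each next day
against the previous day" approach Paul describes; as noted above, it is
slow when source and destination share the same drive.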
Guillaume Outters
2020-Dec-11 14:30 UTC
Is there any way to restore/create hardlinks lost in incremental backups?
On 2020-12-11 12:53, Chris Green wrote:
> […] wrote a trivial[ish] script that copied
> all the backups to a new destination sequentially (using --link-dest)
> and then removed the original tree, having checked the new backups
> were OK of course.

Facing the same problem as yours, I once worked out exactly the same
solution. But then, having to automate it, I worked on it a bit more and
ended up with a shell script that:

- recursively listed files as "file size - inode - path"
- with sort and awk, output the list of every size that has more than
  one inode
- for each such size, cksumed one file for each inode
- if two different inodes (with the same file size) had their cksums
  match, replaced every file for the second inode with a link to the
  first inode

If you have to run it frequently, you may want to implement something
similar (a rough sketch follows below). Although it ignores mtime info
(and thus loses it when ln-ing), it has the great benefit of finding
every duplicate, be it renamed and moved to another dir (as in
./her.2020-12-01/Library/Mail/…/Sent.mbox/…/Attachments/…/PhotoDeFamille.JPG
versus ./his.2020-11-26/perso/photos/100_9999.JPG).

(And by the way, I reimplemented it in C, "just for fun" and for speed
too: https://github.com/outtersg/dude/ . Hmm, in C, but in French.)

--
Guillaume
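
A rough shell sketch of the size/inode/cksum approach described above,
not the original script (the real implementation is the dude tool at the
GitHub URL): it assumes GNU find (-printf) and paths without whitespace,
and BACKUPDIR and the /tmp file names are placeholders.

    #!/bin/sh
    # Dedup-by-hardlink sketch: group files by size, keep sizes seen under
    # more than one inode, cksum one file per inode, and hard-link inodes
    # whose checksums match.
    BACKUPDIR=/backup                      # placeholder

    # 1. List every file as "size inode path".
    find "$BACKUPDIR" -type f -printf '%s %i %p\n' | sort -n > /tmp/all

    # 2. Keep only sizes that occur under more than one distinct inode.
    awk '{ if (!seen[$1" "$2]++) count[$1]++ }
         END { for (s in count) if (count[s] > 1) print s }' /tmp/all > /tmp/sizes

    # 3. Checksum one representative file per inode; when a later inode
    #    matches the first one of that size, re-link every path of that
    #    inode.  (For brevity, later inodes are only compared against the
    #    first one; the full script compares all pairs.)
    while read size; do
        ref_sum=""; ref_path=""
        awk -v s="$size" '$1 == s && !seen[$2]++ { print $2, $3 }' /tmp/all |
        while read inode path; do
            sum=$(cksum "$path" | awk '{ print $1 }')
            if [ -z "$ref_sum" ]; then
                ref_sum="$sum"; ref_path="$path"
            elif [ "$sum" = "$ref_sum" ]; then
                awk -v s="$size" -v i="$inode" '$1 == s && $2 == i { print $3 }' /tmp/all |
                while read dup; do
                    ln -f "$ref_path" "$dup"   # replace duplicate with a hard link
                done
            fi
        done
    done < /tmp/sizes

Compared with the --link-dest rebuild earlier in the thread, this works
in place and catches duplicates that were renamed or moved between days,
at the cost of discarding the duplicate's own mtime, as noted above.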