Matt McCutchen
2006-Jan-19 15:21 UTC
Backing up two trees with corresponding files hard linked
Today I went to back up my computer to an external disk with rsnapshot. I have two copies of the Linux kernel; a few files differ, but matching files are hard-linked between the two trees to save space. Since I didn't give -H to rsync, it is making a separate file on the backup disk for each hard link on the source.

I had the understanding that -H used an O(n^2) algorithm to match up hard links and that it would be prohibitively expensive to use this option on a filesystem with about 100,000 files. Is this true? If so, is there or will there be a way to get rsync to take advantage of special situations like my two hard-linked trees?

--
Matt McCutchen, ``hashproduct''
hashproduct@verizon.net -- http://mysite.verizon.net/hashproduct/
Wayne Davison
2006-Feb-24 18:30 UTC
Backing up two trees with corresponding files hard linked
On Thu, Jan 19, 2006 at 10:20:48AM -0500, Matt McCutchen wrote:

> I had the understanding that -H used an O(n^2) algorithm to match up
> hard links and it would be prohibitively expensive to use this option on
> a filesystem with about 100,000 files. Is this true?

It used to be true, but is not anymore. The code used to keep an entire extra file-list array sorted by inode, and do binary searches into this list for every hard-linked file. The current code has been optimized in several ways:

- We only save inode information for files that have more than one file-system link.

- After doing a qsort() by inode on the potentially linked files, rsync replaces the inode data in the file-list with linked-list data (without using any extra memory) that allows rsync to know all the files that are linked together and which file is the "master" (the one that got updated or is up-to-date). This allows us to handle all the hard-linking without doing any binary searching. (A sketch of this idea appears after this message.)

So, these days -H should be nice and fast.

..wayne..
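For readers who want to see the shape of the sort-then-relink trick Wayne describes, here is a minimal C sketch. It is not rsync's actual source: struct file_entry, group_hard_links(), and the index array are hypothetical names invented for illustration, and the sketch simply keys the sort on a (device, inode) pair and treats the first entry of each group as the "master", whereas real rsync chooses the master by update status.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Hypothetical file-list entry -- rsync's real structures differ.
     * The union mirrors the in-place trick: after grouping, the
     * (dev, ino) pair is no longer needed, so its space is reused for
     * linked-list data with no extra memory allocated. */
    struct file_entry {
        const char *name;
        int nlink;                  /* only entries with nlink > 1 are candidates */
        union {
            struct { uint64_t dev, ino; } id; /* valid before grouping */
            struct {                          /* valid after grouping */
                int next;    /* index of next file in the same group, or -1 */
                int master;  /* index of the group's "master" file */
            } link;
        } u;
    };

    static struct file_entry *flist;  /* shared with the qsort() comparator */

    static int cmp_by_inode(const void *a, const void *b)
    {
        const struct file_entry *x = &flist[*(const int *)a];
        const struct file_entry *y = &flist[*(const int *)b];
        if (x->u.id.dev != y->u.id.dev)
            return x->u.id.dev < y->u.id.dev ? -1 : 1;
        if (x->u.id.ino != y->u.id.ino)
            return x->u.id.ino < y->u.id.ino ? -1 : 1;
        return 0;
    }

    /* idx holds the positions of the n candidate entries (nlink > 1). */
    static void group_hard_links(struct file_entry *files, int *idx, int n)
    {
        flist = files;
        qsort(idx, n, sizeof *idx, cmp_by_inode); /* linked files become adjacent */
        for (int i = 0; i < n; ) {
            int master = idx[i];  /* sketch: first entry stands in for the master */
            uint64_t dev = files[master].u.id.dev;
            uint64_t ino = files[master].u.id.ino;
            int j = i;
            /* Find the end of the run sharing this (dev, ino). */
            while (j + 1 < n && files[idx[j + 1]].u.id.dev == dev
                             && files[idx[j + 1]].u.id.ino == ino)
                j++;
            /* Chain the group, overwriting the now-unneeded inode data. */
            for (int k = i; k <= j; k++) {
                files[idx[k]].u.link.next = (k < j) ? idx[k + 1] : -1;
                files[idx[k]].u.link.master = master;
            }
            i = j + 1;
        }
    }

    int main(void)
    {
        /* Two trees with matching files hard-linked, as in Matt's setup. */
        struct file_entry files[] = {
            { "linux-a/COPYING", 2, { .id = { 1, 100 } } },
            { "linux-b/COPYING", 2, { .id = { 1, 100 } } },
            { "linux-a/README",  2, { .id = { 1, 200 } } },
            { "linux-b/README",  2, { .id = { 1, 200 } } },
        };
        int idx[] = { 0, 1, 2, 3 };
        group_hard_links(files, idx, 4);
        for (int i = 0; i < 4; i++)
            printf("%-18s master=%s next=%d\n", files[i].name,
                   files[files[i].u.link.master].name, files[i].u.link.next);
        return 0;
    }

With the groups chained this way, code in this style can transfer each master once and recreate every other name in its group as a hard link, with no binary searching; the sort costs O(n log n) over only the multiply-linked entries rather than the whole file list.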