On Mon, 15 Dec 2003, jw schultz <jw@pegasys.ws> wrote:

> OK, first pass on TODO complete.
....
> PERFORMANCE ----------------------------------------------------------
....
> Traverse just one directory at a time
>
>     Traverse just one directory at a time.  Tridge says it's possible.
>
>     At the moment rsync reads the whole file list into memory at the
>     start, which makes us use a lot of memory and also not pipeline
>     network access as much as we could.

An additional comment should be added observing that this will affect
hardlink processing, since it relies on the entire flist array being
present in order to match dev and inode numbers.  But perhaps the
required hlist array could be saved and built on the fly as items
with node counts > 1 are encountered.

....
> Hard-link handling
>
>     At the moment hardlink handling is very expensive, so it's off by
>     default.  It does not need to be so.
>
>     Since most of the solutions are rather intertwined with the file
>     list it is probably better to fix that first, although fixing
>     hardlinks is possibly simpler.
>
>     We can rule out hardlinked directories since they will probably
>     screw us up in all kinds of ways.  They simply should not be used.
>
>     At the moment rsync only cares about hardlinks to regular files.  I
>     guess you could also use them for sockets, devices and other beasts,
>     but I have not seen them.
>
>     When trying to reproduce hard links, we only need to worry about
>     files that have more than one name (nlinks>1 && !S_ISDIR).

It would be very helpful if file_struct.flags could have a bit set to
indicate that the node count is greater than 1.  This info could be
used later to optimize the hardlink search by only considering those
flist entries with this flag bit set.

It'd be nice to implement this bit-setting under the current protocol
number so it can be widely distributed before 2.6.1 is released, which
could then contain the code that actually makes use of it.  I'd be
interested in doing the later changes, but if Martin or jw could at
least get the bit set...  It doesn't even have to depend on the
--hard-links option.  Just examine the node count and set the bit.
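To illustrate what I mean (a sketch only -- the flag name, the helper,
and this file_struct layout are hypothetical, not rsync's actual code):

    #include <sys/stat.h>

    /* Hypothetical flag bit for "node (link) count > 1". */
    #define FLAG_MULTI_LINKED (1 << 0)

    struct file_struct {            /* stand-in for rsync's flist entry */
        unsigned flags;
        /* ... other members elided ... */
    };

    /* Called while building the file list, with the stat info in hand. */
    static void note_multi_linked(struct file_struct *file,
                                  const struct stat *st)
    {
        /* Only non-directories with more than one name are interesting. */
        if (!S_ISDIR(st->st_mode) && st->st_nlink > 1)
            file->flags |= FLAG_MULTI_LINKED;
    }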
> The basic point of this is to discover alternate names that refer to
> the same file.  All operations, including creating the file and
> writing modifications to it need only to be done for the first name.
> For all later names, we just create the link and then leave it
> alone.

An earlier thread started 11/25/2003 points out that in certain cases,
a hardlinked file is unnecessarily transferred in full.  This is due
to the algorithm described above.  If the first file in the sorted list
is missing, but a later one exists, then that file should be used as
the master.  I've been thinking of solutions to this as well.  But
not until after 2.6.0 is released.

> If hard links are to be preserved:
>
>     Before the generator/receiver fork, the list of files is received
>     from the sender (recv_file_list), and a table for detecting hard
>     links is built.
>
>     The generator looks for hard links within the file list and does
>     not send checksums for them, though it does send other metadata.
>
>     The sender sends the device number and inode with file entries, so
>     that files are uniquely identified.
>
>     The receiver goes through and creates hard links (do_hard_links)
>     after all data has been written, but before directory permissions
>     are set.
>
>     At the moment device and inum are sent as 4-byte integers, which
>     will probably cause problems on large filesystems.  On Linux the
>     kernel uses 64-bit ino_t's internally, and people will soon have
>     filesystems big enough to use them.  We ought to follow NFS4 in
>     using 64-bit device and inode identification, perhaps with a
>     protocol version bump.
>
>     Once we've seen all the names for a particular file, we no longer
>     need to think about it and we can deallocate the memory.
>
>     We can also have the case where there are links to a file that are
>     not in the tree being transferred.  There's nothing we can do about
>     that.  Because we rename the destination into place after writing,
>     any hardlinks to the old file are always going to be orphaned.  In
>     fact that is almost necessary because otherwise we'd get really
>     confused if we were generating checksums for one name of a file and
>     modifying another.
>
>     At the moment the code seems to make a whole second copy of the file
>     list, which seems unnecessary.

Indeed!  It does!  Very wasteful.  It should only need a list of pointers
to the flist entries and sort that list.  Furthermore, with the addition
of the new multiple nodes flag bit requested above, the list of pointers
would only contain pointers to flist entries with that bit set, resulting
in a much smaller list.
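Roughly like this (again only a sketch: the struct is the hypothetical
one from above, extended with the dev and inode fields the comparison
needs):

    #include <stdint.h>
    #include <stdlib.h>

    struct file_struct {            /* hypothetical flist entry */
        unsigned flags;
        uint64_t dev, inode;
    };
    #define FLAG_MULTI_LINKED (1 << 0)

    /* Order by (dev, inode) so all names for one file end up adjacent. */
    static int hlink_compare(const void *p1, const void *p2)
    {
        const struct file_struct *f1 = *(struct file_struct *const *)p1;
        const struct file_struct *f2 = *(struct file_struct *const *)p2;

        if (f1->dev != f2->dev)
            return f1->dev < f2->dev ? -1 : 1;
        if (f1->inode != f2->inode)
            return f1->inode < f2->inode ? -1 : 1;
        return 0;
    }

    /* Build a sorted array of pointers to just the multiply-linked
     * entries, instead of copying the entire file list. */
    static struct file_struct **build_hlink_list(struct file_struct **files,
                                                 int count, int *num_hlinks)
    {
        struct file_struct **hlinks = malloc(count * sizeof *hlinks);
        int i, n = 0;

        if (!hlinks)
            return NULL;
        for (i = 0; i < count; i++) {
            if (files[i]->flags & FLAG_MULTI_LINKED)
                hlinks[n++] = files[i];
        }
        qsort(hlinks, n, sizeof *hlinks, hlink_compare);
        *num_hlinks = n;
        return hlinks;
    }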
-- 
       John Van Essen  Univ of MN Alumnus  <vanes002@umn.edu>


On Tue, Dec 16, 2003 at 03:18:15PM -0600, John Van Essen wrote:
> On Mon, 15 Dec 2003, jw schultz <jw@pegasys.ws> wrote:
>
> > OK, first pass on TODO complete.
> ....
> > PERFORMANCE ----------------------------------------------------------
> ....
> > Traverse just one directory at a time
> >
> >     Traverse just one directory at a time.  Tridge says it's possible.
> >
> >     At the moment rsync reads the whole file list into memory at the
> >     start, which makes us use a lot of memory and also not pipeline
> >     network access as much as we could.
>
> An additional comment should be added observing that this will affect
> hardlink processing, since it relies on the entire flist array being
> present in order to match dev and inode numbers.  But perhaps the
> required hlist array could be saved and built on the fly as items
> with node counts > 1 are encountered.

Dynamic creation of the hlist set (hash perhaps) would deal
with that, yes.
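Something in this vein, perhaps (a throwaway sketch; none of these
names exist in rsync today):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* One entry per (dev, inode) pair seen so far. */
    struct hlink_entry {
        uint64_t dev, inode;
        char *master_name;          /* first name seen for this inode */
        struct hlink_entry *next;   /* collision chain */
    };

    #define HLINK_BUCKETS 4096
    static struct hlink_entry *hlink_table[HLINK_BUCKETS];

    /* Look up (dev, inode), inserting on first sight.  Call this only
     * for entries whose node count is > 1.  If the returned entry's
     * master_name differs from name, the caller just makes a link
     * instead of transferring the file. */
    static struct hlink_entry *hlink_lookup(uint64_t dev, uint64_t inode,
                                            const char *name)
    {
        unsigned h = (unsigned)((dev ^ (inode * 31)) % HLINK_BUCKETS);
        struct hlink_entry *e;

        for (e = hlink_table[h]; e; e = e->next) {
            if (e->dev == dev && e->inode == inode)
                return e;
        }
        if (!(e = malloc(sizeof *e)))
            return NULL;
        e->dev = dev;
        e->inode = inode;
        e->master_name = strdup(name);
        e->next = hlink_table[h];
        hlink_table[h] = e;
        return e;
    }

That keeps memory proportional to the number of multiply-linked files
actually seen, which fits the one-directory-at-a-time plan.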
> ....
> > Hard-link handling
> >
> >     At the moment hardlink handling is very expensive, so it's off by
> >     default.  It does not need to be so.
> >
> >     Since most of the solutions are rather intertwined with the file
> >     list it is probably better to fix that first, although fixing
> >     hardlinks is possibly simpler.
> >
> >     We can rule out hardlinked directories since they will probably
> >     screw us up in all kinds of ways.  They simply should not be used.
> >
> >     At the moment rsync only cares about hardlinks to regular files.  I
> >     guess you could also use them for sockets, devices and other beasts,
> >     but I have not seen them.
> >
> >     When trying to reproduce hard links, we only need to worry about
> >     files that have more than one name (nlinks>1 && !S_ISDIR).
>
> It would be very helpful if file_struct.flags could have a bit set to
> indicate that the node count is greater than 1.  This info could be
> used later to optimize the hardlink search by only considering those
> flist entries with this flag bit set.
>
> It'd be nice to implement this bit-setting under the current protocol
> number so it can be widely distributed before 2.6.1 is released, which
> could then contain the code that actually makes use of it.  I'd be
> interested in doing the later changes, but if Martin or jw could at
> least get the bit set...  It doesn't even have to depend on the
> --hard-links option.  Just examine the node count and set the bit.

I'm not keen on squeezing that in at this time.  Let's get it out
the door; hardlink performance improvements can be made in a minor
release.

I'm also a bit more inclined to pass nlinks (IFF non-zero and ~IS_DIR).
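On the wire that might look something like the following (a rough
sketch: the XMIT_HAS_NLINKS bit and the calling convention are invented
here, and write_int()/write_longint() are only assumed to behave like
the io.c helpers):

    #include <sys/stat.h>

    #define XMIT_HAS_NLINKS (1 << 7)        /* hypothetical per-entry flag */

    void write_int(int f, int x);           /* assumed io.c helpers */
    void write_longint(int f, long long x);

    static void send_link_info(int f, const struct stat *st,
                               unsigned *xflags)
    {
        /* Skip directories and the common single-link case, so the
         * protocol stays cheap for the vast majority of entries. */
        if (S_ISDIR(st->st_mode) || st->st_nlink <= 1)
            return;

        *xflags |= XMIT_HAS_NLINKS;     /* tell the receiver it's coming */
        write_int(f, (int)st->st_nlink);

        /* 64 bits each for dev and inode, as the TODO suggests, so
         * large filesystems don't wrap. */
        write_longint(f, (long long)st->st_dev);
        write_longint(f, (long long)st->st_ino);
    }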
> > The basic point of this is to discover alternate names that refer to
> > the same file.  All operations, including creating the file and
> > writing modifications to it need only to be done for the first name.
> > For all later names, we just create the link and then leave it
> > alone.
>
> An earlier thread started 11/25/2003 points out that in certain cases,
> a hardlinked file is unnecessarily transferred in full.  This is due
> to the algorithm described above.  If the first file in the sorted list
> is missing, but a later one exists, then that file should be used as
> the master.  I've been thinking of solutions to this as well.  But
> not until after 2.6.0 is released.
>
> > If hard links are to be preserved:
> >
> >     Before the generator/receiver fork, the list of files is received
> >     from the sender (recv_file_list), and a table for detecting hard
> >     links is built.
> >
> >     The generator looks for hard links within the file list and does
> >     not send checksums for them, though it does send other metadata.
> >
> >     The sender sends the device number and inode with file entries, so
> >     that files are uniquely identified.
> >
> >     The receiver goes through and creates hard links (do_hard_links)
> >     after all data has been written, but before directory permissions
> >     are set.
> >
> >     At the moment device and inum are sent as 4-byte integers, which
> >     will probably cause problems on large filesystems.  On Linux the
> >     kernel uses 64-bit ino_t's internally, and people will soon have
> >     filesystems big enough to use them.  We ought to follow NFS4 in
> >     using 64-bit device and inode identification, perhaps with a
> >     protocol version bump.
> >
> >     Once we've seen all the names for a particular file, we no longer
> >     need to think about it and we can deallocate the memory.
> >
> >     We can also have the case where there are links to a file that are
> >     not in the tree being transferred.  There's nothing we can do about
> >     that.  Because we rename the destination into place after writing,
> >     any hardlinks to the old file are always going to be orphaned.  In
> >     fact that is almost necessary because otherwise we'd get really
> >     confused if we were generating checksums for one name of a file and
> >     modifying another.
> >
> >     At the moment the code seems to make a whole second copy of the file
> >     list, which seems unnecessary.
>
> Indeed!  It does!  Very wasteful.  It should only need a list of pointers
> to the flist entries and sort that list.  Furthermore, with the addition
> of the new multiple nodes flag bit requested above, the list of pointers
> would only contain pointers to flist entries with that bit set, resulting
> in a much smaller list.

You do mean multiple paths, same node?  :)

Let's take this up after the release, shall we?

-- 
________________________________________________________________
       J.W. Schultz            Pegasystems Technologies
       email address:          jw@pegasys.ws

                Remember Cernan and Schmitt


Hello,

I read with interest the mailing list thread found here:

  http://marc.10east.com/?t=107160967400007&r=1&w=2

We have a "situation" with rsync and --hard-links that was the reason
for my search in MARC's rsync list archive that turned up the thread
shown above.  After reading through that thread, and other information
on this topic, I believe that sharing our situation with you will in
itself prove to be a good contribution to rsync (which is an excellent
tool, BTW).  So, here goes:

We have a process on a backup server (I called it "s" below) that each
night rsyncs a full copy of /, /var, and /usr from a great number of
systems.  As a rule we put /, /var, and /usr on separate partitions,
but that detail is not important.  What is important is to understand
exactly how we do these nightly, full system backups.

First, let me show you what a small part of the system_backups
hierarchy looks like:

root@s:/vol/6/system_backups# find . -type d -maxdepth 1
.
./client1
./docs1.colo1
./docs2.colo1
./ipfw-internal.colo1
./ipfw1
./ipfw2
./docsdev1

root@s:/vol/6/system_backups# find . -type d -maxdepth 2|head -25|egrep -v '^\./[^/]+$'|sort
.
./client1/20031223
./client1/20031224
./client1/20031225
./client1/20031226
./client1/20031227
./client1/20031229
./client1/20040102
./client1/current
./docs1.colo1/20031219
./docs1.colo1/20031223
./docs1.colo1/20031224
./docs1.colo1/20031225
./docs1.colo1/20031226
./docs1.colo1/20031227
./docs1.colo1/20031229
./docs1.colo1/20040102
./docs1.colo1/current
./docs1.colo1/image-20031218
./docs2.colo1/20031218
./docs2.colo1/20031219
./docs2.colo1/current

OK, that gives you an idea of how the hierarchy looks.  Here is the
critical part, though.  The logic that creates these each night looks
like this:

TODAY=$(date +%Y%m%d)
for HOST in $HOSTS; do      # $HOSTS holds the list of machines to back up
    cp -al $HOST/current $HOST/$TODAY
    # ...now rsync remote $HOST into my local $HOST/current...
done

For those not familiar with the -l option to cp:

root@s:/vol/6/system_backups# man cp|grep -B1 -A1 'hard links instead'
       -l, --link
              Make hard links instead of copies of non-directories.

What we end up with is a tree that is _very_ fast to rsync each night,
with revision history going back indefinitely, at the disk usage cost
of only the files that change (rare) and the directories (about 8MB
per machine).

Note, however, that the _vast_ majority of file entries on these file
systems (system_backups) are hard links.  Many inodes will have 20,
30, or more filename entries pointing at them (depending strictly on
how much history we choose to keep).

Keeping all that in mind, now understand that server "s" has
/vol/(0..14) installed in its disk subsystem, and (the important part)
each of those volumes has a slow mirror -- one rsync per day.  We do
not keep those mirrors mounted, but you can think of /vol/0 as having
a /vol/0_mirror partner that is rsynced once every twenty-four hours.

All of this works absolutely perfectly, with one exception: the daily
rsync of /vol/N to /vol/N_mirror for volumes that hold system_backups,
and the reason appears to be the --hard-links flag.  Rsync, which runs
completely locally for the /vol/N to /vol/N_mirror work, exhausts all
of the RAM and swap available to it on this machine (3GB) and sends
the machine into a maddening swap spiral.  The issue only exists for
/vol/N volumes where we have "system_backups" stored.
I wanted to share this circumstance with you because my reading of the
discussion on this topic, though encouraging, left me with the
impression that some might not be thinking about situations like this
one, where it is perfectly normal and desirable to have many hard
links to one inode, and hundreds of thousands of hard links in one
file system.

To give you an idea of the type of information one can glean from such
a backup process, here are a couple of examples.  Keep in mind that
files with a link count of 1 changed on the date indicated by the
directory:

root@s:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|head -2
20040102/root/.bash_history
20040102/tmp/.803.e4a1

root@s:/vol/6/system_backups/client1# diff 20040102/root/.bash_history current/root/.bash_history
1d0
< lynx http://localhost:1081 --source | grep Rebuilding | head -1 | cut 10-
500a500
> ssh ljacobs@supermag

root@s:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|cut -d/ -f1,2,3,4|sort |uniq -c
      1 20040102/SYMLINKS
      1 20040102/root/.bash_history
      1 20040102/tmp/.803.e4a1
      1 20040102/usr/local/BMS
     54 20040102/usr/local/WWW
     17 20040102/usr/local/etc
      1 20040102/usr/sbin/symlinks
     42 20040102/vol/1/bmshome
      1 20040102/vol/2/webalizer_working
     12 20040102/vol/3/home

You'll notice that the hard link counts in this file system are not
very high yet (only 8), yet it is _very_ intensive to have rsync try
to sync /vol/6/system_backups/client1 to
/vol/6_mirror/system_backups/client1 with the --hard-links flag set:

root@s:/vol/6/system_backups/client1# find 20040102 ! -links 1 -type f -printf '%n\t%i\t%s\t%d\t%h/%f\n'|head -50|tail -5
8       11323   10108   2       20040102/bin/mknod
8       11324   25108   2       20040102/bin/more
8       11325   60912   2       20040102/bin/mount
8       11326   10556   2       20040102/bin/mt-GNU
8       11327   33848   2       20040102/bin/mv

If there is anything that I did not articulate clearly, if you have
any follow-up questions, if you would like us to test some code for
you guys, or if there is anything else that you feel I can do to help,
please do not hesitate to ask.

Sincerely,

-- 
Lester Hightower
10East Corp.

p.s. 10East created and now supports the MARC system (marc.10east.com)
in various ways, including hosting it, though it is primarily
administered by Mr. Hank Leininger, a good friend and former employee.
I didn't see any mention of MARC on the rsync web site.  Please feel
free to use it.