I am seeing a problem with the way the rsync of symbolic links is done.
Here is a simple example to duplicate this issue:

prompt> # Create a test case to show rsync of symbolic link problem
prompt> mkdir foo bar
prompt> touch foo/file1 bar/file2
prompt> ln -s bar/file2 file
prompt> rsync -av . ../duplicate
building file list ... done
created directory ../duplicate
./
bar/
bar/file2
file -> bar/file2
foo/
foo/file1
wrote 236 bytes read 52 bytes 576.00 bytes/sec
total size is 9 speedup is 0.30

prompt> # Point symbolically linked file to a new file
prompt> rm file
prompt> touch bar/file3
prompt> ln -s bar/file3 file
prompt> rsync -av . ../duplicate
building file list ... done
./
bar/
bar/file3
file -> bar/file3
wrote 220 bytes read 36 bytes 512.00 bytes/sec
total size is 9 speedup is 0.40

prompt> # Point symbolically linked file to another new file
prompt> rm file
prompt> touch foo/file4
prompt> ln -s foo/file4 file
prompt> rsync -av . ../duplicate
building file list ... done
./
file -> foo/file4
foo/
foo/file4
wrote 236 bytes read 36 bytes 544.00 bytes/sec
total size is 9 speedup is 0.30

ACK! The symbolically linked file pointing to foo/file4 got updated
first, causing anyone using it to see it pointing at a nonexistent file!

This bug is essentially a race condition: the mirrored system can see
the symbolic link updated to point to a nonexistent file, because the
file it points at has not been mirrored yet.

We see this problem quite often when we are mirroring our /usr/local
filesystem. For example, if we have /usr/local/lib/libz.so symbolically
linked to /usr/local/zlib/1.1.3/libz.so, but then update it to point to
/usr/local/zlib/1.1.4/libz.so, then when the filesystem is mirrored, the
/usr/local/lib/libz.so link gets updated long before the
/usr/local/zlib/1.1.4 directory tree gets pushed. This causes any
application that is started using libz.so to fail while the rsync is
running! This downtime can be quite long when pushing many updates
across our WAN to servers on the other side of the world!

It appears that rsync pushes the updates out alphabetically, which is
why the symbolic link to bar/file3 above worked OK, but the symbolic
link to foo/file4 fails with the race condition.

The fix for this would be to have rsync internally sort its filelist to
push the updated files and directories first, then the updated symbolic
links last. How can this be done? Or am I missing some rsync
command-line option that fixes this problem?

My only workaround at this point is to rsync twice, the first time
without pushing any symbolic links, but this workaround can still burn
us if an update is made to the master copy between the two rsync runs.

Thanks...Tom

--
Tom L. Schmidt, Manager/SysAdmin
Characterization Equipment
Micron Technology, Inc.
8000 S. Federal Way P.O. Box 6 Mail Stop 376
Boise, Idaho USA 83707-0006
mailto:tschmidt@micron.com
http://www.4schmidts.com/
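
The two-run workaround described above can be sketched as a small wrapper.
This is only an illustration under stated assumptions: the function names
are hypothetical, and it relies on the documented fact that -a is
shorthand for -rlptgoD, so spelling the options out without the "l" makes
the first run skip symbolic links. It does not close the window between
the two runs that the message mentions.

```python
import subprocess
from typing import List


def two_pass_commands(src: str, dest: str) -> List[List[str]]:
    """Build the two rsync invocations (hypothetical helper).

    Pass 1 copies everything except symbolic links; pass 2 is the
    normal archive run, which then only has the links left to update.
    """
    # -a expands to -rlptgoD; dropping "l" means symlinks are not copied.
    without_links = ["rsync", "-rptgoDv", src, dest]
    with_links = ["rsync", "-av", src, dest]
    return [without_links, with_links]


def run_two_pass(src: str, dest: str) -> None:
    """Run both passes in order, stopping if either one fails."""
    for cmd in two_pass_commands(src, dest):
        subprocess.run(cmd, check=True)
```

Calling run_two_pass(".", "../duplicate") would mirror the test case from
the message; any change made to the master copy between the two runs can
still leave a link pointing at a file that was not pushed.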
On Thu, May 08, 2003 at 09:44:10AM -0600, Tom Schmidt wrote:
> I am seeing a problem with the way the rsync of symbolic links
> is done. Here is a simple example to duplicate this issue:
>
[snip]
>
> ACK! The symbolically linked file pointing to foo/file4 got
> updated first, causing anyone using it to see it pointing at a
> nonexistent file!
>
> This bug is essentially a race condition that can cause
> the mirrored system to see the symbolic link get updated to
> point to a nonexistent file since the file it is pointing
> at has not been mirrored yet.

Not a bug, it is a race condition. rsync is not atomic.

> We see this problem quite often when we are mirroring our
> /usr/local filesystem. For example, if we have /usr/local/lib/libz.so
> symbolically linked to /usr/local/zlib/1.1.3/libz.so, but then
> update it to point to /usr/local/zlib/1.1.4/libz.so, then
> when the filesystem is mirrored, the /usr/local/lib/libz.so
> link gets updated long before the /usr/local/zlib/1.1.4 directory
> tree gets pushed. This causes any application that is started
> using libz.so to fail while the rsync is running! This
> downtime can be quite long when pushing many updates across our
> WAN to servers on the other side of the world!
>
> It appears that the rsync pushes the updates out alphabetically,
> which is why the symbolic link to bar/file3 above worked OK, but
> the symbolic link to foo/file4 fails with the race condition.
>
> The fix for this would be to have rsync internally sort its filelist
> to push the updated files and directories first, then the updated
> symbolic links last. How can this be done? Or am I missing
> some rsync commandline issue to fix this problem?
>
> My only workaround at this point is to rsync twice, the first time
> without pushing any symbolic links, but this workaround can
> still burn us if an update is made to the master copy between
> the two rsync runs.

If you need atomic updates, use a staging area. Have two directories:
rsync to the unused one and then swap them.

--copy-links might also be of use.

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws

Remember Cernan and Schmitt
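
One way to read the staging-area advice is sketched below. The layout is
an assumption, not something specified in the reply: two trees named "a"
and "b" under a common root, plus a symbolic link named "current" that
consumers use. The names publish_via_staging and sync are hypothetical;
sync stands for whatever fills the idle tree (for example an rsync run).
The swap relies on rename(2) replacing the link itself atomically on
POSIX systems.

```python
import os
from typing import Callable


def publish_via_staging(root: str, sync: Callable[[str], None]) -> str:
    """Sync into the idle tree, then flip 'current' to point at it.

    Assumes root contains directories 'a' and 'b' and a symlink
    'current' whose target is one of those two names.
    """
    current = os.path.join(root, "current")
    active = os.readlink(current)
    inactive = "b" if active == "a" else "a"

    # Fill the tree nobody is reading; readers still see the old one.
    sync(os.path.join(root, inactive))

    # Build the new link under a temporary name, then rename it over
    # 'current'. The rename swaps the link in a single step.
    tmp = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(inactive, tmp)
    os.replace(tmp, current)
    return inactive
```

With this layout a reader resolving paths through "current" sees either
the complete old tree or the complete new one, at the cost of keeping two
full copies on the receiving side.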
On Thu, May 08, 2003 at 09:44:10AM -0600, Tom Schmidt wrote:
> The fix for this would be to have rsync internally sort its filelist
> to push the updated files and directories first, then the updated
> symbolic links last.

Yes, that's one possible solution. However, changing the sort order
would add an incompatibility with older versions, so I'm wondering if a
different change wouldn't be better. I think it would be possible to
change the loop in the generate_files() function to exclude symlinks,
and then loop over the list again to process them all last. Since the
sender doesn't care if the list is processed in order, I think that this
would allow an updated receiving side to interoperate with an older
sending side and still benefit from the revised fetch order.

The downside would be that if rsync is processing a really large file
list, it may run noticeably slower if it has to traverse the list twice
instead of once. So, it would be good to measure this and see what kind
of a performance hit we're talking about.

..wayne..
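
The two-loop idea can be illustrated with a toy model. This is not
rsync's C code: Entry and generation_order are hypothetical names, and the
file list is just a Python list in the sorted order shown in the original
transcript. The point is only that the list itself stays in its existing
order while the receiving side walks it twice.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    """One file-list entry; link_target is set only for symlinks."""
    path: str
    link_target: Optional[str] = None


def generation_order(file_list: Iterable[Entry]) -> Iterator[Entry]:
    """Yield non-symlinks first, then symlinks, each in list order."""
    entries = list(file_list)
    # First traversal: everything except symbolic links.
    for entry in entries:
        if entry.link_target is None:
            yield entry
    # Second traversal: the symbolic links that were skipped above.
    for entry in entries:
        if entry.link_target is not None:
            yield entry


# File list from the third rsync run in the original report.
flist = [Entry("."), Entry("file", "foo/file4"), Entry("foo"), Entry("foo/file4")]
order = [e.path for e in generation_order(flist)]
```

Here order puts "file" after "foo/file4", so the link would only be
updated once its new target exists. The second traversal is the extra
cost mentioned above for very large file lists.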
Wayne Davison wrote:
> On Mon, May 12, 2003 at 09:46:36AM -0600, Tom Schmidt wrote:
>
>>The --delete-after option does not address this issue.
>
> You missed the "if" part of his statement where he was referring to my
> proposed change that would cause the symlinks to get transferred at the
> end. In such a scenario the absence of the --delete-after option will
> still allow symlinks to be broken for long periods of time during a big
> transfer (since the file that is the target of a symlink can be removed
> long before the symlink gets updated to a newly-transferred file). As
> such, I am considering only performing a late-transfer of symlinks if
> --delete-after has been specified -- without that option, the user
> cannot be overly concerned about symlink validity.
>
> ..wayne..

That makes sense. So you are proposing that the fix for this issue
would combine "--symlink-after" functionality into the --delete-after
option? That should work.

I would be willing to beta-test any proposed patches to get this
functionality. I am not familiar enough with the code to attempt
patching this myself.

Thanks in advance...Tom
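
The interaction between late symlinks and deletion order can be checked
with a toy simulation. Everything here is an assumption for illustration:
dangling_steps is a hypothetical helper, the receiving side is modelled
as a set of paths plus one link, and the scenario supposes the old zlib
version was removed on the master while the link moved to the new one.

```python
from typing import Iterable, List, Tuple

Op = Tuple[str, str]  # (kind, path) where kind is "add", "delete" or "link"


def dangling_steps(dest_files: Iterable[str], link_target: str,
                   ops: List[Op]) -> int:
    """Apply ops in order and count the steps that leave the link dangling."""
    files = set(dest_files)
    target = link_target
    dangling = 0
    for kind, path in ops:
        if kind == "add":
            files.add(path)
        elif kind == "delete":
            files.discard(path)
        elif kind == "link":
            target = path
        else:
            raise ValueError(f"unknown operation: {kind}")
        if target not in files:
            dangling += 1
    return dangling


old = "zlib/1.1.3/libz.so"
new = "zlib/1.1.4/libz.so"

# Symlinks last, but deletions happen before the transfer.
delete_first = [("delete", old), ("add", new), ("link", new)]
# Symlinks last, deletions after everything else.
delete_after = [("add", new), ("link", new), ("delete", old)]

broken_without = dangling_steps([old], old, delete_first)
broken_with = dangling_steps([old], old, delete_after)
```

In this model the link is broken for two of the three steps when
deletions come first and for none when they come last, which matches the
reasoning for tying the late symlink transfer to --delete-after.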