We use rsync to update an NFS server. After an update, we noticed that
a large number of clients didn't see the updated data.

It took me a while to be able to reliably reproduce this problem, but it
happens on old and new versions of rsync. It also happens across all
the platforms we use here (sun/linux/netapp).

This shows the problem: [Note my home directory is NFS mounted]

***************************************************
rm -f /tmp/testpath/* $HOME/testpath/*

#Prime it
echo some data > /tmp/testpath/testfile
sleep 1
rsync -a /tmp/testpath/ $HOME/testpath/
ssh $otherbox ls -li $HOME/testpath

#Break it
echo additional data >> /tmp/testpath/testfile
sleep 1
rsync -a -b /tmp/testpath/ $HOME/testpath/
ssh $otherbox ls -li $HOME/testpath

#Fix it
touch $HOME/testpath
ssh $otherbox ls -li $HOME/testpath
***************************************************

Here's the output:
total 0
483079 -rw-r--r-- 1 brian source 10 May 6 18:05 testfile
total 0
483079 -rw-r--r-- 1 brian source 10 May 6 18:05 testfile
total 0
483080 -rw-r--r-- 1 brian source 26 May 6 18:05 testfile
483079 -rw-r--r-- 1 brian source 10 May 6 18:05 testfile~

The output from the "break it" section matches "prime it", but it should
match the "fix it" output.

What I think is happening:
--------------------------

* testfile is updated in place, so the mtime of testpath
  isn't updated.
* The NFS client is caching the mtime of "testpath".
* The second rsync modifies the contents of testpath, but
  when finished, it resets the mtime on "testpath".
* The client believes its cache is valid, so it doesn't refresh, and
  therefore it misses the update.

NFS Clients tested: sun (solaris 8 & 9), linux (2.4.18 & 2.4.24)
NFS Servers tested: sun, linux, and netapp.

They all behaved the same.

Proposed fix:
-------------
Change rsync so it doesn't reset mtimes on directories if --backup is
in effect.

Thanks,
Brian
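The suspected sequence can be mimicked on a local filesystem, without rsync or NFS. The sketch below is only an illustration of the hypothesis in the report: the scratch paths, the temporary file name, and the variable names are all made up for the example, and it assumes a shell whose `[` supports `-nt` and `-ot` (bash and dash do). It replaces a file by rename, which changes the directory, and then puts the directory's old mtime back.

```shell
#!/bin/sh
# Local sketch only: mimics "replace a file, then restore the directory
# mtime". It does not run rsync and does not touch NFS.
work=$(mktemp -d)
dir=$work/testpath
stamp=$work/stamp
mkdir "$dir"
echo "some data" > "$dir/testfile"

# Pin the directory mtime to a fixed past time and remember it in a
# reference file, so the comparisons do not depend on clock granularity.
touch -t 200405061805 "$dir"
touch -r "$dir" "$stamp"

# Mimic an update with backups: write the new version to a temporary
# file, rename the old file to testfile~, then rename the new one in.
{ cat "$dir/testfile"; echo "additional data"; } > "$dir/.testfile.tmp"
inode_before=$(ls -i "$dir/testfile" | awk '{print $1}')
mv "$dir/testfile" "$dir/testfile~"
mv "$dir/.testfile.tmp" "$dir/testfile"
inode_after=$(ls -i "$dir/testfile" | awk '{print $1}')

# The renames modified the directory, so its mtime moved forward.
if [ "$dir" -nt "$stamp" ]; then
    changed_after_update=yes
else
    changed_after_update=no
fi

# Put the old mtime back, as a time-preserving copy would.
touch -r "$stamp" "$dir"
if [ "$dir" -nt "$stamp" ] || [ "$dir" -ot "$stamp" ]; then
    changed_after_reset=yes
else
    changed_after_reset=no
fi

echo "inode before: $inode_before, inode after: $inode_after"
echo "directory mtime differs after update: $changed_after_update"
echo "directory mtime differs after reset:  $changed_after_reset"
rm -rf "$work"
```

If a client decides whether its cached directory listing is still valid by comparing directory mtimes, the final state gives it nothing to notice: the file has a new inode, but the directory carries the same mtime as before.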
Brian Childs
2004-May-07 15:19 UTC
rsync-2.6.2: NFS clients confused after an rsync [PATCH included]
As a response to my original post, here's a patch that implements my
proposed solution.

I've tested it, and it fixes the problem, but I'm afraid there may be
some hidden consequences of doing this. If anyone can think of
anything, please let me know.

Thanks,
Brian

--- rsync-2.6.2/util.c.orig	2004-05-07 10:06:08.000000000 -0400
+++ rsync-2.6.2/util.c	2004-05-07 11:06:24.000000000 -0400
@@ -127,9 +127,17 @@
 int set_modtime(char *fname, time_t modtime)
 {
 	extern int dry_run;
+	extern int make_backups;
+
 	if (dry_run)
 		return 0;
 
+	if(make_backups) {
+		struct stat sb;
+		if(!lstat(fname, &sb) && S_ISDIR(sb.st_mode))
+			return 0;
+	}
+
 	if (verbose > 2) {
 		rprintf(FINFO, "set modtime of %s to (%ld) %s",
 			fname, (long) modtime,
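Without patching rsync, the "Fix it" step from the original report can be applied after each run. The following is a rough sketch under that assumption, not something proposed in the thread: `bump_dir_mtimes` is a hypothetical helper name, and touching every directory is a blunt generalization of the single `touch $HOME/testpath`. The demo runs on a scratch tree and does not invoke rsync. It assumes a shell whose `[` supports `-nt`.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): give every directory under
# the destination a fresh mtime, so clients that cache directory mtimes
# see a change. Intended to be called after the rsync run has finished.
bump_dir_mtimes() {
    find "$1" -type d -exec touch {} +
}

# Self-contained demo on a scratch tree.
dest=$(mktemp -d)
stamp=$(mktemp)
mkdir -p "$dest/testpath/sub"
echo "some data" > "$dest/testpath/testfile"

# Backdate files and directories, as a time-preserving copy would.
find "$dest" -exec touch -t 200405061805 {} +
touch -t 200405061805 "$stamp"

bump_dir_mtimes "$dest/testpath"

# Directories should now be newer than the reference; files should not.
if [ "$dest/testpath" -nt "$stamp" ]; then dir_bumped=yes; else dir_bumped=no; fi
if [ "$dest/testpath/sub" -nt "$stamp" ]; then subdir_bumped=yes; else subdir_bumped=no; fi
if [ "$dest/testpath/testfile" -nt "$stamp" ]; then file_touched=yes; else file_touched=no; fi

echo "top directory bumped: $dir_bumped"
echo "subdirectory bumped:  $subdir_bumped"
echo "file mtime changed:   $file_touched"
rm -rf "$dest" "$stamp"
```

This touches directories whose contents did not change as well, so clients revalidate more than strictly necessary. File mtimes are left alone, so rsync's own quick check on later runs is unaffected.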
On Thu, May 06, 2004 at 06:21:55PM -0400, Brian Childs wrote:
> rm -f /tmp/testpath/* $HOME/testpath/*
>
> #Prime it
> echo some data > /tmp/testpath/testfile
> sleep 1
> rsync -a /tmp/testpath/ $HOME/testpath/
> ssh $otherbox ls -li $HOME/testpath
>
> #Break it
> echo additional data >> /tmp/testpath/testfile
> sleep 1
> rsync -a -b /tmp/testpath/ $HOME/testpath/
> ssh $otherbox ls -li $HOME/testpath

Do you actually need to use the -b option for the breakage to happen?
In the output you show, not only is the new testfile~ not yet found, but
the extra data in testfile is not seen (including showing the wrong
inode). Since rsync isn't actually updating any files in-place, the
file gets a new inode while keeping the directory time unchanged, so
that may be enough to confuse NFS.

I had been considering making the preservation of directory times an
optional occurrence with rsync. The appended patch implements this.
Comments?

..wayne..

--- options.c	6 May 2004 21:08:01 -0000	1.148
+++ options.c	11 May 2004 16:57:18 -0000
@@ -46,6 +46,7 @@ int preserve_devices = 0;
 int preserve_uid = 0;
 int preserve_gid = 0;
 int preserve_times = 0;
+int preserve_dir_times = 0;
 int update_only = 0;
 int cvs_exclude = 0;
 int dry_run = 0;
@@ -240,7 +241,8 @@ void usage(enum logcode F)
   rprintf(F," -o, --owner                 preserve owner (root only)\n");
   rprintf(F," -g, --group                 preserve group\n");
   rprintf(F," -D, --devices               preserve devices (root only)\n");
-  rprintf(F," -t, --times                 preserve times\n");
+  rprintf(F," -t, --times                 preserve times on non-directories\n");
+  rprintf(F," -d, --dir-times             preserve times on directories\n");
   rprintf(F," -S, --sparse                handle sparse files efficiently\n");
   rprintf(F," -n, --dry-run               show what would have been transferred\n");
   rprintf(F," -W, --whole-file            copy whole files, no incremental checks\n");
@@ -346,6 +348,7 @@ static struct poptOption long_options[]
   {"group",           'g', POPT_ARG_NONE,   &preserve_gid, 0, 0, 0 },
   {"devices",         'D', POPT_ARG_NONE,   &preserve_devices, 0, 0, 0 },
   {"times",           't', POPT_ARG_NONE,   &preserve_times, 0, 0, 0 },
+  {"dir-times",       'd', POPT_ARG_NONE,   &preserve_dir_times, 0, 0, 0 },
   {"checksum",        'c', POPT_ARG_NONE,   &always_checksum, 0, 0, 0 },
   {"verbose",         'v', POPT_ARG_NONE,   0, 'v', 0, 0 },
   {"quiet",           'q', POPT_ARG_NONE,   0, 'q', 0, 0 },
@@ -823,6 +826,8 @@ void server_options(char **args,int *arg
 		argstr[x++] = 'D';
 	if (preserve_times)
 		argstr[x++] = 't';
+	if (preserve_dir_times && am_sender)
+		argstr[x++] = 'd';
 	if (preserve_perms)
 		argstr[x++] = 'p';
 	if (recurse)
--- rsync.c	23 Mar 2004 16:16:15 -0000	1.135
+++ rsync.c	11 May 2004 16:57:19 -0000
@@ -25,6 +25,7 @@
 extern int verbose;
 extern int dry_run;
 extern int preserve_times;
+extern int preserve_dir_times;
 extern int am_root;
 extern int am_server;
 extern int am_sender;
@@ -125,9 +126,9 @@ int delete_file(char *fname)
 int set_perms(char *fname,struct file_struct *file,STRUCT_STAT *st,
 	      int report)
 {
-	int updated = 0;
 	STRUCT_STAT st2;
 	int change_uid, change_gid;
+	int keep_time, updated = 0;
 
 	if (dry_run)
 		return 0;
@@ -140,12 +141,10 @@ int set_perms(char *fname,struct file_st
 		st = &st2;
 	}
 
-	if (preserve_times && !S_ISLNK(st->st_mode) &&
-	    cmp_modtime(st->st_mtime, file->modtime) != 0) {
-		/* don't complain about not setting times on directories
-		 * because some filesystems can't do it */
-		if (set_modtime(fname,file->modtime) != 0 &&
-		    !S_ISDIR(st->st_mode)) {
+	keep_time = S_ISDIR(st->st_mode) ? preserve_dir_times
+		  : preserve_times && !S_ISLNK(st->st_mode);
+	if (keep_time && cmp_modtime(st->st_mtime, file->modtime) != 0) {
+		if (set_modtime(fname,file->modtime) != 0) {
 			rprintf(FERROR, "failed to set times on %s: %s\n",
 				full_fname(fname), strerror(errno));
 			return 0;
--- rsync.yo	7 May 2004 00:18:37 -0000	1.169
+++ rsync.yo	11 May 2004 16:57:19 -0000
@@ -298,7 +298,8 @@ verb(
  -o, --owner                 preserve owner (root only)
  -g, --group                 preserve group
  -D, --devices               preserve devices (root only)
- -t, --times                 preserve times
+ -t, --times                 preserve times on non-directories
+ -d, --dir-times             preserve times on directories
  -S, --sparse                handle sparse files efficiently
  -n, --dry-run               show what would have been transferred
  -W, --whole-file            copy whole files, no incremental checks
@@ -538,13 +539,22 @@ dit(bf(-D, --devices)) This option cause
 block device information to the remote system to recreate these
 devices.  This option is only available to the super-user.
 
-dit(bf(-t, --times)) This tells rsync to transfer modification times along
-with the files and update them on the remote system.  Note that if this
+dit(bf(-t, --times)) This tells rsync to preserve modification times of
+non-directories transferred to the remote system.  Note that if this
 option is not used, the optimization that excludes files that have not been
 modified cannot be effective; in other words, a missing -t or -a will
 cause the next transfer to behave as if it used -I, and all files will have
 their checksums compared and show up in log messages even if they haven't
 changed.
+
+dit(bf(-d, --dir-times)) This tells rsync to preserve the modification
+times of directories transferred to the remote system.  On a modern
+rsync, these are left unpreserved by default to avoid causing problems
+for NFS.
+
+Note: when sending files to an older rsync, the --times option will
+imply --dir-times (if the option causes an error on the receiving
+system, omit it and use --times to preserve all file/directory times).
 
 dit(bf(-n, --dry-run)) This tells rsync to not do any file transfers,
 instead it will just report the actions it would have taken.
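The keep_time expression in the rsync.c hunk is compact, so it may help to spell out what it decides for each file type. The sketch below is a shell model of that one expression only. The labels `dir`, `symlink`, and `file` are hypothetical stand-ins for the S_ISDIR and S_ISLNK checks, and the function names `keep_time` and `decide` are invented for the example. It is not rsync code.

```shell
#!/bin/sh
# Model of:
#   keep_time = S_ISDIR(mode) ? preserve_dir_times
#                             : preserve_times && !S_ISLNK(mode);
# Usage: keep_time KIND PRESERVE_TIMES PRESERVE_DIR_TIMES
# KIND is dir, symlink, or file (illustrative labels, not rsync names).
# Returns success (0) when the mtime would be set.
keep_time() {
    kind=$1
    preserve_times=$2
    preserve_dir_times=$3
    case $kind in
        dir)     [ "$preserve_dir_times" -ne 0 ] ;;
        symlink) return 1 ;;
        *)       [ "$preserve_times" -ne 0 ] ;;
    esac
}

# Convenience wrapper that prints yes/no instead of an exit status.
decide() {
    if keep_time "$@"; then echo yes; else echo no; fi
}

# Print the full table for the three kinds and both flags.
for kind in file symlink dir; do
    for t in 0 1; do
        for d in 0 1; do
            echo "kind=$kind times=$t dir-times=$d -> $(decide "$kind" "$t" "$d")"
        done
    done
done
```

The point of the table is that, with the patch, directory times follow only the new flag: giving --times alone no longer sets them, and symlinks are never touched regardless of either flag.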