Hello:
[I apologize if this is a repeat. I had to rebuild my posting profile,
and I think I didn't do it correctly before I sent a previous version
of this.]
We use rsync as part of a home-grown backup solution. In the specific
case at hand, we're using rsync to copy volumes off-site. The "sending"
server invokes rsync to transfer each volume to the off-site archive.
The call to rsync uses --rsync-path= so that, on the archive server, it
runs a wrapper script rather than rsync itself. The wrapper first does
a few things, including:
* Adding --link-dest options (to the result of previous backups)
* Changing the destination directory to an "in progress" name
It then runs the actual rsync in a child process. When the child
completes, if it exited with a zero exit code, the wrapper renames the
"in progress" directory to its permanent name.
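For illustration, a wrapper along these lines might look like the minimal Python sketch below. It is not the actual script (which isn't shown here). The function names, the BACKUP_LINK_DEST environment variable, the ".in_progress" suffix handling, and the assumption that the destination is the last argument rsync passes to the server are all hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical suffix; matches the "...in_progress" directory names seen in the logs.
IN_PROGRESS_SUFFIX = ".in_progress"


def rewrite_server_args(argv, link_dest):
    """Return (new_argv, in_progress_dir, final_dir).

    argv is what --rsync-path hands the wrapper, e.g.
    ["--server", "-logDtpre.iLsf", ".", "/backup/host/vol/snapshot.2013.07.11.0"].
    Assumption: the destination directory is the last argument.
    """
    final_dir = argv[-1].rstrip("/")
    in_progress_dir = final_dir + IN_PROGRESS_SUFFIX
    new_argv = list(argv[:-1])
    # Insert --link-dest right after --server so it is parsed as an option.
    new_argv.insert(new_argv.index("--server") + 1, "--link-dest=" + link_dest)
    new_argv.append(in_progress_dir)
    return new_argv, in_progress_dir, final_dir


def finish(returncode, in_progress_dir, final_dir):
    """Promote the snapshot only if the child rsync reported success."""
    if returncode == 0:
        os.rename(in_progress_dir, final_dir)
        return True
    return False


def main():
    # Hypothetical way of locating the previous backup for --link-dest.
    link_dest = os.environ.get("BACKUP_LINK_DEST", "/backup/host/vol/latest")
    new_argv, in_progress_dir, final_dir = rewrite_server_args(sys.argv[1:], link_dest)
    # The child inherits stdin/stdout, so it speaks the rsync protocol
    # directly with the sending rsync over the SSH connection.
    rc = subprocess.call(["rsync"] + new_argv)
    finish(rc, in_progress_dir, final_dir)
    # A negative rc means the child was killed by a signal; report failure.
    sys.exit(rc if rc >= 0 else 1)


# Only act when invoked as the remote rsync server (i.e. via --rsync-path).
if __name__ == "__main__" and sys.argv[1:2] == ["--server"]:
    main()
```

The whole decision rests on the single value returned by subprocess.call, which is exactly the exit code whose reliability is in question here.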
This has been working for a while without difficulty, up to and
including the detection of "disk full" conditions. I'm no longer sure,
though, that it is proper to rely upon the exit code of the rsync
invoked on the remote server.
We've hit a case where the remote-side volume becomes full and something
odd occurs. I'm not sure why this differs from other cases where we've
seen it work. In this case, with --link-dest in use, the sending rsync
writes many errors such as:
rsync: recv_generator: mkdir "/backup/host/vol/snapshot.2013.07.11.0.in_progress/live/lms/trylesson/toefl" failed: No space left on device (28)
*** Skipping any contents from this failed directory ***
It then prints:
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7]
and exits with an exit code of 23.
However, the wrapper reports that the remote rsync exits with an exit
code of zero. Because of this, the wrapper (improperly!) renames the
snapshot.2013.07.11.0.in_progress directory to snapshot.2013.07.11.0.
If --link-dest is not used, the situation is very different. The
sending rsync writes:
rsync: write failed on "/backup/host/vol/snapshot.2013.07.11.0.in_progress/live/lms/courses/files/1/feedback/119/15.html": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(327) [receiver=3.0.9]
rsync: connection unexpectedly closed (33550 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]
and then exits with an exit code of 12. Most significantly, the remote
rsync is also exiting with an exit code of 12 (thereby telling the wrapper
of the failure).
My main question is: is it safe to rely upon the exit code of the remote
rsync? I imagine I could move the directory-renaming logic to the sending
server, but that would introduce a new set of failure cases (e.g., the
rename must be done over SSH, and the SSH connection can fail), so I'm
hoping to avoid that.
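For comparison, the sending-side alternative could look roughly like the minimal Python sketch below. It relies on the sending rsync's exit code (which was correctly 23 in the --link-dest case) and then performs the rename with a second SSH command. The function name, the rsync options, and the plain `mv` on the archive server are assumptions for illustration; the `run` parameter only exists so the logic can be exercised without a real rsync or SSH.

```python
import shlex
import subprocess


def push_and_promote(src, remote_host, in_progress, final, link_dest,
                     run=subprocess.call):
    """Push src to remote_host:in_progress, then rename it to final on success.

    Returns the exit code of the first command that failed, or 0.
    """
    rc = run(["rsync", "-a", "--link-dest=" + link_dest,
              src, remote_host + ":" + in_progress])
    if rc != 0:
        # Trust the sending rsync's view of the transfer (e.g. code 23 or 12).
        return rc
    # The rename is a separate SSH round trip: this is the extra failure
    # case, since the transfer can succeed while this step fails.
    mv = "mv -- %s %s" % (shlex.quote(in_progress), shlex.quote(final))
    return run(["ssh", remote_host, mv])
```

If the SSH rename fails, the snapshot stays under its "in progress" name, which errs on the safe side; the next run would need to detect and clean up (or retry) such leftovers.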
Thanks,
Andrew