I have a scenario where a "broken pipe" condition can happen on a
fairly regular basis and in all honesty it is to be expected.
We are working with satellite links to moving ship platforms (no, it isn't
military-based), and depending on location, sea conditions, and a few dozen
other factors, the data link can become somewhat unstable at times.
That, plus the high latency of IP traffic itself (600 ms to 1500 ms) over such
a link, has always presented challenges for transferring/syncing data.
So we accept that even with "acceleration" devices and other methods of
lowering latency, the bottom line is that interruptions in service, ranging
from very small to quite large, can still occur (again, depending on the
environment at the time).
To try to avoid some of the rsync broken-pipe errors, we have been using a
timeout setting of 600 (10 minutes) with 2.6.9. This does help, but some
failures still occur.
I started writing a wrapper script that checks for these timeouts and performs
a wait-and-retry (a rough sketch is at the end of this message), and in doing
so noticed something odd.
The number of seconds reported in the error message was significantly higher
than 600.
Three consecutive examples:
$ rsync -av --progress --timeout=600 --partial tfile.tgz host1:/xfr
building file list ...
1 file to consider
tfile.tgz
io timeout after 1528 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(165) [sender=2.6.9]
$ rsync -av --progress --timeout=600 --partial tfile.tgz host1:/xfr
building file list ...
1 file to consider
tfile.tgz
io timeout after 9930 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(165) [sender=2.6.9]
$ rsync -av --progress --timeout=600 --partial tfile.tgz host1:/xfr
building file list ...
1 file to consider
tfile.tgz
io timeout after 4825 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(165) [sender=2.6.9]
The 1528, 9930, and 4825 seconds are what caught my eye.
They could actually be correct as measures of when progress stopped versus
when the timeout finally fired, but what happened to --timeout=600?
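
For what it's worth, here is a rough sketch of the wrapper I mentioned above.
The source/destination names, retry count, and sleep interval are just
placeholders for this example; the only thing it really relies on is rsync
exiting with code 30 on an I/O timeout, as shown in the errors above.
--partial is kept so a retried run can pick up from the previous attempt.

#!/bin/sh
# Retry the transfer when rsync exits with code 30 (timeout in data
# send/receive).  All names and values below are placeholders.
SRC=tfile.tgz
DEST=host1:/xfr
MAX_TRIES=5
WAIT=120          # seconds to sleep between attempts

try=1
while [ "$try" -le "$MAX_TRIES" ]; do
    rsync -av --progress --timeout=600 --partial "$SRC" "$DEST"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "transfer completed on attempt $try"
        exit 0
    elif [ "$rc" -eq 30 ]; then
        echo "io timeout (exit 30) on attempt $try; retrying in ${WAIT}s"
        sleep "$WAIT"
        try=$((try + 1))
    else
        echo "rsync failed with exit code $rc; giving up"
        exit "$rc"
    fi
done
echo "gave up after $MAX_TRIES attempts"
exit 30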