On 8/1/2011 4:30 AM, Bjorn Madsen wrote:
> I thought others might benefit from this lesson learned and thought it
> maybe should be added to the man-pages.
>
> I thought my network connection was glitchy and hence set rsync up for
> --timeout=120 but I found out that I was actually causing the glitch
> with this script:
> #! /bin/sh -
> while true; do
>     rsync -avz --progress --timeout=120 --delete \
>         /media/rsync_gb01/movies/ myserver:movies
>     sleep 120
> done
>
> The problem:
> When rsync is checking large files it takes time to verify the content
> at both ends, so when using the option --timeout=TIME both source and
> destination must be capable of performing the check within the time
> window provided. For a 2.3 GB file this check might take about 2
> minutes 30 seconds, so with --timeout=120 rsync will exit before the
> destination has been able to complete its file check.
>
> For the future options: Could/can rsync incorporate the adjustment of
> timeout to include the time used for the check? Fictive example:
> --timeout=xfer#1-check-time + 120 seconds?
>
> detailed example:
> me@source:~$ rsync -avz --progress
> --timeout=xfer#1-check-time+120 --delete /media/rsync_gb01/movies/
> myserver:movies
> sending incremental file list
> DVD1.mkv
> 2302868295 100% 14.01MB/s *0:02:36* (xfer#1, to-check=41/286)
> DVD2.mkv
> ...and so on...
>
> --
> Bjorn

I think the only way to really address that is to have the two rsync
processes heartbeat each other.

One side can know that it is itself busy and so not decrement its
remaining timeout, but it cannot know whether the other side is killed,
hung, disconnected from the network, or just busy, unless we add a busy
indication to the protocol.

Such a heartbeat would need to be programmable. Any one schedule you
pick will break some jobs that would have gone through, even though it
will keep alive some jobs that would otherwise have failed.

Consider an intermittent network. With no heartbeat, the pauses go
unnoticed, or at least they do not break the rsync session. Data
transfer just runs, pauses, runs, pauses, etc. until the job is done.
With a heartbeat, the job is aborted whenever the heartbeat schedule is
broken. Say you define a rule: heartbeat every 20 seconds, abort after 3
consecutive failed beats. Some connections will be needlessly killed by
that. You'd need to be able to define the timing and the grace period,
and maybe even fully arbitrary schedules of when to ping and under what
conditions to abort. It's going to be different for different people and
different connections and machines and file sets.
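
To make the trade-off concrete, here is a small sketch of the example
rule above (abort after N consecutive missed beats). It only simulates
the decision; it is not anything rsync implements, and the function
name heartbeat_verdict and its argument format are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a heartbeat rule would abort a job.
# Usage: heartbeat_verdict MAX_MISSES BEAT...
#   MAX_MISSES  consecutive missed beats that trigger an abort
#   BEAT        1 = beat answered, 0 = beat missed (one per interval)
heartbeat_verdict() {
    max_misses=$1
    shift
    misses=0
    for beat in "$@"; do
        if [ "$beat" -eq 1 ]; then
            misses=0                  # any answered beat resets the count
        else
            misses=$((misses + 1))
        fi
        if [ "$misses" -ge "$max_misses" ]; then
            echo abort
            return 0
        fi
    done
    echo alive
}

# With beats every 20 seconds: two separate 40-second pauses survive.
heartbeat_verdict 3 1 0 0 1 0 0 1   # prints "alive"

# One 60-second pause is fatal, even though the link comes back afterwards.
heartbeat_verdict 3 1 0 0 0 1 1 1   # prints "abort"
```

The second call is the case described above: without a heartbeat that
job would simply have paused and then carried on.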

A heartbeat would also fix some cases where the ISP has implemented TCP
session timeouts, or whole-connection timeouts, that the customer
cannot get around. In some cases, if a TCP session sees no packets for 2
minutes, or even less, the session is killed by the ISP's router or
other upstream hardware outside the customer's control, and the
application only discovers that the session no longer exists when it
next tries to send a packet. This happens even while the overall net
connection remains up and busy with other traffic. You can't work
around this by, say, pinging the same remote host continuously in
parallel while rsync is running. The pings would not be part of rsync's
TCP session, would not make it busy, and would not keep it alive.
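
For contrast with the parallel ping, here is a sketch of keepalive
traffic that does travel inside the same TCP session. It assumes the
transfer runs over OpenSSH, which the original script does not state.
ServerAliveInterval and ServerAliveCountMax are OpenSSH client options
and -e is rsync's remote-shell option; the helper name keepalive_rsh is
made up. Whether this satisfies any particular ISP's idle timer is
untested here.

```shell
#!/bin/sh
# Hypothetical helper: build an ssh command line that sends a keepalive
# through the ssh connection after INTERVAL seconds without data from the
# server, and gives up after MAX_MISSED unanswered keepalives.
# Assumption: rsync is using OpenSSH as its remote shell.
keepalive_rsh() {
    interval=$1
    max_missed=$2
    printf 'ssh -o ServerAliveInterval=%s -o ServerAliveCountMax=%s' \
        "$interval" "$max_missed"
}

# Intended use (not executed here; paths and host are from the quoted script):
#   rsync -avz --timeout=120 -e "$(keepalive_rsh 20 3)" \
#       /media/rsync_gb01/movies/ myserver:movies
keepalive_rsh 20 3
echo
```

Note that this has the same drawback as any fixed heartbeat rule: an
outage longer than the interval times the count kills a session that
might otherwise have recovered.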

So sometimes you want it, sometimes you don't, and when you do, you want
to be able to specify the timing and the rules for aborting.

I've gotten by OK with the existing timeout option and a knowledge of
how my file sets and net connections behave, so it's not a major wish
for me personally.
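
As a rough sketch of what "knowing your file sets" can mean in
practice: estimate how long the largest file keeps one side busy and
pick a --timeout above that. The helper name suggest_timeout is made
up, and the rate is an assumption (the 14.01MB/s from the quoted output,
read as 1024-based units); the rate at which a given machine checks a
file may well differ from the transfer rate.

```shell
#!/bin/sh
# Hypothetical helper: suggest a --timeout value in seconds.
# Usage: suggest_timeout SIZE_BYTES BYTES_PER_SEC MARGIN_SEC
# Needs 64-bit shell arithmetic for files larger than 2 GiB.
suggest_timeout() {
    size=$1
    rate=$2
    margin=$3
    # Round the estimated busy time up to a whole second, then add the margin.
    echo $(( (size + rate - 1) / rate + margin ))
}

# Largest file from the quoted output: 2302868295 bytes.
# Assumed rate: 14.01 * 1048576, about 14690550 bytes per second.
suggest_timeout 2302868295 14690550 120   # prints 277
```

That estimate (157 seconds plus the 120-second margin) is close to the
0:02:36 reported in the quoted output, and shows why --timeout=120 on
its own was too short for that file.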
--
bkw