thr3ads.net - rsync - hang in select() on unix domain sockets, 60s timeout loop [May 2011]


Scott Mcdermott

2011-May-20 03:25 UTC

hang in select() on unix domain sockets, 60s timeout loop

I have rsync 3.0.8 on both ends, running over ssh. On the
remote server it appears to be hung in select():

The process has fds 0, 1 and 2, and all of them are unix
sockets. It's just hung: it times out every 60 seconds and
then calls select() again. It has been hung like this for
15 hours.

The flags on the remote are:

    --server --sender -lHogDtpAXrRe.iLs --numeric-ids --inplace

It loops every 60 seconds with:

    select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
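A minimal Python sketch of the pattern in that strace line, assuming a `socket.socketpair()` stands in for the socket the process got on fd 0: `select()` with only a read set returns an empty result (the `= 0 (Timeout)` case) until the other end writes or closes, and a loop like this simply re-arms. The helper names `wait_readable` and `poll_loop` and the short timeouts are hypothetical, not rsync's code.

```python
import select
import socket


def wait_readable(sock: socket.socket, timeout: float) -> bool:
    """Return True if sock becomes readable within timeout seconds.

    Mirrors select(1, [0], [], NULL, {60, 0}): a read set only, no
    write or exception sets. An empty result is strace's "= 0 (Timeout)".
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


def poll_loop(sock: socket.socket, timeout: float, max_rounds: int) -> int:
    """Re-arm select() after every timeout, like the looping process.

    Returns how many timeouts occurred before the socket became
    readable, or max_rounds if it never did.
    """
    for rounds in range(max_rounds):
        if wait_readable(sock, timeout):
            return rounds
    return max_rounds


if __name__ == "__main__":
    local, peer = socket.socketpair()  # hypothetical stand-in for the ssh socket
    # Nothing is ever written to `peer`, so every round times out.
    print(poll_loop(local, timeout=0.1, max_rounds=3))  # 3
    peer.sendall(b"x")
    print(poll_loop(local, timeout=0.1, max_rounds=3))  # 0
    local.close()
    peer.close()
```

In this sketch the timeout never changes anything by itself; only activity on the other end of the socket ends the loop.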

The only fd in the read set (fd 0) is a unix domain socket:

    $ sudo readlink /proc/`pgrep rsync`/fd/0
    socket:[62052357]

    $ sudo lsof | grep 62052357
    rsync 4532 root 0u unix 0xe68aa040 0t0 62052357 socket
    rsync 4532 root 1u unix 0xe68aa040 0t0 62052357 socket

    $ grep 62052357 /proc/net/unix
    e68aa040: 00000003 00000000 00000000 0001 03 62052357
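The `/proc/net/unix` line can be decoded field by field. A minimal Python sketch, assuming the Linux column layout `Num RefCount Protocol Flags Type St Inode Path`: for the line above it reports a connected (`St` 03) stream socket (`Type` 0001) with no path. It also checks, on Linux, that the two ends of a `socketpair()` carry different inode numbers, so a grep for one inode shows only that endpoint and not its peer. The helper names are hypothetical.

```python
import os
import socket
from typing import NamedTuple, Optional, Tuple

# "St" column values: the kernel's socket_state enum (SS_FREE ... SS_DISCONNECTING).
SOCKET_STATES = {
    0: "FREE",
    1: "UNCONNECTED",
    2: "CONNECTING",
    3: "CONNECTED",
    4: "DISCONNECTING",
}


class UnixSocketEntry(NamedTuple):
    refcount: int
    sock_type: int        # 1 is SOCK_STREAM on common Linux platforms
    state: str
    inode: int            # the N in socket:[N]
    path: Optional[str]   # None for unnamed sockets such as socketpair() ends


def parse_proc_net_unix_line(line: str) -> UnixSocketEntry:
    """Decode one data line of Linux /proc/net/unix.

    Columns: Num RefCount Protocol Flags Type St Inode [Path].
    Numeric columns are hex except Inode, which is decimal.
    """
    fields = line.strip().split(maxsplit=7)  # keep a path containing spaces whole
    state_code = int(fields[5], 16)
    return UnixSocketEntry(
        refcount=int(fields[1], 16),
        sock_type=int(fields[4], 16),
        state=SOCKET_STATES.get(state_code, f"UNKNOWN({state_code})"),
        inode=int(fields[6]),
        path=fields[7] if len(fields) > 7 else None,
    )


def socketpair_inodes() -> Tuple[int, int]:
    """Return the inode numbers of both ends of a fresh socketpair()."""
    a, b = socket.socketpair()
    try:
        return os.fstat(a.fileno()).st_ino, os.fstat(b.fileno()).st_ino
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    entry = parse_proc_net_unix_line(
        "e68aa040: 00000003 00000000 00000000 0001 03 62052357")
    print(entry.state, entry.sock_type, entry.inode)  # CONNECTED 1 62052357
    end_a, end_b = socketpair_inodes()
    print(end_a != end_b)  # True: each end has its own inode
```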

So both fds belong to the same process. Is it hung on
itself? How come it doesn't act on the timeout instead of
just going round again? Is it waiting for a signal? Can I
send it one to unstick it?

There don't appear to be any other fds of interest in the
select() call, so I'm not sure what other event it could be
waiting on besides a signal. It has been hung in the same
loop for over 15 hours.

I did some searching and found references to a Cygwin
issue, and also to an old issue with ssh's non-blocking file
descriptors that appears to have been fixed. However, I
don't see how ssh could be part of the picture here, since
rsync is waiting on itself and nothing else seems to be
involved. Unless it is waiting for SIGCHLD? But rsync has
no children in this case, and only one other open fd
(another unix domain socket, on fd 2, which looks like it
has nobody on the other end).
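Whatever the process is waiting for, the repeated timeout itself says something about the far end of fd 0. A minimal Python sketch, again using a `socketpair()` as a stand-in: if the other end had been closed, `select()` would report the socket readable and `recv()` would return an empty bytes object (EOF), so a loop that only ever times out means the other end is still open but sending nothing. The function `peer_state` is hypothetical.

```python
import select
import socket


def peer_state(sock: socket.socket, timeout: float) -> str:
    """Report what select() plus a peek reveal about the other end.

    "silent": nothing readable before the timeout (peer open but idle)
    "data":   bytes are waiting to be read
    "eof":    the peer closed its end, so recv() returns b""
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return "silent"
    # MSG_PEEK leaves any pending bytes in the buffer for the real reader.
    chunk = sock.recv(1, socket.MSG_PEEK)
    return "data" if chunk else "eof"


if __name__ == "__main__":
    local, peer = socket.socketpair()
    print(peer_state(local, 0.1))  # silent
    peer.close()
    print(peer_state(local, 0.1))  # eof
    local.close()
```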

This has happened a few times now (in our backups), but not
every time. I'm a little confused... I could add
'--timeout', but I'd really prefer to know why it's doing
this, and to be able to distinguish a real timeout error
from an rsync (or libc?) bug...
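If '--timeout' is added, rsync's exit status is what separates an I/O timeout from other failures: the rsync man page lists exit code 30 as "Timeout in data send/receive" and 35 as "Timeout waiting for daemon connection". A minimal Python sketch of a backup wrapper, assuming the job is launched through `subprocess`; the function names, paths and timeout value are hypothetical, and `extra_opts` stands in for the job's real options.

```python
import subprocess
from typing import List, Sequence

# Exit values listed in the rsync man page (EXIT VALUES section).
RSYNC_EXIT_MEANINGS = {
    0: "success",
    30: "timeout in data send/receive",
    35: "timeout waiting for daemon connection",
}


def build_rsync_argv(src: str, dest: str, io_timeout: int,
                     extra_opts: Sequence[str] = ()) -> List[str]:
    """Assemble an rsync command line with --timeout=SECONDS set."""
    return ["rsync", *extra_opts, f"--timeout={io_timeout}", src, dest]


def classify_exit(code: int) -> str:
    """Map an rsync exit status to a short description."""
    if code < 0:
        # subprocess reports death by signal N as returncode -N
        return f"killed by signal {-code}"
    return RSYNC_EXIT_MEANINGS.get(code, f"other error (exit {code})")


def run_backup(src: str, dest: str, io_timeout: int,
               extra_opts: Sequence[str] = ()) -> str:
    """Run rsync (must be on PATH) and report how it ended."""
    argv = build_rsync_argv(src, dest, io_timeout, extra_opts)
    completed = subprocess.run(argv, check=False)
    return classify_exit(completed.returncode)
```

For example, `run_backup("/data/", "backuphost:/backups/data/", 600, ["-a", "--numeric-ids"])` would return "timeout in data send/receive" only when rsync itself gave up after 600 seconds of no I/O, which keeps that case apart from a hang that never ends.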

-- 
Scott
