Bob Kenney
2003-May-02 04:19 UTC
rsync+ssh2 from Tru64 unix to linux has intermittent hangs.
This is odd:

We're using rsync to mirror multiple directories from a server to two clients. The server is running Tru64 v5.1a, client A is running Red Hat Linux 8.0, and client B is running Tru64 v4.0g.

The mirrors for both clients run at the same interval (every 10 minutes, offset by 5 minutes).

All machines are running rsync v2.5.6 and use ssh2 v3.2.3 as the transport rather than rsh. The commands used to rsync to both clients are exactly the same:

    /usr/local/bin/rsync -rltvR -e "/usr/local/bin/ssh2 -x" \
        --size-only --delete <source_dir> <destination_host>:<dest_dir>

On both clients, the destination directory for the files/directories being mirrored is a Samba share: the active share is on client A, and the backup (inactive) share is on client B. The files in the share, and the Samba share itself, are set to read-only for the PCs accessing it.

The rsync mirror to client B (the Tru64 box) has no problems at all.

The rsync mirror to client A (RH Linux 8.0) has a very odd intermittent hang. About once every 3-4 hours, it hangs. There is no correlation with a) which directory it's mirroring, b) the time of day, or c) how many minutes into the hour it is.

When I look at the process(es) on the server, the rsync and ssh2 processes are still there, but getting no run time. On the client box, there is no rsync server process, but the sshd2 process is still there. There are no corresponding errors in the error log on the server from either the local or the remote rsync process. If we let it go without killing the hung rsync process, it times out anyway after almost exactly one hour.

About the only thing I could think of was that we're having problems with file locking on the Tru64 server (applications have files open and possibly hard-locked in the areas rsync is mirroring), or some kind of file locking(?) by Samba on client A when people are using it. I tried altering rsync on the server so that file reads were non-blocking, but that didn't help.
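While the cause is tracked down, a stalled run does not have to sit for the full hour. rsync's own `--timeout=SECONDS` option (an I/O timeout) is the first thing to try. As a separate, external approach, the sketch below wraps the mirror command in a watchdog that kills it after a fixed limit. It assumes a Python 3 interpreter is available on the server; `run_with_watchdog` is a hypothetical helper, not part of rsync or ssh2. It starts the command in its own process group so that the ssh2 child rsync spawns is killed along with rsync.

```python
import os
import signal
import subprocess

# The mirror command from the message; the <...> placeholders are left as-is.
RSYNC_CMD = [
    "/usr/local/bin/rsync", "-rltvR", "-e", "/usr/local/bin/ssh2 -x",
    "--size-only", "--delete", "<source_dir>", "<destination_host>:<dest_dir>",
]


def run_with_watchdog(cmd, limit_seconds):
    """Run cmd, killing it (and its children) if it exceeds limit_seconds.

    Hypothetical helper. Returns (status, returncode), where status is
    "ok", "failed" (non-zero exit) or "killed" (watchdog fired).
    """
    # A new session gives the child its own process group, so the watchdog
    # can signal rsync together with the ssh2 process that rsync spawns.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        rc = proc.wait(timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the whole group exited between the timeout and the kill
        proc.wait()
        return "killed", proc.returncode
    return ("ok" if rc == 0 else "failed"), rc
```

For example, the scheduled job could call `run_with_watchdog(RSYNC_CMD, 9 * 60)`, so a stalled run is killed before the next one is due. The 9-minute limit is only an assumption based on the 10-minute interval.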
I'm not sure about Samba on the client, or how I would go about testing it.

About the only other complication is that the Linux box is running kernel 2.4.20 with the ACL/EA patch applied. I've tried recompiling rsync there after applying the ACL/EA patch for rsync, but that didn't seem to help or hinder in any way. Not sure if this is an issue: the source files on the Tru64 server do not have ACLs on them, so it shouldn't be(?).

Very frustrating. We've been banging our heads against this ever since we set up the Linux box and made it the primary Samba share. I don't recall us having any issues like this when client B was the primary (only) Samba share.

Thanks in advance for any help you can provide.

--
Bob Kenney
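On the Samba side, a first step is to see which locking-related parameters the configuration on client A actually sets. The sketch below reads smb.conf with Python's `configparser`, assuming a Python 3 interpreter on client A and a plain INI-style file (no backslash line continuations or include directives). `locking_settings` is a hypothetical helper. It reports only what is written in the file; Samba's own `testparm -v` shows the effective values, including defaults.

```python
import configparser

# Locking-related smb.conf parameters worth reviewing.
LOCKING_PARAMS = ("locking", "strict locking", "oplocks",
                  "level2 oplocks", "kernel oplocks", "fake oplocks")
# Samba ignores case and internal whitespace in parameter names, so
# compare on a normalized form and map back to the readable name.
WANTED = {name.replace(" ", ""): name for name in LOCKING_PARAMS}


def locking_settings(conf_text):
    """Return {section: {param: value}} for locking params set in conf_text."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # smb.conf uses only "=" between name and value
        interpolation=None,  # smb.conf values contain %U, %m, ... macros
        strict=False,        # tolerate duplicate sections/options
    )
    parser.optionxform = lambda name: name.replace(" ", "").lower()
    parser.read_string(conf_text)
    report = {}
    for section in parser.sections():
        report[section] = {WANTED[key]: value
                           for key, value in parser.items(section)
                           if key in WANTED}
    return report
```

Usage, assuming the usual Red Hat location: `print(locking_settings(open("/etc/samba/smb.conf").read()))`.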
jw schultz
2003-May-02 07:10 UTC
rsync+ssh2 from Tru64 unix to linux has intermittent hangs.
On Thu, May 01, 2003 at 02:19:22PM -0400, Bob Kenney wrote:
> Very frustrating. We've been banging our heads against this ever
> since we set up the linux box and made it the primary samba share. I
> don't recall us having any issues like this when client B was the
> primary(only) samba share.

This sounds very odd. The sshd process should exit once the rsync process is dead. I don't know what could cause this, but I'll give you a small bit of educated speculation.

ACL/EAs are, I am sure, not the problem.

Any chance you are running out of memory? Rsync is a prime candidate both for triggering the OOM killer and for being targeted by it. Once your system has been hit by the OOM killer, it isn't trustworthy. It is much better to disable overcommit, which I think 2.4.20 supports. An OOM event should show up in syslog unless syslogd itself gets killed (that happened to me once).

Check what kind of locking you have enabled in smb.conf. I really doubt this is it, but someone more up to date on Samba might correct me.

You don't mention what filesystem type is on the RH box. Rsync tends to thrash XFS. I mention this only as a warning.

--
________________________________________________________________
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

Remember Cernan and Schmitt
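An OOM event can be checked for by scanning the syslog file for the kernel's OOM-killer message. The sketch below assumes the 2.4-era wording "Out of Memory: Killed process <pid> (<name>)." and matches it case-insensitively, since the exact text has varied between kernel versions. `find_oom_kills` is a hypothetical helper.

```python
import re

# Assumed wording, based on 2.4-era kernels, e.g.
#   "Out of Memory: Killed process <pid> (<name>)."
OOM_RE = re.compile(r"out of memory.*?killed process (\d+) \(([^)]*)\)",
                    re.IGNORECASE)


def find_oom_kills(lines):
    """Return a list of (pid, command) for each OOM-killer line found."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

On Red Hat, syslog normally goes to `/var/log/messages`, so `find_oom_kills(open("/var/log/messages"))` is one way to run it. An empty result is not conclusive if syslogd itself was killed.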