bugzilla-daemon at mindrot.org
2003-May-27 15:17 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

           Summary: SSH freezes on cluster machine
           Product: Portable OpenSSH
           Version: -current
          Platform: ix86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: ssh
        AssignedTo: openssh-bugs at mindrot.org
        ReportedBy: andrews at comp.nus.edu.sg

I am a user of a cluster where I test my distributed program, which consists of several computing "worker" programs running on different nodes. The workers communicate with one another using TCP, port 9900. I execute the workers remotely using ssh.

OpenSSH_3.4p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f

The administrator recently downgraded ssh to 3.4p1 after 3.5p1 exhibited the same problem.

The cluster is running Rocks v2.2, Linux kernel 2.4.18-27.7.xsmp

I noticed that some (not all) of the nodes start having problems after I run my program on them. When I try to ssh to those nodes, ssh freezes. After a while (usually half a day to a day), ssh to the affected nodes instead returns the message

ssh_exchange_identification: Connection closed by remote host

Thank you in advance for any help.

Andrew

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
bugzilla-daemon at mindrot.org
2003-May-27 15:25 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

------- Additional Comments From andrews at comp.nus.edu.sg 2003-05-28 01:25 -------
Created an attachment (id=313)
 --> (http://bugzilla.mindrot.org/attachment.cgi?id=313&action=view)
ssh -vvv compute-0-14

This is the output printed to the screen before ssh hangs when I run

ssh -vvv compute-0-14

where compute-0-14 is the name of an affected node in the cluster.
bugzilla-daemon at mindrot.org
2003-Jun-04 09:26 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

------- Additional Comments From djm at mindrot.org 2003-06-04 19:26 -------
Can you keep non-ssh TCP connections up for similar periods? That is, have you ruled out network-level issues?

You might also want to set "ClientAliveInterval=120" in sshd_config to work around these.
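
A rough sketch of how the network-level check suggested above could be run. It is not taken from the thread: the function name probe_tcp and its outcome labels are hypothetical, chosen only for illustration. It opens a plain TCP connection and reports whether the remote side sends anything, closes the connection, stays silent, or cannot be reached at all.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify how a TCP service responds after a plain connect.

    Returns a (label, data) tuple. The labels are illustrative only:
      'unreachable' - connect failed (refused, timed out, no route)
      'silent'      - connected, but nothing arrived within `timeout`
      'closed'      - connected, but the peer closed without sending data
      'banner'      - connected and received some initial data
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections as well as connect timeouts.
        return ("unreachable", b"")
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            return ("silent", b"")
        except OSError:
            # For example, a connection reset by the peer.
            return ("closed", b"")
    if not data:
        # An empty read means the peer closed the connection cleanly.
        return ("closed", b"")
    return ("banner", data.strip())
```

For the setup in the report one might call probe_tcp("compute-0-14", 22) and probe_tcp("compute-0-14", 9900). A healthy sshd normally sends an identification line starting with "SSH-" right after the connection is accepted, so a 'silent' result on port 22 would match the reported freeze and a 'closed' result would match the ssh_exchange_identification message. A worker on port 9900 may legitimately send nothing at first, so 'silent' there only shows that the connection itself was accepted.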
bugzilla-daemon at mindrot.org
2003-Jun-04 12:04 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

andrews at comp.nus.edu.sg changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Additional Comments From andrews at comp.nus.edu.sg 2003-06-04 22:04 -------
Thank you for your help.

The problem was actually caused by remote file access through NFS. Remote execution of my program on a compute node causes some files to be updated. I was not aware that the filesystem where these files reside is mounted through NFS.

It seems that after a remote execution of my program finishes and ssh returns, sshd on the remote side still has to deal with the remote file access, which is somehow stuck, and this in turn makes ssh hang.

I have not encountered a similar problem since I removed the remote file access from my program.

Andrew
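
Since the root cause here was files that turned out to live on an NFS mount, a small check like the following could have flagged it earlier. This is only a sketch under stated assumptions: it expects text in the format of Linux /proc/mounts, and the helper names filesystem_for and is_nfs are hypothetical, not part of any tool mentioned in the thread.

```python
import os


def filesystem_for(path, mounts_text):
    """Return (mount_point, fstype) for the mount that contains `path`.

    `mounts_text` is expected to look like Linux /proc/mounts: one mount
    per line, with whitespace-separated fields
    "device mount_point fstype options dump pass".
    Symlinks are not resolved, and mount points with escaped characters
    (such as \\040 for a space) are not decoded.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on ties the later entry wins,
            # which matches a filesystem mounted over an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def is_nfs(path, mounts_text):
    """True if `path` sits on a mount whose type starts with 'nfs'."""
    found = filesystem_for(path, mounts_text)
    return found is not None and found[1].startswith("nfs")
```

On a Linux node this could be used by reading /proc/mounts into a string and calling is_nfs with the path of each file the program updates. The prefix match on "nfs" is meant to cover type names such as nfs and nfs4; other network filesystems would need their own check.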