bugzilla-daemon at mindrot.org
2003-May-27 15:17 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578
Summary: SSH freezes on cluster machine
Product: Portable OpenSSH
Version: -current
Platform: ix86
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: ssh
AssignedTo: openssh-bugs at mindrot.org
ReportedBy: andrews at comp.nus.edu.sg
I am a user of a cluster where I test my distributed program, which consists
of several computing "worker" programs running on different nodes. The
workers communicate with one another over TCP on port 9900. I start
the workers remotely using ssh:
OpenSSH_3.4p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f
The administrator recently downgraded ssh to 3.4p1 after 3.5p1
exhibited the same problem.
The cluster is running Rocks v2.2, Linux kernel 2.4.18-27.7.xsmp
I noticed that some (not all) of the nodes start having problems after I run
my program on them. When I try to ssh to those nodes, ssh freezes.
After a while (usually half a day to a day), ssh to the affected nodes
instead returns the message
ssh_exchange_identification: Connection closed by remote host
Thank you in advance for any help.
Andrew
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
bugzilla-daemon at mindrot.org
2003-May-27 15:25 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578
------- Additional Comments From andrews at comp.nus.edu.sg 2003-05-28 01:25 -------
Created an attachment (id=313)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=313&action=view)
ssh -vvv compute-0-14
This is what is being printed out on the screen before ssh hangs when I ran
ssh -vvv compute-0-14
where compute-0-14 is the name of an affected node in the cluster.
bugzilla-daemon at mindrot.org
2003-Jun-04 09:26 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578
------- Additional Comments From djm at mindrot.org 2003-06-04 19:26 -------
Can you keep non-ssh TCP connections up for similar periods? I.e., have you
ruled out network-level issues? You might also want to set
"ClientAliveInterval=120" in sshd_config to work around these.
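One rough way to separate a network-level problem from an sshd-level one is to open a plain TCP connection to the node and see how far it gets. The sketch below is only an illustration: the function name `probe_tcp` and its result labels are made up for this example and are not part of OpenSSH. It relies on the fact that an SSH server sends an identification line beginning with "SSH-" as soon as a client connects.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify how far a plain TCP connection to host:port gets.

    Hypothetical helper for illustration. Returns one of:
      "unreachable"      connect failed (refused, no route, name lookup failed)
      "connect-timeout"  no TCP handshake completed within `timeout` seconds
      "closed"           connected, but the peer closed before sending data
      "silent"           connected, but nothing arrived within `timeout`
      "banner: <text>"   the first line the peer sent
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "unreachable"

    with sock:
        try:
            # The timeout given to create_connection also applies to recv.
            data = sock.recv(256)
        except socket.timeout:
            return "silent"
        except OSError:
            # For example, the peer reset the connection.
            return "closed"

    if not data:
        return "closed"
    first_line = data.split(b"\n", 1)[0]
    return "banner: " + first_line.decode("ascii", "replace").strip()
```

Calling `probe_tcp("compute-0-14")` against an affected node and comparing the result with a healthy node may help narrow things down. A "closed" result on port 22 corresponds to what the ssh client reports as "ssh_exchange_identification: Connection closed by remote host", while "silent" matches a client that hangs before the identification exchange. If other ports on the same node (for example the workers' port 9900) still accept connections, the network itself is less likely to be at fault.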
bugzilla-daemon at mindrot.org
2003-Jun-04 12:04 UTC
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578
andrews at comp.nus.edu.sg changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Additional Comments From andrews at comp.nus.edu.sg 2003-06-04 22:04 -------
Thank you for your help.
The problem was actually caused by remote file access through NFS.
Remote execution of my program on a compute node causes some files
to be updated. I was not aware that the filesystem where these files
reside is mounted through NFS.
It seems that after a remote execution of my program finishes
and ssh returns, sshd on the remote side still has to deal with
the remote file access, which gets stuck and in turn makes ssh hang.
I have not encountered a similar problem since I removed the remote file
access from my program.
Andrew
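Since the root cause here was a file that turned out to live on NFS, it can help to check which filesystem backs a given path before running a job. The following is a minimal sketch, assuming a Linux node where `/proc/mounts` is readable. The helper names `mount_for_path` and `is_on_nfs` are hypothetical, and mount points containing spaces (escaped as `\040` in `/proc/mounts`) are not handled.

```python
import os


def mount_for_path(path, mounts_text):
    """Return (mount_point, fstype) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts, one mount per line:
    device, mount point, filesystem type, options, and two numbers.
    The longest matching mount point wins; for equal lengths the later
    entry wins, since it was mounted on top of the earlier one.
    Returns None if no entry matches.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare with trailing slashes so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def is_on_nfs(path, mounts_file="/proc/mounts"):
    """Return True if `path` resolves to an NFS mount (types nfs, nfs4, ...)."""
    with open(mounts_file) as f:
        found = mount_for_path(os.path.realpath(path), f.read())
    return found is not None and found[1].startswith("nfs")
```

For example, `is_on_nfs("/path/to/output/file")` could be called on each file the workers write. A result of True only says the path is NFS-backed; it does not show that the NFS server is currently unresponsive.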