douglas.manton at uk.ibm.com
2001-Jan-23  09:02 UTC
sshd hanging after multiple successive logons
Folks,
I use OpenSSH to poll a number of remote servers once every five minutes
and obtain a number of attributes.  This is done using ssh as "sexec":
        ssh stats@remotehost getstats
This returns the output of the getstats program which is parsed, etc...
The problem is that after so many connections, the parent sshd hangs and
does not accept any more connections.  I have reproduced the problem using
a simple shell script on my local machine:
        while sleep 1
        do
                ssh me@localhost whoami
        done
This iterates about 4000 times before sshd hangs.  I can see that sshd is
waiting in _pthread_waitlock when it has hung.  It actually uses less
CPU time when in this state.  I can connect to the daemon on port 22, but
it does not present the version string.
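
The symptom described above (the port accepts a connection but no version string arrives) can be checked from outside the daemon. The following is a minimal sketch, not something taken from this thread: `banner_ok` is a hypothetical helper name, and it only classifies the first line that a connection to port 22 returned.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first line on stdin looks like
# an SSH version string (it starts with "SSH-").  An empty read, which is
# what a hung daemon produces, counts as failure.
banner_ok() {
    line=
    IFS= read -r line || true   # tolerate EOF before any data arrives
    case $line in
        SSH-*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

One possible use, assuming a netcat with the `-w` timeout option is installed, is `nc -w 5 localhost 22 </dev/null | banner_ok || echo "sshd not answering"`. Netcat implementations differ in how they treat end of input on stdin, so the exact invocation may need adjusting.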
I realise that 4000 connections is not bad, but at five minute intervals
over 100 machines this is happening every couple of days.
I am running OpenSSH 2.3.0p1 under AIX 4.3.3.0-ML6, compiled with IBM VAC
v5.
Any ideas?  Has this been seen before?  I can reproduce it every time.
When I get a chance I will test the build on my Ultra 10.
Many thanks,
--------------------------------------------------------
 Doug Manton, AT&T EMEA Firewall and Security Solutions
                   demanton at att.com
--------------------------------------------------------
"If privacy is outlawed, only outlaws will have privacy"
What is the value of MaxStartups in sshd_config?
On Tue, 23 Jan 2001 douglas.manton at uk.ibm.com wrote:
> The problem is that after so many connections, the parent sshd hangs and
> does not accept any more connections.  I have reproduced the problem using
> a simple shell script on my local machine:
I am running something similar now (770 connections and counting).
What version of OpenSSH are you running? If you can, please try the
snapshot[1] and see if that resolves the problem.
If you can run sshd under a debugger, try sending it an ABRT signal when
it locks and see where it was stuck.
Regards,
Damien
--
| ``We've all heard that a million monkeys banging on | Damien Miller -
| a million typewriters will eventually reproduce the | <djm at mindrot.org>
| works of Shakespeare. Now, thanks to the Internet,  /
| we know this is not true.'' - Robert Wilensky UCB  / http://www.mindrot.org
douglas.manton at uk.ibm.com
2001-Jan-23  11:57 UTC
sshd hanging after multiple successive logons
> What is value of MaxStartups in sshd_config ?
This is left at the default.  The connections are being successfully
authenticated and each one is terminated before the next one is made -- so
this _should_ have no effect.
The annoying part is that if sshd were to completely die, it would be
automatically restarted by the AIX system resource controller daemon.  But
because it is still a valid process...
Best wishes,
--------------------------------------------------------
 Doug Manton, AT&T EMEA Firewall and Security Solutions
                   E:  demanton at att.com
--------------------------------------------------------
"If privacy is outlawed, only outlaws will have privacy"
douglas.manton at uk.ibm.com
2001-Jan-23  17:51 UTC
sshd hanging after multiple successive logons
Damien,
I am still running the test, so haven't had a chance to test the snapshot
yet.  Will do after I have investigated the following:
I have found a clue in my syslog:
Jan 22 18:33:00 myserver sshd[30586]: Accepted rsa for myuser from 10.0.0.1 port 57453
Jan 22 18:33:03 myserver sshd[30632]: Accepted rsa for myuser from 10.0.0.1 port 57454
Jan 22 18:33:05 myserver sshd[30678]: Accepted rsa for myuser from 10.0.0.1 port 57455
Jan 22 18:33:07 myserver sshd[30468]: Accepted rsa for myuser from 10.0.0.1 port 57456
Jan 22 18:33:09 myserver sshd[30514]: Accepted rsa for myuser from 10.0.0.1 port 57457
Jan 22 18:33:11 myserver sshd[20990]: Generating new 768 bit RSA key.
Jan 22 18:33:11 myserver sshd[30560]: Accepted rsa for myuser from 10.0.0.1 port 57458
Jan 23 08:56:16 myserver sshd[25084]: Server listening on 0.0.0.0 port 22.
Jan 23 08:56:16 myserver sshd[25084]: Generating 768 bit RSA key.
Jan 23 08:56:18 myserver sshd[25084]: RSA key generation complete.
Note how the daemon hangs when the user connects in the same second that a
new RSA key is being generated.  The key generation never completes.  The
entries from the following morning are from my killing and restarting the
daemon.
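
The pattern in the log above (a "Generating ... RSA key" entry with no matching "RSA key generation complete" entry from the same PID) can be searched for mechanically. This is a rough sketch only: `find_stalled_keygen` is a hypothetical name, and it assumes the syslog layout shown above, with the `sshd[PID]:` tag in the fifth field and each entry on a single line.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print "PID timestamp"
# for every sshd whose key generation was started but never logged as
# complete.
find_stalled_keygen() {
    awk '
        # Only look at lines whose fifth field is an sshd[PID]: tag.
        $5 ~ /^sshd\[[0-9]+\]:$/ {
            pid = $5
            gsub(/[^0-9]/, "", pid)          # keep just the digits of the PID
            if ($0 ~ /Generating (new )?[0-9]+ bit RSA key/)
                pending[pid] = $1 " " $2 " " $3
            else if ($0 ~ /RSA key generation complete/)
                delete pending[pid]
        }
        END { for (pid in pending) print pid, pending[pid] }
    '
}
```

It could be run as `find_stalled_keygen < /path/to/syslog`, where the log path is site-specific. A key generation that is still legitimately in progress at the end of the file would also be reported, so a hit is only a hint.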
Coincidence?
Many thanks,
--------------------------------------------------------
 Doug Manton, AT&T EMEA Firewall and Security Solutions
                   E:  demanton at att.com
--------------------------------------------------------
"If privacy is outlawed, only outlaws will have privacy"
On Tue, 23 Jan 2001, Damien Miller wrote:
> On Tue, 23 Jan 2001 douglas.manton at uk.ibm.com wrote:
>
> > The problem is that after so many connections, the parent sshd hangs and
> > does not accept any more connections.  I have reproduced the problem using
> > a simple shell script on my local machine:
>
> I am running something similar now (770 connections and counting).
>
> What version of OpenSSH are you running? If you can, please try the
> snapshot[1] and see if that resolves the problem.
douglas.manton at uk.ibm.com
2001-Feb-08  13:36 UTC
sshd hanging after multiple successive logons
Damien,
I was able to confirm that this problem was caused by a deadlock when key
regeneration began during a client logon.  I set the key regeneration
interval to 15 seconds and kicked off the logon script.  It took between 2
and 4 minutes for sshd to hang!
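
For anyone wanting to repeat this, here is a hedged sketch of how a test configuration along these lines might be produced. `KeyRegenerationInterval` and `Port` are sshd_config options in this generation of OpenSSH; the helper name `write_repro_config`, the port 2222 and the file location are assumptions for illustration, not the settings actually used here.

```shell
#!/bin/sh
# Hypothetical helper: write a throwaway sshd_config that regenerates the
# server key every few seconds, so a logon is likely to coincide with it.
#   $1 = path of the config file to create
#   $2 = regeneration interval in seconds (defaults to 15)
write_repro_config() {
    cfg=$1
    interval=${2:-15}
    cat > "$cfg" <<EOF
Port 2222
KeyRegenerationInterval $interval
EOF
}
```

A test daemon could then be started with `sshd -f` pointing at the generated file, and the logon loop pointed at it with `ssh -p 2222`. Using a separate port keeps the production daemon out of the experiment.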
The good news is that I ran the latest snapshot under the same conditions.
It has been running for two hours and is still going strong.  Prognosis:
cured.
I look forward to the next release :-)
Thanks for your help,
--------------------------------------------------------
 Doug Manton, AT&T EMEA Firewall and Security Solutions
                 E:  demanton at att.com
--------------------------------------------------------
"If privacy is outlawed, only outlaws will have privacy"