Hello,

We've found some undesirable behavior with respect to LoginGraceTime. A minor code change in session.c seems to clear it up, but now I'm asking for help in better understanding the problem and determining if there are any unexpected side effects of the change.

First, the code change:

$ diff orig_session.c session.c
216c216,218
< alarm(0);
---
> verbose("Clearing alarm in do_authenticated");
> /*alarm(0);*/
> signal(SIGALRM, SIG_IGN);

So, I replaced "alarm(0);" in do_authenticated with a call to verbose and a "signal(SIGALRM, SIG_IGN);".

Now, the problem description. We are running OpenSSH 4.2p1 on Solaris 8 (Sparc) that has a recent recommended patch cluster installed. When we connect to this server using a variety of ssh clients (including 4.2p1), we noticed that sessions were dropping after about 10 minutes. We changed the LoginGraceTime to 30 seconds and, sure enough, sessions were dropping in 30 seconds. We were also seeing messages like "Timeout before authentication" in /var/adm/messages when the sessions were dropping. Setting LoginGraceTime to 0 (or something like 12 hours) was the leading candidate for a workaround.

We noticed that if we removed/renamed the ~/.ssh/id_rsa and ~/.ssh/id_dsa files on the client side, the connections would stay up. Similarly, an ssh -i /dev/null allowed the connections to stay up, but that was an ugly solution at best. It didn't matter if the id_[rd]sa keys matched an entry in the authorized_keys2 file on the server. The connections dropped after the LoginGraceTime either way.

We'd been working on the problem for 'long enough' (> 60 man hours) and couldn't find anything interesting on Google or in the archives, so I dug into the code and came up with the possible solution above. After digging into the code a while, I had tried putting "UsePrivilegeSeparation no" in the sshd_config file and the problem persisted, so I don't think it has anything to do with the privsep code.
I can't imagine this affects every Solaris 8 user that has id_[rd]sa files, or we would have seen something in our archive/Google searches. Perhaps a recent Solaris patch introduced a change in libc? ... but then why does it only break when id_[rd]sa files are present on the client side?

Questions for you:

A) Do you think the code change is a viable solution? There may be well-founded reasons to use alarm(0) instead and/or reasons to avoid using signal(SIGALRM, SIG_IGN). Early testing here shows it works (but we haven't tested extensively yet). ... if it appears to be viable, can we get it included in the code base at some point?

B) Are any of you willing/able to help us pursue the root cause further? If so, I can provide more configuration information.

-Bill.
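For readers following the thread, here is a minimal C sketch of what the one-line change above alters. alarm(0) cancels any pending timer, so SIGALRM should never be generated; signal(SIGALRM, SIG_IGN) leaves the timer armed and simply discards the signal when it arrives. The function and handler names below are hypothetical and are not taken from the OpenSSH source. The function only checks whether a cancelled alarm still reaches its handler on a given system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler so the caller can tell whether SIGALRM was delivered. */
static volatile sig_atomic_t alarm_fired = 0;

/* Hypothetical stand-in for a grace-time handler; it only records the event. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
}

/*
 * Arm a 1-second timer, cancel it immediately with alarm(0), then wait
 * past the original deadline. Returns 1 if the handler ran anyway,
 * 0 if the cancellation was honored.
 */
int alarm_fires_after_cancel(void)
{
    struct sigaction sa;
    struct timespec wait = { 1, 500000000L };   /* 1.5 seconds */
    int rc;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    alarm_fired = 0;
    alarm(1);                 /* schedule SIGALRM one second from now */
    alarm(0);                 /* cancel the pending alarm */
    nanosleep(&wait, NULL);   /* returns early if a signal is delivered */
    return alarm_fired;
}
```

Calling alarm_fires_after_cancel() from a small test program on the affected host and getting 1 back would suggest the cancellation is not being honored in that process; getting 0 would point back toward how and where sshd arms the timer.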
Fischer, Bill wrote:
> Hello,
>
> We've found some undesirable behavior with respect to LoginGraceTime. A
> minor code change in session.c seems to clear it up, but now I'm asking
> for help in better understanding the problem and determining if there
> are any unexpected side effects of the change.

Hi Bill,

There have been some changes in this area recently. Could you try one of the snapshot releases at:

ftp://ftp3.usa.openbsd.org/pub/OpenBSD/OpenSSH/portable/snapshot

and let us know if the problem is still evident?

Thanks,
Damien Miller
The code certainly looks different (the line I changed last time is no longer there to change :), but the result seems to be the same. Here's the output from the server:

$ /scratch2/openssh-exec/sbin/sshd -f /scratch2/openssh/etc/sshd_config -De
Server listening on :: port 66.
Server listening on 0.0.0.0 port 66.
Generating 768 bit RSA key.
RSA key generation complete.
Connection from 10.1.1.1 port 33559
Failed none for root from 10.1.1.1 port 33559 ssh2
Found matching RSA key: a7:0d:cf:e0:8a:df:a6:7a:4d:2e:1b:5b:fa:34:b4:85
Postponed publickey for root from 10.1.1.1 port 33559 ssh2
Found matching RSA key: a7:0d:cf:e0:8a:df:a6:7a:4d:2e:1b:5b:fa:34:b4:85
Accepted publickey for root from 10.1.1.1 port 33559 ssh2
Accepted publickey for root from 10.1.1.1 port 33559 ssh2
Timeout before authentication for 10.1.1.1

That's VERBOSE level. The problem doesn't occur at any of the DEBUG levels, since the login timer is disabled in debug mode. If the debug output would be helpful, I could change the code to enable the timeout when in debug mode.

The connection still works fine if I rename the id_[rd]sa* files in ~/.ssh on the client side.

For what it's worth, adding a call to verbose in the authenticated: block of sshd.c shows that the authenticated: block is indeed getting executed. Adding a signal(SIGALRM, SIG_IGN) there will cause the connection to remain up, but of course it means all future SIGALRMs will be ignored, which may be less than desirable; if you ever need to set up another SIGALRM, that would almost certainly be bad news.

It sure seems like the system is flat out ignoring the alarm(0) call. Not sure where else to look.

-Bill.
-----Original Message-----
From: Damien Miller [mailto:djm at mindrot.org]
Sent: Friday, January 13, 2006 3:36 PM
To: Fischer, Bill
Cc: openssh-unix-dev at mindrot.org
Subject: Re: LoginGraceTime

Fischer, Bill wrote:
> Hello,
>
> We've found some undesirable behavior with respect to LoginGraceTime.
> A minor code change in session.c seems to clear it up, but now I'm
> asking for help in better understanding the problem and determining if
> there are any unexpected side effects of the change.

Hi Bill,

There have been some changes in this area recently. Could you try one of the snapshot releases at:

ftp://ftp3.usa.openbsd.org/pub/OpenBSD/OpenSSH/portable/snapshot

and let us know if the problem is still evident?

Thanks,
Damien Miller
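On the concern raised in the follow-up that signal(SIGALRM, SIG_IGN) silences every later SIGALRM: one possible way to limit the damage, sketched here under assumptions, is to save the previous disposition with sigaction() and put it back before any later timer is armed. This is an illustration only, not a proposed OpenSSH patch, and all function names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t fired = 0;

/* Hypothetical handler that only records delivery of SIGALRM. */
static void on_alarm(int signo)
{
    (void)signo;
    fired = 1;
}

/* Ignore SIGALRM and store the previous disposition in *saved. */
int ignore_sigalrm(struct sigaction *saved)
{
    struct sigaction ign;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    return sigaction(SIGALRM, &ign, saved);
}

/* Put back the disposition captured by ignore_sigalrm(). */
int restore_sigalrm(const struct sigaction *saved)
{
    return sigaction(SIGALRM, saved, NULL);
}

/*
 * Returns 1 if a timer armed after the restore reaches the original
 * handler, and 0 if anything along the way misbehaves.
 */
int timer_works_after_restore(void)
{
    struct sigaction sa, saved;
    struct timespec wait = { 1, 500000000L };   /* 1.5 seconds */

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return 0;

    fired = 0;
    if (ignore_sigalrm(&saved) != 0)
        return 0;
    alarm(1);
    nanosleep(&wait, NULL);   /* timer expires, signal is discarded */
    if (fired)
        return 0;             /* handler must not run while ignored */

    if (restore_sigalrm(&saved) != 0)
        return 0;
    alarm(1);
    nanosleep(&wait, NULL);   /* interrupted once the handler runs */
    return fired;
}
```

The trade-off is the same one noted in the message: while the signal is ignored, an already-armed timer still expires and is silently dropped, so this hides the symptom without explaining why alarm(0) appears to have no effect.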