bugzilla-daemon at mindrot.org
2023-Aug-03 12:51 UTC
[Bug 3598] New: Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

            Bug ID: 3598
           Summary: Dead lock of sshd and Defunct of sshd
           Product: Portable OpenSSH
           Version: 9.1p1
          Hardware: ix86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: sshd
          Assignee: unassigned-bugs at mindrot.org
          Reporter: mark.zhang at nokia-sbell.com

Hello expert,

Recently we encountered the following scenario:
1) root logs in from a remote host.
2) With sshd's default LogLevel (INFO), sshd writes a message to auth.log via syslog (taking a lock here).
3) Some problem blocked the syslog call in step 2 for a long time, more than 90 seconds.
4) The LoginGraceTime timer (default 90 seconds) fired and triggered another log message.
5) sshd tried to acquire the same lock that was still held by the stuck call from step 2.
6) sshd deadlocked.
7) This also left the other sshd process as a zombie (defunct).

Please help confirm this, and let us know whether anything can be done in sshd to avoid the deadlock. Otherwise the number of processes and the memory in use can keep growing while login attempts keep arriving from outside.
(gdb) bt
#0  __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#1  0xf7956c82 in __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:65
#2  0xf7a2f394 in openlog (ident=0x5812c1a0 "sshd", logstat=1, logfac=32) at ../misc/syslog.c:384
#3  0x565e99ac in do_log ()
#4  0x565e9cf4 in sshlogv ()
#5  0x565e9b29 in sshsigdie ()
#6  0x56585084 in grace_alarm_handler ()
#7  <signal handler called>
#8  __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#9  0xf7a336b1 in __libc_send (fd=5, buf=0x581644e0, len=153, flags=16384) at ../sysdeps/unix/sysv/linux/send.c:30
#10 0xf7a2f08c in _GI__vsyslog_chk (pri=<optimized out>, flag=<optimized out>, fmt=<optimized out>, ap=<optimized out>) at ../misc/syslog.c:155
#11 0xf7a2f2fb in __syslog (pri=6, fmt=0x56641418 "%.500s") at ../misc/syslog.c:117
#12 0x565e99cb in do_log ()
#13 0x565e9cf4 in sshlogv ()
#14 0x565e9a52 in sshlog ()
#15 0x5659a291 in auth_log ()
#16 0x565ad53b in monitor_child_preauth ()
#17 0x5658565b in privsep_preauth ()
#18 0x5658ae6d in main ()

Thanks,
Mark

--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2023-Aug-03 21:46 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Darren Tucker <dtucker at dtucker.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dtucker at dtucker.net

--- Comment #1 from Darren Tucker <dtucker at dtucker.net> ---
Created attachment 3710
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3710&action=edit
Block signals while sysloggin

You could try blocking signals while it's in syslog.

That said, if the problem is that it's blocking in syslog indefinitely in the first call (and if it's timing out after 90s, that seems likely) you'll still have sshds blocked in syslog.
bugzilla-daemon at mindrot.org
2023-Aug-03 23:36 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #2 from mzhan017 <mark.zhang at nokia-sbell.com> ---
Darren,

Yes, you're correct. We could be blocked in the first syslog call even without the deadlock, and we would still see the number of processes and the memory usage keep growing. Is it possible to defend against a blocked syslog, so that we can still reliably log in to the system over ssh?

Thanks,
Mark
bugzilla-daemon at mindrot.org
2023-Aug-04 00:03 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |djm at mindrot.org

--- Comment #3 from Damien Miller <djm at mindrot.org> ---
IMO the problem is fundamentally that we're doing operations in a signal handler that are unsafe on some platforms. We should probably make sigdie() a noop anywhere snprintf()+syslog() are not guaranteed to be safe, which AFAIK is everything other than OpenBSD.
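One way to keep a grace handler within what POSIX guarantees is to have it call only async-signal-safe functions. The sketch below uses a hypothetical handler `grace_alarm_safe` with a made-up exit status of 255: it writes a fixed, preformatted message with write(2) and leaves with _exit(2). Both are on the POSIX async-signal-safe list, while snprintf() and syslog() are not. The hypothetical `demo_safe_handler` forks a child, triggers the handler with raise(SIGALRM), and captures the message through a pipe.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = STDERR_FILENO;	/* hypothetical log destination */

/* Hypothetical grace handler using only async-signal-safe calls:
 * a fixed message via write(2), then _exit(2). No snprintf or syslog. */
static void
grace_alarm_safe(int sig)
{
	static const char msg[] = "Timeout before authentication\n";
	ssize_t r;

	(void)sig;
	r = write(log_fd, msg, sizeof(msg) - 1);
	(void)r;
	_exit(255);
}

/* Forks a child that installs the handler and raises SIGALRM, copies
 * what the handler wrote into buf (len >= 1), returns the exit status. */
int
demo_safe_handler(char *buf, size_t len)
{
	int fds[2], status;
	pid_t pid;
	ssize_t n;

	if (pipe(fds) == -1 || (pid = fork()) == -1)
		return -1;
	if (pid == 0) {
		struct sigaction sa;

		close(fds[0]);
		log_fd = fds[1];
		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = grace_alarm_safe;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGALRM, &sa, NULL);
		raise(SIGALRM);		/* stands in for the grace alarm firing */
		_exit(0);		/* not reached */
	}
	close(fds[1]);
	n = read(fds[0], buf, len - 1);
	buf[n > 0 ? n : 0] = '\0';
	close(fds[0]);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is that the message is fixed: without snprintf(), the handler cannot safely add the client's address and port.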
bugzilla-daemon at mindrot.org
2023-Aug-04 00:17 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #4 from Darren Tucker <dtucker at dtucker.net> ---
(In reply to Damien Miller from comment #3)
> IMO the problem is fundamentally that we're doing operations in a
> signal handler that are unsafe on some platforms.

That's true but I don't think it's the problem here, unless they just happen to be hitting a signal race at exactly 90s.

> We should probably
> make sigdie() a noop anywhere snprintf()+syslog() are not guaranteed
> to be safe, which AFAIK is everything other than OpenBSD.

Now that privsep is mandatory we could move the LoginGraceTime signal handler into the privsep child and just have it _exit(somenumber), then have the monitor read that exit code in its normal event loop and log from there.

I have most of the code for the monitor side written as part of another thing.
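Darren's proposal can be sketched as below. The exit status `EXIT_LOGIN_GRACE` (value 3), the function names and the message text are assumptions for illustration, not the code that was eventually committed. The child's alarm handler only calls _exit(); the parent reaps the child with waitpid(2) in ordinary, non-signal context, decodes the status with WIFEXITED()/WEXITSTATUS(), and only then calls syslog().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define EXIT_LOGIN_GRACE 3	/* hypothetical "magic" exit status */

/* Child's grace handler: exits with the magic status, logs nothing. */
static void
grace_alarm_exit(int sig)
{
	(void)sig;
	_exit(EXIT_LOGIN_GRACE);
}

/* Child side: arm the grace timer, then "authenticate" for auth_secs. */
static void
preauth_child(unsigned grace_secs, unsigned auth_secs)
{
	struct sigaction sa;
	struct timespec ts = { .tv_sec = auth_secs, .tv_nsec = 0 };

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = grace_alarm_exit;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);
	alarm(grace_secs);
	nanosleep(&ts, NULL);	/* stands in for kex and authentication */
	_exit(0);
}

/* Parent (monitor) side: 1 if the child hit the grace time, 0 if it
 * exited some other way, -1 on error. */
int
run_preauth(unsigned grace_secs, unsigned auth_secs)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		preauth_child(grace_secs, auth_secs);	/* never returns */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_LOGIN_GRACE) {
		/* Ordinary context, so calling syslog() here is fine. */
		syslog(LOG_INFO, "Timeout before authentication (pid = %ld)",
		    (long)pid);
		return 1;
	}
	return 0;
}
```

The sketch trusts the privsep child to report its own timeout through its exit status, which is the property Damien questions in comment #5.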
bugzilla-daemon at mindrot.org
2023-Aug-04 00:47 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #5 from Damien Miller <djm at mindrot.org> ---
nerfing sigdie would mean that we lose the following log messages:

auth-pam.c:             sigdie("PAM: authentication thread exited unexpectedly");
auth-pam.c:             sigdie("PAM: authentication thread exited uncleanly");
sshd.c: sigdie("Timeout before authentication for %s port %d",

I was about to suggest what Darren said re arranging for the process to exit with a magic value and moving the logging to the parent, but I see that he beat me to it :)

OTOH I don't love the idea of moving the grace alarm to the privsep child, since it's intended not to be trustworthy.

Other options include implementing LoginGraceTime in the monitor mainloop or having the listener do the logging (AFAIK it's still around at this point for MaxStartups tracking)
bugzilla-daemon at mindrot.org
2023-Aug-04 01:53 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #6 from Darren Tucker <dtucker at dtucker.net> ---
(In reply to Damien Miller from comment #5)
> Other options
> include implementing LoginGraceTime in the monitor mainloop

That's non-trivial since some of the potential timeouts are prior to the monitor mainloop eg kex_exchange_identification().

> or having the listener do the logging (AFAIK it's still around at this
> point for MaxStartups tracking)

That should be doable with a bit of plumbing, the only caveat I can think of is that the timeout log messages will come from a pid not directly associated with the connection.
bugzilla-daemon at mindrot.org
2023-Aug-04 06:58 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3711|                            |ok?(dtucker at dtucker.net)
              Flags|                            |

--- Comment #7 from Damien Miller <djm at mindrot.org> ---
Created attachment 3711
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3711&action=edit
implement LoginGraceTime logging in listener

This removes the sigdie() in sshd.c and implements the LoginGraceTime logging in the listener process. E.g.

Timeout before authentication for connection from [10.40.0.253]:2222 to [172.30.30.4]:51846 (pid = 28473)

It also implements infrastructure for logging abnormal terminations (e.g. the monitor exiting with SIGSEGV) and adds a framework for custom handling of arbitrary monitor exit statuses that we can reuse to remove the auth-pam.c sigdie() calls.

Oh, and it adds a SIGINFO handler to the listener for debugging :)

Also at https://github.com/djmdjm/openssh-wip/pull/22
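The listener-side logging of abnormal terminations could look roughly like this. It is an illustrative sketch, not code from attachment 3711: `describe_exit`, `demo_reap`, `EXIT_LOGIN_GRACE` and the message wording are hypothetical. The listener reaps each connection's child and decodes the wait status, separating a grace-time exit, an ordinary exit, and death by a signal (SIGSEGV in the comment's example; the demo uses SIGKILL to avoid a core dump). Because this runs in the listener's main loop rather than in a signal handler, snprintf() and syslog() are safe to use there.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define EXIT_LOGIN_GRACE 3	/* hypothetical status used by the child */

/* Formats a log line for a reaped child (called from the main loop). */
void
describe_exit(char *buf, size_t len, pid_t pid, int status)
{
	if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_LOGIN_GRACE)
		snprintf(buf, len, "Timeout before authentication (pid = %ld)",
		    (long)pid);
	else if (WIFEXITED(status))
		snprintf(buf, len, "session process %ld exited with status %d",
		    (long)pid, WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "session process %ld killed by signal %d",
		    (long)pid, WTERMSIG(status));
	else
		snprintf(buf, len, "session process %ld: unexpected status 0x%x",
		    (long)pid, (unsigned)status);
}

/* Forks a child that ends one of three ways, reaps it, describes it.
 * how: 0 = grace timeout, 1 = clean exit, 2 = killed by SIGKILL. */
int
demo_reap(int how, char *buf, size_t len)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		if (how == 0)
			_exit(EXIT_LOGIN_GRACE);
		if (how == 2)
			kill(getpid(), SIGKILL);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	describe_exit(buf, len, pid, status);
	return 0;
}
```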
bugzilla-daemon at mindrot.org
2023-Aug-07 05:16 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3711|ok?(dtucker at dtucker.net) |
              Flags|                            |
   Attachment #3711|0                           |1
        is obsolete|                            |
   Attachment #3714|                            |ok?(dtucker at dtucker.net)
              Flags|                            |

--- Comment #8 from Damien Miller <djm at mindrot.org> ---
Created attachment 3714
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3714&action=edit
Fixed diff

Revised diff that fixes a couple of logic errors and simplifies some code.
bugzilla-daemon at mindrot.org
2023-Aug-07 06:47 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3714|ok?(dtucker at dtucker.net) |
              Flags|                            |

--- Comment #9 from Damien Miller <djm at mindrot.org> ---
Comment on attachment 3714
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3714
Fixed diff

actually, this diff has a big problem too: because it tracks all child processes in the same structure, and because the tracking logic is incorrect, it limits the _total_ number of concurrent sessions to MaxSessions and not just the number of _authenticating_ sessions :(
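The distinction between total and authenticating sessions can be illustrated with a small slot table. Everything here is hypothetical: `struct startup_table`, `startup_add`, `startup_release`, and a capacity of 2 standing in for the MaxStartups limit. A slot is claimed when a connection's child is forked and should be given back as soon as that child authenticates, not only when it exits. One way such a table ends up capping all sessions is if slots are freed only on exit: every long-lived authenticated session then keeps holding one.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define MAX_STARTUPS 2	/* illustrative stand-in for the MaxStartups limit */

struct startup_table {
	pid_t pids[MAX_STARTUPS];	/* 0 means the slot is free */
};

/* Claims a slot for a newly forked, not-yet-authenticated child. */
bool
startup_add(struct startup_table *t, pid_t pid)
{
	for (int i = 0; i < MAX_STARTUPS; i++) {
		if (t->pids[i] == 0) {
			t->pids[i] = pid;
			return true;
		}
	}
	return false;	/* full: the listener would refuse the connection */
}

/* Frees the child's slot; call on successful auth *or* on exit. */
bool
startup_release(struct startup_table *t, pid_t pid)
{
	for (int i = 0; i < MAX_STARTUPS; i++) {
		if (t->pids[i] == pid) {
			t->pids[i] = 0;
			return true;
		}
	}
	return false;	/* not tracked, e.g. already released at auth time */
}
```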
bugzilla-daemon at mindrot.org
2023-Aug-08 03:22 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3714|0                           |1
        is obsolete|                            |
   Attachment #3715|                            |ok?(dtucker at dtucker.net)
              Flags|                            |

--- Comment #10 from Damien Miller <djm at mindrot.org> ---
Created attachment 3715
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3715&action=edit
Really fixed diff

This should fix the problems in the previous diff and simplifies things a little more.

Child processes now signal that authentication was successful back to the listener, so it can stop tracking them. They do this by sending another char over the startup_pipe, in addition to the first one they send to signal they have received their rexec state.

When the listener is so notified, it stops caring about the subprocess and frees up its slot so it doesn't count against MaxStartups.
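The startup_pipe signalling can be sketched as follows. Only the two-byte idea comes from the comment; the function names, the byte value, and the use of one pipe(2) per child are assumptions. The child writes one byte once it has its rexec state and a second once authentication succeeds; the listener stops tracking the child when it sees the second byte, or when the pipe reaches EOF because the child exited first.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: one byte after receiving rexec state, a second byte only
 * if authentication succeeds. */
static void
child_main(int wfd, int authenticated)
{
	char c = 0;

	if (write(wfd, &c, 1) != 1)		/* 1st byte: startup done */
		_exit(1);
	if (authenticated && write(wfd, &c, 1) != 1)	/* 2nd: authenticated */
		_exit(1);
	_exit(0);
}

/* Listener side: count bytes from one child's startup pipe. Stops at 2
 * (authenticated) or at EOF (child exited). Either way the caller can
 * then free the child's MaxStartups slot. */
int
track_child(int rfd)
{
	int seen = 0;
	char c;

	while (seen < 2 && read(rfd, &c, 1) == 1)
		seen++;
	return seen;
}

/* Returns how many bytes the listener saw from a demo child. */
int
demo_startup_pipe(int authenticated)
{
	int fds[2], status, seen;
	pid_t pid;

	if (pipe(fds) == -1 || (pid = fork()) == -1)
		return -1;
	if (pid == 0) {
		close(fds[0]);
		child_main(fds[1], authenticated);	/* never returns */
	}
	close(fds[1]);
	seen = track_child(fds[0]);
	close(fds[0]);
	waitpid(pid, &status, 0);
	return seen;
}
```

A real listener would watch many such pipes at once with poll(2) instead of blocking on a single read() as this sketch does.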
bugzilla-daemon at mindrot.org
2023-Aug-08 04:50 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #11 from Darren Tucker <dtucker at dtucker.net> ---
Comment on attachment 3715
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3715
Really fixed diff

>From 9f895491cc6a671fc49b9cda78edfe3801b0af74 Mon Sep 17 00:00:00 2001
>From: Damien Miller <djm at mindrot.org>
>Date: Fri, 4 Aug 2023 14:51:03 +1000
>Subject: [PATCH] logging of monitor process exits in listener

This seems like a bit too large of a change to go in so close to a release?
bugzilla-daemon at mindrot.org
2023-Aug-08 05:07 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #12 from Damien Miller <djm at mindrot.org> ---
> This seems like a bit too large of a change to go in so close to a release?

oh sure, not proposing this for 9.4 but afterwards
bugzilla-daemon at mindrot.org
2024-May-13 11:01 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |546554688 at qq.com

--- Comment #13 from Damien Miller <djm at mindrot.org> ---
*** Bug 3690 has been marked as a duplicate of this bug. ***
bugzilla-daemon at mindrot.org
2024-May-15 10:50 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

linker <546554688 at qq.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|9.1p1                       |8.5p1
bugzilla-daemon at mindrot.org
2024-May-15 12:09 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #14 from linker <546554688 at qq.com> ---
"When can it be merged into the master repository?"
bugzilla-daemon at mindrot.org
2024-May-15 20:18 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|8.5p1                       |-current

--- Comment #15 from Damien Miller <djm at mindrot.org> ---
Soon hopefully, I'd like it in before the next release. People testing it would help.
bugzilla-daemon at mindrot.org
2024-Jul-01 12:05 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

github at kalvdans.no-ip.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |github at kalvdans.no-ip.org
bugzilla-daemon at mindrot.org
2024-Jul-01 19:52 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Alan D. Salewski <salewski at att.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |salewski at att.net
bugzilla-daemon at mindrot.org
2024-Dec-06 16:41 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #16 from Damien Miller <djm at mindrot.org> ---
This was committed in openssh-9.8
bugzilla-daemon at mindrot.org
2024-Dec-07 19:23 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598

--- Comment #17 from github at kalvdans.no-ip.org ---
For reference, I think it was https://github.com/openssh/openssh-portable/commit/81c1099d22b81ebfd20a334ce986c4f753b0db29
Apparently Analogous Threads
- [Bug 3690] New: sshd: root [priv] process sleeping leads to unprivileged child proc zombie
- [Bug 1363] New: sshd gets stuck: select() in packet_read_seqnr waits indefinitely
- [Bug 2143] New: X11 forwarding for ipv4 is broken when ipv6 is disabled on the loopback interface
- [Bug 1180] Add finer-grained controls to sshd
- Feature request: FAIL_DELAY-support for sshd