bugzilla-daemon at mindrot.org
2023-Aug-03 12:51 UTC
[Bug 3598] New: Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Bug ID: 3598
Summary: Dead lock of sshd and Defunct of sshd
Product: Portable OpenSSH
Version: 9.1p1
Hardware: ix86
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: sshd
Assignee: unassigned-bugs at mindrot.org
Reporter: mark.zhang at nokia-sbell.com
Hello expert,
Recently, we encountered the following scenario:
1) Log in as root from a remote host.
2) With the default LogLevel (INFO), sshd writes a log entry to
auth.log via syslog (taking a lock here).
3) Some problem blocked the logging call in step 2 for a long time,
more than 90 seconds.
4) The LoginGraceTime timer (default 90 seconds) fired and triggered
another log call.
5) sshd tried to take the same lock that was still held by the stuck
call in step 2.
6) sshd then deadlocked.
7) This also left the other sshd process as a zombie.
Please help confirm.
Is there anything that could be done in sshd to avoid the deadlock? It
can cause the number of processes and the memory in use to keep
increasing while ssh login attempts keep arriving from outside.
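The re-entry described in steps 2 to 6 can be modelled in a few lines of C. This is only a sketch under stated assumptions: `log_lock` is a hypothetical stand-in for the private lock that the C library takes inside syslog()/openlog(), and the handler uses a try-lock so that the program reports the re-entry instead of hanging the way the real sshd did. All function names are invented for illustration.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the private lock inside syslog()/openlog(). */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;

/* Set by the handler when it finds the lock already held. */
static volatile sig_atomic_t handler_would_block;

/* Try-lock: true if we took the lock, false if it was already held. */
static bool log_trylock(void)
{
    return !atomic_flag_test_and_set(&log_lock);
}

static void log_unlock(void)
{
    atomic_flag_clear(&log_lock);
}

/* Models grace_alarm_handler() -> sshsigdie() -> do_log() -> openlog(). */
static void grace_alarm_model(int sig)
{
    (void)sig;
    if (!log_trylock()) {
        /*
         * A blocking lock would wait forever here: the holder is the
         * interrupted code further down the same stack.
         */
        handler_would_block = 1;
        return;
    }
    log_unlock();
}

/*
 * Deliver SIGALRM either while the modelled log call holds the lock or
 * while it does not. Returns 1 if the handler would have deadlocked.
 */
int simulate_grace_timeout(bool log_call_in_progress)
{
    handler_would_block = 0;
    signal(SIGALRM, grace_alarm_model);

    if (log_call_in_progress)
        (void)log_trylock();    /* step 2: the log call takes the lock */
    raise(SIGALRM);             /* step 4: the grace timer fires */
    if (log_call_in_progress)
        log_unlock();

    signal(SIGALRM, SIG_DFL);
    return handler_would_block;
}
```

Calling `simulate_grace_timeout(true)` reproduces the situation in the backtrace (handler entered while the lock is held); `simulate_grace_timeout(false)` is the harmless case where the timer fires outside a log call.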
(gdb) bt
#0 __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#1 0xf7956c82 in __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:65
#2 0xf7a2f394 in openlog (ident=0x5812c1a0 "sshd", logstat=1, logfac=32) at ../misc/syslog.c:384
#3 0x565e99ac in do_log ()
#4 0x565e9cf4 in sshlogv ()
#5 0x565e9b29 in sshsigdie ()
#6 0x56585084 in grace_alarm_handler ()
#7 <signal handler called>
#8 __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#9 0xf7a336b1 in __libc_send (fd=5, buf=0x581644e0, len=153, flags=16384) at ../sysdeps/unix/sysv/linux/send.c:30
#10 0xf7a2f08c in _GI__vsyslog_chk (pri=<optimized out>, flag=<optimized out>, fmt=<optimized out>, ap=<optimized out>) at ../misc/syslog.c:155
#11 0xf7a2f2fb in __syslog (pri=6, fmt=0x56641418 "%.500s") at ../misc/syslog.c:117
#12 0x565e99cb in do_log ()
#13 0x565e9cf4 in sshlogv ()
#14 0x565e9a52 in sshlog ()
#15 0x5659a291 in auth_log ()
#16 0x565ad53b in monitor_child_preauth ()
#17 0x5658565b in privsep_preauth ()
#18 0x5658ae6d in main ()
Thanks,
Mark
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2023-Aug-03 21:46 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Darren Tucker <dtucker at dtucker.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dtucker at dtucker.net
--- Comment #1 from Darren Tucker <dtucker at dtucker.net> ---
Created attachment 3710
--> https://bugzilla.mindrot.org/attachment.cgi?id=3710&action=edit
Block signals while sysloggin
You could try blocking signals while it's in syslog.
That said, if the problem is that it's blocking in syslog indefinitely
in the first call (and if it's timing out after 90s, that seems likely)
you'll still have sshds blocked in syslog.
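A minimal sketch of the "block signals while in syslog" idea follows. It is not the contents of attachment 3710; the helper names are hypothetical, and the log call is replaced by a callback so the behaviour can be observed without a syslog daemon. It assumes a single-threaded process, where sigprocmask() is the appropriate call.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t alarm_seen;

static void note_alarm(int sig)
{
    (void)sig;
    alarm_seen = 1;
}

/*
 * Run fn(arg) with signo blocked, then restore the previous mask.
 * Returns 0 on success, -1 if the mask could not be changed.
 */
int call_with_signal_blocked(int signo, void (*fn)(void *), void *arg)
{
    sigset_t block, saved;

    if (sigemptyset(&block) == -1 || sigaddset(&block, signo) == -1)
        return -1;
    if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
        return -1;
    fn(arg);    /* in sshd this would be the syslog() call */
    /* A pending signo is delivered once the old mask is restored. */
    return sigprocmask(SIG_SETMASK, &saved, NULL);
}

/* Stand-in for a log call during which the grace timer fires. */
static void fake_log_call(void *arg)
{
    int *seen_during_call = arg;

    raise(SIGALRM);     /* timer fires mid-call but stays pending */
    *seen_during_call = alarm_seen;
}

/* Returns 1 if the handler ran only after the protected call finished. */
int demo_deferred_alarm(void)
{
    struct sigaction sa;
    int seen_during_call = -1;

    sa.sa_handler = note_alarm;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;
    alarm_seen = 0;
    if (call_with_signal_blocked(SIGALRM, fake_log_call,
        &seen_during_call) == -1)
        return -1;
    return seen_during_call == 0 && alarm_seen == 1;
}
```

As noted above, this only removes the re-entry. If the protected call itself never returns, the signal stays pending and the process remains stuck.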
bugzilla-daemon at mindrot.org
2023-Aug-03 23:36 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #2 from mzhan017 <mark.zhang at nokia-sbell.com> ---
Darren,
Yes, you're correct. We could be blocked in the first syslog call, even
without the deadlock. But we could still face the issue of the number
of processes and the memory usage continuing to increase.
Is it possible to defend against such a blocked syslog? That would help
ensure that we can still log in to the system reliably by ssh.
Thanks,
Mark
bugzilla-daemon at mindrot.org
2023-Aug-04 00:03 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |djm at mindrot.org
--- Comment #3 from Damien Miller <djm at mindrot.org> ---
IMO the problem is fundamentally that we're doing operations in a
signal handler that are unsafe on some platforms. We should probably
make sigdie() a noop anywhere snprintf()+syslog() are not guaranteed to
be safe, which AFAIK is everything other than OpenBSD.
bugzilla-daemon at mindrot.org
2023-Aug-04 00:17 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #4 from Darren Tucker <dtucker at dtucker.net> ---
(In reply to Damien Miller from comment #3)
> IMO the problem is fundamentally that we're doing operations in a
> signal handler that are unsafe on some platforms.
That's true but I don't think it's the problem here, unless they just
happen to be hitting a signal race at exactly 90s.
> We should probably
> make sigdie() a noop anywhere snprintf()+syslog() are not guaranteed
> to be safe, which AFAIK is everything other than OpenBSD.
Now that privsep is mandatory we could move the LoginGraceTime signal
handler into the privsep child and just have it _exit(somenumber), then
have the monitor read that exit code in its normal event loop and log
from there. I have most of the code for the monitor side written as
part of another thing.
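The "_exit(somenumber) in the child, log in the parent" arrangement could look roughly like the sketch below. It is an illustration only, not OpenSSH code: `GRACE_EXIT_CODE` is an arbitrary value chosen for this example, the function names are hypothetical, and the parent here simply blocks in waitpid() rather than running an event loop. The point is that the handler calls nothing except _exit(), which is async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define GRACE_EXIT_CODE 3   /* arbitrary value chosen for this sketch */

/* Handler in the child: no logging, only an async-signal-safe _exit(). */
static void grace_alarm_exit(int sig)
{
    (void)sig;
    _exit(GRACE_EXIT_CODE);
}

/*
 * Fork a child that never finishes "authenticating" and let the grace
 * timer end it. Returns the exit code seen by the parent, or -1 on error.
 */
int run_child_until_grace_timeout(unsigned int grace_seconds)
{
    int status;
    pid_t pid;

    if (grace_seconds == 0)
        return -1;          /* alarm(0) would cancel the timer */
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, grace_alarm_exit);
        alarm(grace_seconds);
        for (;;)
            pause();        /* stands in for a stalled authentication */
    }
    /* Parent: collect the status; it is free to log from here. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that sees `GRACE_EXIT_CODE` knows the grace time expired and can write the "Timeout before authentication" message from ordinary, non-handler context.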
bugzilla-daemon at mindrot.org
2023-Aug-04 00:47 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #5 from Damien Miller <djm at mindrot.org> ---
nerfing sigdie would mean that we lose the following log messages:
auth-pam.c: sigdie("PAM: authentication thread exited unexpectedly");
auth-pam.c: sigdie("PAM: authentication thread exited uncleanly");
sshd.c: sigdie("Timeout before authentication for %s port %d",
I was about to suggest what Darren said re arranging for the process to
exit with a magic value and moving the logging to the parent, but I see
that he beat me to it :)
OTOH I don't love the idea of moving the grace alarm to the privsep
child, since it's intended not to be trustworthy. Other options include
implementing LoginGraceTime in the monitor mainloop or having the
listener do the logging (AFAIK it's still around at this point for
MaxStartups tracking)
bugzilla-daemon at mindrot.org
2023-Aug-04 01:53 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #6 from Darren Tucker <dtucker at dtucker.net> ---
(In reply to Damien Miller from comment #5)
> Other options
> include implementing LoginGraceTime in the monitor mainloop
That's non-trivial since some of the potential timeouts are prior to
the monitor mainloop eg kex_exchange_identification().
> or having the listener do the logging (AFAIK it's still around at this
> point for MaxStartups tracking)
That should be doable with a bit of plumbing, the only caveat I can
think of is that the timeout log messages will come from a pid not
directly associated with the connection.
bugzilla-daemon at mindrot.org
2023-Aug-04 06:58 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3711| |ok?(dtucker at dtucker.net)
Flags| |
--- Comment #7 from Damien Miller <djm at mindrot.org> ---
Created attachment 3711
--> https://bugzilla.mindrot.org/attachment.cgi?id=3711&action=edit
implement LoginGraceTime logging in listener
This removes the sigdie() in sshd.c and implements the LoginGraceTime
logging in the listener process. E.g.
Timeout before authentication for connection from [10.40.0.253]:2222 to
[172.30.30.4]:51846 (pid = 28473)
It also implements infrastructure for logging abnormal terminations
(e.g. the monitor exiting with SIGSEGV) and adds a framework for custom
handling of arbitrary monitor exit statuses that we can reuse to remove
the auth-pam.c sigdie() calls.
Oh, and it adds a SIGINFO handler to the listener for debugging :)
Also at https://github.com/djmdjm/openssh-wip/pull/22
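To illustrate how a listener can log on behalf of a child it only knows by pid, here is a small sketch of a tracking table. It is not taken from the attachment: the table size, `GRACE_EXIT_CODE` and all function names are hypothetical, and the caller is assumed to have already obtained the exit code from waitpid(). Only the message format follows the example shown above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_TRACKED 4       /* hypothetical table size */
#define GRACE_EXIT_CODE 3   /* hypothetical "grace time expired" status */

struct tracked_child {
    pid_t pid;              /* 0 means the slot is free */
    char id[128];           /* e.g. "connection from A to B" */
};

static struct tracked_child children[MAX_TRACKED];

/* Remember a newly forked child. Returns its slot index, or -1 if full. */
int child_register(pid_t pid, const char *id)
{
    if (pid <= 0)
        return -1;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (children[i].pid == 0) {
            children[i].pid = pid;
            snprintf(children[i].id, sizeof(children[i].id), "%s", id);
            return i;
        }
    }
    return -1;
}

/*
 * Called once the listener has learned that pid exited with exit_code.
 * Writes a log line into buf and frees the slot. Returns 0 if pid was
 * tracked, -1 otherwise.
 */
int child_exited(pid_t pid, int exit_code, char *buf, size_t len)
{
    if (pid <= 0)
        return -1;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (children[i].pid != pid)
            continue;
        if (exit_code == GRACE_EXIT_CODE)
            snprintf(buf, len,
                "Timeout before authentication for %s (pid = %ld)",
                children[i].id, (long)pid);
        else
            snprintf(buf, len, "%s exited with status %d (pid = %ld)",
                children[i].id, exit_code, (long)pid);
        memset(&children[i], 0, sizeof(children[i]));
        return 0;
    }
    return -1;
}
```

Because the connection description is stored at fork time, the message can still name the peer even though it is emitted by the listener's pid, which is the caveat raised in comment #6.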
bugzilla-daemon at mindrot.org
2023-Aug-07 05:16 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3711|ok?(dtucker at dtucker.net) |
Flags| |
Attachment #3711|0 |1
is obsolete| |
Attachment #3714| |ok?(dtucker at dtucker.net)
Flags| |
--- Comment #8 from Damien Miller <djm at mindrot.org> ---
Created attachment 3714
--> https://bugzilla.mindrot.org/attachment.cgi?id=3714&action=edit
Fixed diff
Revised diff that fixes a couple of logic errors and simplifies some
code.
bugzilla-daemon at mindrot.org
2023-Aug-07 06:47 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3714|ok?(dtucker at dtucker.net) |
Flags| |
--- Comment #9 from Damien Miller <djm at mindrot.org> ---
Comment on attachment 3714
--> https://bugzilla.mindrot.org/attachment.cgi?id=3714
Fixed diff
Actually, this diff has a big problem too: because it tracks all child
processes in the same structure, and because the tracking logic is
incorrect, it limits the _total_ number of concurrent sessions to
MaxStartups and not just the number of _authenticating_ sessions :(
bugzilla-daemon at mindrot.org
2023-Aug-08 03:22 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3714|0 |1
is obsolete| |
Attachment #3715| |ok?(dtucker at dtucker.net)
Flags| |
--- Comment #10 from Damien Miller <djm at mindrot.org> ---
Created attachment 3715
--> https://bugzilla.mindrot.org/attachment.cgi?id=3715&action=edit
Really fixed diff
This should fix the problems in the previous diff and simplify things
a little more.
Child processes now signal that authentication was successful back to
the listener, so it can stop tracking them. They do this by sending
another char over the startup_pipe, in addition to the first one they
send to signal they have received their rexec state. When the listener
is so notified, it stops caring about the subprocess and frees up its
slot so it doesn't count against MaxStartups.
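The two-byte notification scheme can be sketched as a small state machine on the listener side. This is an assumption-laden illustration, not the code in attachment 3715: the state names, the struct and the byte value are all hypothetical, and a real listener would poll many pipes without blocking, whereas the demo reads only after a byte is known to be queued.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

enum slot_state { SLOT_FREE, SLOT_WAIT_REXEC, SLOT_AUTHENTICATING };

struct startup_slot {
    int pipe_rd;            /* listener's end of the startup pipe */
    enum slot_state state;
};

/* Child side: send one notification byte. Returns 0 on success. */
int child_notify(int pipe_wr)
{
    char c = '\0';

    return write(pipe_wr, &c, 1) == 1 ? 0 : -1;
}

/* Listener side: consume one byte and advance the slot's state. */
int listener_handle_byte(struct startup_slot *s)
{
    char c;

    if (s->state == SLOT_FREE || read(s->pipe_rd, &c, 1) != 1)
        return -1;
    if (s->state == SLOT_WAIT_REXEC) {
        s->state = SLOT_AUTHENTICATING;     /* 1st byte: rexec state received */
    } else {
        close(s->pipe_rd);                  /* 2nd byte: authenticated */
        s->pipe_rd = -1;
        s->state = SLOT_FREE;               /* stop tracking this child */
    }
    return 0;
}

/* Number of slots that still count against the startup limit. */
int startups_in_use(const struct startup_slot *slots, int n)
{
    int used = 0;

    for (int i = 0; i < n; i++)
        if (slots[i].state != SLOT_FREE)
            used++;
    return used;
}

/* Walk one child through both notifications. Returns 1 if all checks pass. */
int demo_startup_pipe(void)
{
    int fds[2], ok;
    struct startup_slot slot;

    if (pipe(fds) == -1)
        return -1;
    slot.pipe_rd = fds[0];
    slot.state = SLOT_WAIT_REXEC;

    ok = child_notify(fds[1]) == 0 && listener_handle_byte(&slot) == 0 &&
        startups_in_use(&slot, 1) == 1;
    ok = ok && child_notify(fds[1]) == 0 &&
        listener_handle_byte(&slot) == 0 && startups_in_use(&slot, 1) == 0;

    close(fds[1]);
    if (slot.pipe_rd != -1)
        close(slot.pipe_rd);
    return ok;
}
```

After the second byte the slot is free again, so an authenticated, long-lived session no longer counts towards the limit on unauthenticated connections.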
bugzilla-daemon at mindrot.org
2023-Aug-08 04:50 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #11 from Darren Tucker <dtucker at dtucker.net> ---
Comment on attachment 3715
--> https://bugzilla.mindrot.org/attachment.cgi?id=3715
Really fixed diff
>From 9f895491cc6a671fc49b9cda78edfe3801b0af74 Mon Sep 17 00:00:00 2001
>From: Damien Miller <djm at mindrot.org>
>Date: Fri, 4 Aug 2023 14:51:03 +1000
>Subject: [PATCH] logging of monitor process exits in listener
This seems like a bit too large of a change to go in so close to a
release?
bugzilla-daemon at mindrot.org
2023-Aug-08 05:07 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #12 from Damien Miller <djm at mindrot.org> ---
> This seems like a bit too large of a change to go in so close to a release?
oh sure, not proposing this for 9.4 but afterwards
bugzilla-daemon at mindrot.org
2024-May-13 11:01 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |546554688 at qq.com
--- Comment #13 from Damien Miller <djm at mindrot.org> ---
*** Bug 3690 has been marked as a duplicate of this bug. ***
bugzilla-daemon at mindrot.org
2024-May-15 10:50 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
linker <546554688 at qq.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|9.1p1 |8.5p1
bugzilla-daemon at mindrot.org
2024-May-15 12:09 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #14 from linker <546554688 at qq.com> ---
"When can it be merged into the master repository?"
bugzilla-daemon at mindrot.org
2024-May-15 20:18 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|8.5p1 |-current
--- Comment #15 from Damien Miller <djm at mindrot.org> ---
Soon, hopefully. I'd like it in before the next release. People testing
it would help.
bugzilla-daemon at mindrot.org
2024-Jul-01 12:05 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
github at kalvdans.no-ip.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |github at kalvdans.no-ip.org
bugzilla-daemon at mindrot.org
2024-Jul-01 19:52 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Alan D. Salewski <salewski at att.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |salewski at att.net
bugzilla-daemon at mindrot.org
2024-Dec-06 16:41 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #16 from Damien Miller <djm at mindrot.org> ---
This was committed in openssh-9.8
bugzilla-daemon at mindrot.org
2024-Dec-07 19:23 UTC
[Bug 3598] Dead lock of sshd and Defunct of sshd
https://bugzilla.mindrot.org/show_bug.cgi?id=3598
--- Comment #17 from github at kalvdans.no-ip.org ---
For reference, I think it was
https://github.com/openssh/openssh-portable/commit/81c1099d22b81ebfd20a334ce986c4f753b0db29