bugzilla-daemon at mindrot.org
2022-Mar-14 11:18 UTC
[Bug 3405] New: clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

            Bug ID: 3405
           Summary: clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
           Product: Portable OpenSSH
           Version: 8.9p1
          Hardware: amd64
                OS: Linux
            Status: NEW
          Severity: minor
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: nathanmonfils at gmail.com

Hi,

This is a bit of an edge-case (and I've already found a workaround using `exec tee`), but since updating to the latest release I've had 100 % CPU usage on a script, which you can replicate with `ssh <host> 2> >({exec 1>&2})`. (I'm actually redirecting stderr to a shell function that parses it for a while then gives up on it by doing `exec 1>&2`).

After a few seconds, the ssh process starts using 100 % CPU. Using GDB, I consistently get the following trace:

<SNIP>
#1 0x0000556af8a302e1 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:39
#2 client_wait_until_can_do_something (conn_out_readyp=<synthetic pointer>, conn_in_readyp=<synthetic pointer>, rekeying=<optimized out>, npfd_activep=0x7ffe8c32d07c, npfd_allocp=0x7ffe8c32d078, pfdp=0x7ffe8c32d080, ssh=0x556af964bf80) at clientloop.c:575
<SNIP>

I'm guessing this is linked to OpenBSD-Commit a77e16a667d5b194dcdb3b76308b8bba7fa7239c "upstream: convert ssh, sshd mainloops from select() to poll();".

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2022-Mar-14 23:00 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |djm at mindrot.org

--- Comment #1 from Damien Miller <djm at mindrot.org> ---
I can't get this to replicate. What's the full command you're using? `ssh <host> 2> >({exec 1>&2})` doesn't include a command that makes any output.
bugzilla-daemon at mindrot.org
2022-Mar-15 06:31 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned-bugs at mindrot.org |djm at mindrot.org
             Status|NEW                         |ASSIGNED

--- Comment #2 from Damien Miller <djm at mindrot.org> ---
Created attachment 3581
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3581&action=edit
avoid polling fds we're not ready to service events on

I did get it to reproduce, I was just holding it wrong. Thanks for the report and the reproduction instructions. Here's my analysis; a fix is attached:

> I think the sequence of operations is something like:
>
> 1. Channel opens, stderr is attached
> 2. Stderr goes away
> 3. We poll with pfd[stderr].events including POLLOUT
> 4. We get back POLLHUP (POLLERR on Linux)
> 5. channel_handle_efd_write() sees sshbuf_len(c->extended)==0, returns
> 6. GOTO 3 forever
>
> One problem is that there is no way to propagate a POLLHUP condition
> back to a writable channel when there is no output pending.
>
> A more fundamental problem is step 3, where we unconditionally included
> the fd in the poll array, regardless of whether the channel code had any
> intention of attempting a write() later. When I was doing the conversion
> from select(), I kept the fds in there because it made the matching of
> pfd entries and channels easier but this was probably a mistake.
>
> The patch corrects the mistake (hopefully). It will only
> set up a pollfd entry if IO was requested for the fd, and as a
> consequence, should avoid spurious events. It also forced me to make
> the pollfd bookkeeping less brittle :)
>
> PS. an alternate approach would be to leave the pollfd entries in the
> array and find a way to handle POLLERR/POLLHUP events. I have a diff
> that implements this approach too, by treating such a condition on
> a channel that wasn't ready to write something as an automatic write
> failure. I shelved that in favour of the approach in the attached patch.
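
The loop in steps 3 to 6 can be reproduced outside ssh. What follows is a minimal sketch, not the attached patch: `struct out_chan`, `make_dead_writer()` and `poll_once()` are hypothetical names invented for illustration, and a pipe whose read end has been closed stands in for the stderr that "goes away". With no data pending, always adding the fd makes poll() return at once with POLLERR or POLLHUP, while adding it only when a write is pending lets poll() sleep for the full timeout.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a channel's output side; not OpenSSH's struct. */
struct out_chan {
	int fd;         /* output fd, e.g. the client's stderr */
	size_t pending; /* bytes queued for writing, cf. sshbuf_len(c->extended) */
};

/* Return the write end of a pipe whose read end is already closed, or -1. */
static int
make_dead_writer(void)
{
	int p[2];

	if (pipe(p) == -1)
		return -1;
	close(p[0]); /* step 2: the reader goes away */
	return p[1];
}

/*
 * Run one poll() pass over the channel.
 * only_if_pending == 0: always add the fd with POLLOUT (the spinning case).
 * only_if_pending != 0: add the fd only when data is queued (the idea
 * behind the fix, as described in the analysis above).
 * Returns poll()'s result; *revents_out gets the reported events, or 0.
 */
static int
poll_once(const struct out_chan *c, int only_if_pending, int timeout_ms,
    short *revents_out)
{
	struct pollfd pfd[1] = { { .fd = -1 } };
	nfds_t n = 0;
	int r;

	assert(c != NULL && revents_out != NULL);
	*revents_out = 0;
	if (!only_if_pending || c->pending > 0) {
		pfd[n].fd = c->fd;
		pfd[n].events = POLLOUT;
		pfd[n].revents = 0;
		n++;
	}
	/* With n == 0 this simply sleeps for timeout_ms and returns 0. */
	r = poll(pfd, n, timeout_ms);
	if (r > 0 && n > 0)
		*revents_out = pfd[0].revents;
	return r;
}
```

A caller that loops on `poll_once(&c, 0, ...)` with `c.pending == 0` gets a ready fd on every pass but has nothing to write, which is the busy loop from the report. The same loop with `only_if_pending` set blocks until the timeout instead.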
bugzilla-daemon at mindrot.org
2022-Mar-30 21:26 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |harri at afaics.de

--- Comment #3 from Damien Miller <djm at mindrot.org> ---
*** Bug 3411 has been marked as a duplicate of this bug. ***
bugzilla-daemon at mindrot.org
2022-Mar-30 21:27 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |3395
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Damien Miller <djm at mindrot.org> ---
This has been fixed in git head and will be in the openssh-9.0 release that is due very soon.

commit d6556de1db0822c76ba2745cf5c097d9472adf7c
Author: djm at openbsd.org <djm at openbsd.org>
Date:   Wed Mar 30 21:10:25 2022 +0000

    upstream: fix poll() spin when a channel's output fd closes without data in
    the channel buffer. Introduce more exact packing of channel fds into the
    pollfd array. fixes bz3405 and bz3411; ok deraadt@ markus@

    OpenBSD-Commit-ID: 06740737849c9047785622ad5d472cb6a3907d10

Referenced Bugs:

https://bugzilla.mindrot.org/show_bug.cgi?id=3395
[Bug 3395] Tracking bug for openssh-9.0
bugzilla-daemon at mindrot.org
2022-Apr-08 02:12 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #5 from Damien Miller <djm at mindrot.org> ---
closing bug resolved during openssh-9.0 release cycle