bugzilla-daemon at mindrot.org
2022-Mar-14 11:18 UTC
[Bug 3405] New: clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Bug ID: 3405
Summary: clientloop's client_wait_until_can_do_something uses
100 % CPU with ssh 2> >({exec 1>&2})
Product: Portable OpenSSH
Version: 8.9p1
Hardware: amd64
OS: Linux
Status: NEW
Severity: minor
Priority: P5
Component: ssh
Assignee: unassigned-bugs at mindrot.org
Reporter: nathanmonfils at gmail.com
Hi,
This is a bit of an edge-case (and I've already found a workaround
using `exec tee`), but since updating to the latest release I've had
100 % CPU usage on a script, which you can replicate with `ssh <host>
2> >({exec 1>&2})`. (I'm actually redirecting stderr to a shell
function that parses it for a while then gives up on it by doing `exec
1>&2`).
After a few seconds, the ssh process starts using 100 % CPU. Using GDB,
I consistently get the following trace:
<SNIP>
#1 0x0000556af8a302e1 in poll (__timeout=<optimized out>,
__nfds=<optimized out>, __fds=<optimized out>)
at /usr/include/bits/poll2.h:39
#2 client_wait_until_can_do_something (conn_out_readyp=<synthetic
pointer>, conn_in_readyp=<synthetic pointer>,
rekeying=<optimized out>, npfd_activep=0x7ffe8c32d07c,
npfd_allocp=0x7ffe8c32d078, pfdp=0x7ffe8c32d080, ssh=0x556af964bf80)
at clientloop.c:575
<SNIP>
I'm guessing this is linked to OpenBSD-Commit
a77e16a667d5b194dcdb3b76308b8bba7fa7239c "upstream: convert ssh, sshd
mainloops from select() to poll();".
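
One property of poll() that may be relevant to that guess: error and hangup conditions are reported in revents whether or not they were asked for in events. The following is a minimal sketch of that behaviour and is not OpenSSH code; the helper name revents_after_reader_closes is hypothetical, and a pipe stands in for the process substitution attached to stderr.

```c
#include <assert.h>	/* only for the assert() shown in the usage note */
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of OpenSSH): create a pipe, close its
 * read end, then poll the write end once with the given events mask.
 * Returns the resulting revents, or -1 if pipe() or poll() failed.
 */
int
revents_after_reader_closes(short events)
{
	int p[2];
	struct pollfd pfd;
	int r;

	if (pipe(p) == -1)
		return -1;
	close(p[0]);			/* the reader goes away */

	pfd.fd = p[1];
	pfd.events = events;		/* may be 0: nothing requested */
	pfd.revents = 0;
	r = poll(&pfd, 1, 0);		/* zero timeout, so this never blocks */
	close(p[1]);

	if (r == -1)
		return -1;
	return pfd.revents;
}
```

On Linux, assert(revents_after_reader_closes(0) & (POLLERR | POLLHUP)) should hold: the fd is reported as having an event even though no events were requested, so a loop that keeps such an fd in its pollfd array and never acts on the error will not block in poll().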
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2022-Mar-14 23:00 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |djm at mindrot.org
--- Comment #1 from Damien Miller <djm at mindrot.org> ---
I can't get this to replicate. What's the full command you're using?
`ssh <host> 2> >({exec 1>&2})` doesn't include a command that makes any
output.
bugzilla-daemon at mindrot.org
2022-Mar-15 06:31 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned-bugs at mindrot.org |djm at mindrot.org
Status|NEW |ASSIGNED
--- Comment #2 from Damien Miller <djm at mindrot.org> ---
Created attachment 3581
--> https://bugzilla.mindrot.org/attachment.cgi?id=3581&action=edit
avoid polling fds we're not ready to service events on
I did get it to reproduce; I was just holding it wrong.
Thanks for the report and the reproduction instructions. My analysis
follows, and a fix is attached:
> I think the sequence of operations is something like:
>
> 1. Channel opens, stderr is attached
> 2. Stderr goes away
> 3. We poll with pfd[stderr].events including POLLOUT
> 4. We get back POLLHUP (POLLERR on Linux)
> 5. channel_handle_efd_write() sees sshbuf_len(c->extended)==0, returns
> 6. GOTO 3 forever
>
> One problem is that there is no way to propagate a POLLHUP condition
> back to a writable channel when there is no output pending.
>
> A more fundamental problem is step 3, where we unconditionally included
> the fd in the poll array, regardless of whether the channel code had any
> intention of attempting a write() later. When I was doing the conversion
> from select(), I kept the fds in there because it made the matching of
> pfd entries and channels easier but this was probably a mistake.
>
> The patch corrects the mistake (hopefully). It will only
> set up a pollfd entry if IO was requested for the fd, and as a
> consequence, should avoid spurious events. It also forced me to make
> the pollfd bookkeeping less brittle :)
>
> PS. an alternate approach would be to leave the pollfd entries in the
> array and find a way to handle POLLERR/POLLHUP events. I have a diff
> that implements this approach too, by treating such a condition on
> a channel that wasn't ready to write something as an automatic write
> failure. I shelved this in favour of the approach in the attached
> patch.
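
The two approaches described in the analysis can be sketched roughly as follows. This is an illustration under stated assumptions and not the attached patch: struct demo_chan, pack_pollfds and write_failed_without_output are hypothetical names, and the real channel code tracks considerably more state.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; not OpenSSH's Channel struct. */
struct demo_chan {
	int efd;		/* fd that extended data (stderr) is written to */
	size_t pending;		/* bytes queued for efd */
	int pfd_index;		/* slot used in the pollfd array, or -1 */
};

/*
 * Approach taken by the patch, in spirit: only give a channel a pollfd
 * entry if it actually wants to write. Returns the number of entries
 * filled in; each channel remembers which slot (if any) it was given.
 */
size_t
pack_pollfds(struct demo_chan *chans, size_t nchans,
    struct pollfd *pfd, size_t cap)
{
	size_t i, n = 0;

	for (i = 0; i < nchans; i++) {
		chans[i].pfd_index = -1;
		if (chans[i].pending == 0)
			continue;	/* no IO requested: not polled */
		assert(n < cap);
		pfd[n].fd = chans[i].efd;
		pfd[n].events = POLLOUT;
		pfd[n].revents = 0;
		chans[i].pfd_index = (int)n;
		n++;
	}
	return n;
}

/*
 * Alternate approach, in spirit: keep every fd in the array, but treat
 * an error or hangup on a channel with nothing to write as a failed
 * write. Returns 1 in that case, otherwise 0.
 */
int
write_failed_without_output(const struct demo_chan *c, short revents)
{
	return c->pending == 0 && (revents & (POLLERR | POLLHUP)) != 0;
}
```

With the first function, a channel whose output fd has closed but which has nothing queued never appears in the array passed to poll(), so it cannot generate the repeated wakeups described in steps 3 to 6.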
bugzilla-daemon at mindrot.org
2022-Mar-30 21:26 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |harri at afaics.de
--- Comment #3 from Damien Miller <djm at mindrot.org> ---
*** Bug 3411 has been marked as a duplicate of this bug. ***
bugzilla-daemon at mindrot.org
2022-Mar-30 21:27 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |3395
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
--- Comment #4 from Damien Miller <djm at mindrot.org> ---
This has been fixed in git head and will be in the openssh-9.0 release
that is due very soon.
commit d6556de1db0822c76ba2745cf5c097d9472adf7c
Author: djm at openbsd.org <djm at openbsd.org>
Date: Wed Mar 30 21:10:25 2022 +0000
upstream: fix poll() spin when a channel's output fd closes without
data in the channel buffer. Introduce more exact packing of channel
fds into the pollfd array. fixes bz3405 and bz3411; ok deraadt@ markus@
OpenBSD-Commit-ID: 06740737849c9047785622ad5d472cb6a3907d10
Referenced Bugs:
https://bugzilla.mindrot.org/show_bug.cgi?id=3395
[Bug 3395] Tracking bug for openssh-9.0
bugzilla-daemon at mindrot.org
2022-Apr-08 02:12 UTC
[Bug 3405] clientloop's client_wait_until_can_do_something uses 100 % CPU with ssh 2> >({exec 1>&2})
https://bugzilla.mindrot.org/show_bug.cgi?id=3405
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
--- Comment #5 from Damien Miller <djm at mindrot.org> ---
closing bug resolved during openssh-9.0 release cycle