bugzilla-daemon at mindrot.org
2021-Apr-24 01:20 UTC
[Bug 3304] New: SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

            Bug ID: 3304
           Summary: SSH client MUX to multiple hosts causes select: Bad file descriptor
           Product: Portable OpenSSH
           Version: 8.5p1
          Hardware: amd64
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: openssh-bugzilla at erik.ca

Created attachment 3499
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3499&action=edit
OpenSSH client strace output

Hello,

We encountered an issue with the ssh client (even version 8.5p1) where it tries to select() a closed file descriptor, resulting in a failure, and the control master socket is closed.

The issue occurs when we connect to multiple target hosts (~100 hosts) through an SSH bastion server (using ProxyJump) and issue a command to each target host (e.g. 'id').

We consistently encounter the following error with one of the *read* file descriptors on a MUX channel:

    select: Bad file descriptor

Tested the following versions on Debian 10 (identical results):

    OpenSSH 7.9p1 (latest Debian 10 package)
    OpenSSH 8.5p1 (github manual build)

Client configuration:

    # Bastion: Persistent Socket and SOCKS Proxy
    Host my-bastion
        User myuser
        ProxyJump none
        ControlMaster auto
        ControlPersist 28800s
        ControlPath ~/.ssh/my-bastion.sock
        DynamicForward 127.0.0.1:1080
        ExitOnForwardFailure yes
        HostName my-bastion1.mydomain.com

    # Jump via Bastion for those hosts
    Host *.mydomain.com
        ProxyJump my-bastion

    # Catch all
    Host *
        User root
        SendEnv LANG LC_*
        AddKeysToAgent yes
        ForwardAgent yes
        TCPKeepAlive yes
        ServerAliveCountMax 3
        ServerAliveInterval 20
        AddressFamily inet

Build:

    (See openssh-build.txt attachment)

Steps to reproduce:

    # Create a connection to the bastion (debug level 3 logging), exit (socket is still present on client), strace the ssh pid attached to the bastion socket on client host:
    ssh -vvv -E ssh.log my-bastion
    exit
    # myuser 14510 0.5 0.0 16256 2660 ? Ss 00:25 0:00 ssh: /home/myuser/.ssh/my-bastion.sock [mux]
    strace -f -s 2048 -o strace.txt -p 14510

    # separate terminal
    ANSIBLE_SSH_ARGS= ansible -i my_target_hosts all -a id

When the client attempts to select the closed file descriptor for a MUX channel, the end result is that the control master socket is closed and unlinked.

I will attach files for:
* source locations of both the close() and select()
* ssh logs
* strace output

Let me know if you need any additional info.

Much appreciated,

--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2021-Apr-24 01:22 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #1 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3500
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3500&action=edit
OpenSSH client log
bugzilla-daemon at mindrot.org
2021-Apr-24 01:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #2 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3501
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3501&action=edit
OpenSSH source files
bugzilla-daemon at mindrot.org
2021-Apr-24 01:24 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #3 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3502
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3502&action=edit
OpenSSH build steps
bugzilla-daemon at mindrot.org
2021-Apr-24 01:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

E B <openssh-bugzilla at erik.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |openssh-bugzilla at erik.ca
bugzilla-daemon at mindrot.org
2021-Apr-30 05:14 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |djm at mindrot.org

--- Comment #4 from Damien Miller <djm at mindrot.org> ---
There isn't quite enough debug output there to figure out what is going wrong and I'm not able to replicate this locally (w/ 40 concurrent jobs each making 100 multiplexed connections).

Could you attach a complete client debug output (ssh -vvv ...) for both the main multiplex process and the failing passenger process? Likewise, more complete strace output would be helpful.

Please use OpenSSH if possible as I just added a bit more debugging (commit f068930635) that might help figure out what is going wrong.
bugzilla-daemon at mindrot.org
2021-May-04 01:21 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #5 from E B <openssh-bugzilla at erik.ca> ---
Thanks Damien,

I will re-run the test with another build (using commit f068930635) and will try to collect & provide additional logging.
bugzilla-daemon at mindrot.org
2021-May-11 19:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #6 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3515
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3515&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-11 19:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #7 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3516
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3516&action=edit
Full OpenSSH_8.6p1 ansible proc log
bugzilla-daemon at mindrot.org
2021-May-11 19:38 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #8 from E B <openssh-bugzilla at erik.ca> ---
Apologies for the late response.

I am able to reproduce this issue on every attempt with OpenSSH 8.6p1 (commit f068930635). I have attached the full ssh log output for both the MUX process and the ansible / ssh processes running through the MUX connection to the bastion host:

* Full OpenSSH_8.6p1 MUX proc log
* Full OpenSSH_8.6p1 ansible proc log (gzip)

I used the same steps outlined in the original comment, except that extra logging was enabled on the ansible side:

    ANSIBLE_SSH_ARGS="-vvv -E ./ssh.log" ansible -i my_target_hosts all -a id

Let me know whether you also need the full strace output or whether the logs above will suffice.

Thanks
bugzilla-daemon at mindrot.org
2021-May-11 19:50 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

E B <openssh-bugzilla at erik.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3515|0                           |1
        is obsolete|                            |

--- Comment #9 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3517
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3517&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-12 21:45 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

--- Comment #10 from Damien Miller <djm at mindrot.org> ---
Created attachment 3518
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3518&action=edit
debug select failures

Unfortunately, it's hard to figure out what is going on there without the actual bad file descriptor.

Sorry to be a bother, but are you able to reproduce using git HEAD with this patch applied? It includes some extra debugging that will let us determine the sequence of events, and will log which file descriptors are bad after select fails.
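For readers following along, here is a minimal sketch of the general technique described in comment #10: reproduce the EBADF failure from select(), then probe each descriptor to find out which one is bad. This is not the attached patch and not OpenSSH code. The helper names (make_closed_fd, select_read_errno, find_bad_fds) are hypothetical, and probing with fcntl(F_GETFD) is an assumption about one common way to do it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns a descriptor number that is known to be
 * closed, by opening a pipe and closing both ends. Returns -1 on error. */
static int make_closed_fd(void)
{
    int p[2];

    if (pipe(p) == -1)
        return -1;
    close(p[0]);
    close(p[1]);
    return p[0];
}

/* Calls select() with a read set holding only fd and a zero timeout.
 * Returns 0 if select() succeeded, otherwise the errno it reported. */
static int select_read_errno(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return errno;
    return 0;
}

/* select() only says "some descriptor is bad", so probe each candidate
 * with fcntl(F_GETFD). Closed descriptors are copied into bad[] (up to
 * cap entries); the number stored is returned. */
static size_t find_bad_fds(const int *fds, size_t n, int *bad, size_t cap)
{
    size_t nbad = 0;

    for (size_t i = 0; i < n; i++) {
        if (fcntl(fds[i], F_GETFD) == -1 && errno == EBADF && nbad < cap)
            bad[nbad++] = fds[i];
    }
    return nbad;
}
```

Calling select_read_errno() on the value returned by make_closed_fd() yields EBADF, which strerror() renders on Linux as the "Bad file descriptor" text from the report; find_bad_fds() then narrows a list of candidates down to the closed ones.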
bugzilla-daemon at mindrot.org
2022-Jan-14 04:16 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|NEW                         |RESOLVED

--- Comment #11 from Damien Miller <djm at mindrot.org> ---
Closing for lack of followup. OpenSSH HEAD has replaced the use of select() with poll(). Please try HEAD or OpenSSH 8.9 when it is released as it might fix the problem you're seeing.

If not, then I recommend setting the DEBUG_CHANNEL_POLL #define at the start of channels.c and attaching the debug output. poll(2) is easier to debug than select(2), because it will tell you which fd is bad via POLLNVAL, and we do log this information.
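To illustrate the point in comment #11 about POLLNVAL, the following is a small sketch, not taken from channels.c, showing how poll(2) flags the specific closed descriptor in revents instead of failing the whole call. The names find_invalid_fds, demo_closed_read_end and MAX_POLL_FDS are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_POLL_FDS 64 /* arbitrary cap chosen for this sketch */

/* Polls every descriptor in fds[] with a zero timeout. Descriptors that
 * poll() marks POLLNVAL (not open) are copied into bad[] (up to cap).
 * Returns the number stored, or -1 if poll() itself failed. */
static int find_invalid_fds(const int *fds, size_t n, int *bad, size_t cap)
{
    struct pollfd pfd[MAX_POLL_FDS];
    int nbad = 0;

    assert(n <= MAX_POLL_FDS);
    for (size_t i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    if (poll(pfd, (nfds_t)n, 0) == -1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if ((pfd[i].revents & POLLNVAL) && (size_t)nbad < cap)
            bad[nbad++] = pfd[i].fd;
    }
    return nbad;
}

/* Demo: close the read end of a pipe, poll both ends, and report whether
 * exactly the closed read end was flagged. Returns 1 if so, 0 if not,
 * -1 if the pipe could not be created. */
static int demo_closed_read_end(void)
{
    int p[2], bad[2];
    int n, ok;

    if (pipe(p) == -1)
        return -1;
    close(p[0]);
    n = find_invalid_fds(p, 2, bad, 2);
    ok = (n == 1 && bad[0] == p[0]);
    close(p[1]);
    return ok;
}
```

Unlike select(), which returns -1 with EBADF for the whole set, poll() here still returns normally and identifies the offending descriptor per entry, which is why no separate probing pass is needed.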
bugzilla-daemon at mindrot.org
2022-Feb-25 02:58 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #12 from Damien Miller <djm at mindrot.org> ---
closing bugs resolved before openssh-8.9