bugzilla-daemon at mindrot.org
2021-Apr-24  01:20 UTC
[Bug 3304] New: SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
            Bug ID: 3304
           Summary: SSH client MUX to multiple hosts causes select: Bad
                    file descriptor
           Product: Portable OpenSSH
           Version: 8.5p1
          Hardware: amd64
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: openssh-bugzilla at erik.ca
Created attachment 3499
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3499&action=edit
OpenSSH client strace output
Hello,
We encountered an issue with the ssh client (including version 8.5p1)
where it calls select() on a closed file descriptor. The call fails and
the control master socket is closed. The issue occurs when we
connect to multiple target hosts (~100 hosts) through an SSH bastion
server (using ProxyJump) and issue a command to each target host (e.g.
'id'). We consistently encounter the following error with one of the
*read* file descriptors on a MUX channel:
select: Bad file descriptor
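The failure mode described above can be reproduced in isolation. The following is a minimal Python sketch, not OpenSSH code: it assumes a POSIX system, and the function name `select_on_closed_fd` is made up for illustration. It closes one end of a pipe, then passes the stale descriptor number to select(), which fails with EBADF ("Bad file descriptor") without indicating which descriptor was at fault.

```python
import errno
import os
import select


def select_on_closed_fd():
    """Return the errno raised when select() is given a closed descriptor.

    Hypothetical helper for illustration only; it is not part of OpenSSH.
    """
    read_end, write_end = os.pipe()
    os.close(read_end)  # the number is now stale but still in our watch list
    try:
        # Zero timeout: we only care whether the call itself is rejected.
        select.select([read_end], [], [], 0)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_end)
    return None


if __name__ == "__main__":
    err = select_on_closed_fd()
    if err is not None:
        print("select:", os.strerror(err))
```

Note that the error carries no descriptor number, so a process watching many descriptors cannot tell from the failure alone which one went stale.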
Tested the following versions on Debian 10 (identical results):
OpenSSH 7.9p1 (latest Debian 10 package)
OpenSSH 8.5p1 (github manual build)
Client configuration:
# Bastion: Persistent Socket and SOCKS Proxy
Host my-bastion
    User myuser
    ProxyJump none
    ControlMaster auto
    ControlPersist 28800s
    ControlPath ~/.ssh/my-bastion.sock
    DynamicForward 127.0.0.1:1080
    ExitOnForwardFailure yes
    HostName my-bastion1.mydomain.com
# Jump via Bastion for those hosts
Host *.mydomain.com
    ProxyJump my-bastion
# Catch all
Host *
    User root
    SendEnv LANG LC_*
    AddKeysToAgent yes
    ForwardAgent yes
    TCPKeepAlive yes
    ServerAliveCountMax 3
    ServerAliveInterval 20
    AddressFamily inet
Build:
(See openssh-build.txt attachment)
Steps to reproduce:
# Create a connection to the bastion (debug level 3 logging), exit
(socket is still present on client), strace the ssh pid attached to the
bastion socket on client host:
ssh -vvv -E ssh.log my-bastion
exit
# myuser 14510  0.5  0.0  16256  2660 ?        Ss   00:25   0:00 ssh:
/home/myuser/.ssh/my-bastion.sock [mux]
strace -f -s 2048 -o strace.txt -p 14510
# separate terminal
ANSIBLE_SSH_ARGS= ansible -i my_target_hosts all -a id
When the client attempts to select the closed file descriptor for a MUX
channel, the end result is the control master socket is closed and
unlinked. I will attach files for:
* source locations of both the close() and select()
* ssh logs
* strace output
Let me know if you need any additional info.
Much appreciated,
-- 
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2021-Apr-24  01:22 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #1 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3500
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3500&action=edit
OpenSSH client log
bugzilla-daemon at mindrot.org
2021-Apr-24  01:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #2 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3501
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3501&action=edit
OpenSSH source files
bugzilla-daemon at mindrot.org
2021-Apr-24  01:24 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #3 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3502
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3502&action=edit
OpenSSH build steps
bugzilla-daemon at mindrot.org
2021-Apr-24  01:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
E B <openssh-bugzilla at erik.ca> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |openssh-bugzilla at erik.ca
bugzilla-daemon at mindrot.org
2021-Apr-30  05:14 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |djm at mindrot.org
--- Comment #4 from Damien Miller <djm at mindrot.org> ---
There isn't quite enough debug output there to figure out what is going
wrong and I'm not able to replicate this locally (w/ 40 concurrent jobs
each making 100 multiplexed connections).
Could you attach a complete client debug output (ssh -vvv ...) for both
the main multiplex process and the failing passenger process? Likewise,
more complete strace output would be helpful.
Please use OpenSSH git HEAD if possible, as I just added a bit more
debugging (commit f068930635) that might help figure out what is going
wrong.
bugzilla-daemon at mindrot.org
2021-May-04  01:21 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #5 from E B <openssh-bugzilla at erik.ca> ---
Thanks Damien, I will re-run the test with another build (using commit
f068930635) and will try to collect & provide additional logging.
bugzilla-daemon at mindrot.org
2021-May-11  19:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #6 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3515
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3515&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-11  19:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #7 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3516
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3516&action=edit
Full OpenSSH_8.6p1 ansible proc log
bugzilla-daemon at mindrot.org
2021-May-11  19:38 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #8 from E B <openssh-bugzilla at erik.ca> ---
Apologies for the late response. I am able to reproduce this issue on
every attempt with OpenSSH 8.6p1 (commit f068930635). I have attached
the full ssh log output for both the MUX process and the ansible / ssh
processes running through the MUX connection to the bastion host:
* Full OpenSSH_8.6p1 MUX proc log
* Full OpenSSH_8.6p1 ansible proc log (gzip)
I used the same steps outlined in the original comment, except that
extra logging was enabled on the ansible side:
ANSIBLE_SSH_ARGS="-vvv -E ./ssh.log" ansible -i my_target_hosts all -a id
Let me know whether you also need the full strace output or whether the
logs above will suffice.
Thanks
bugzilla-daemon at mindrot.org
2021-May-11  19:50 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
E B <openssh-bugzilla at erik.ca> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #3515|0                           |1
        is obsolete|                            |
--- Comment #9 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3517
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3517&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-12  21:45 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #10 from Damien Miller <djm at mindrot.org> ---
Created attachment 3518
  --> https://bugzilla.mindrot.org/attachment.cgi?id=3518&action=edit
debug select failures
Unfortunately, it's hard to figure out what is going on there without
the actual bad file descriptor.
Sorry to be a bother, but are you able to reproduce using git HEAD with
this patch applied? It includes some extra debugging that will let us
determine the sequence of events, and will log which file descriptors
are bad after select fails.
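For readers unfamiliar with the idea of logging which descriptors are bad after select fails, here is one generic way to do it, sketched in Python. This is not the content of attachment 3518, which is not reproduced in this thread. The helper name `find_bad_fds` is hypothetical, and probing with fstat() is an assumption about how such a check might be written.

```python
import errno
import os


def find_bad_fds(fds):
    """Return the descriptors in fds that are no longer open.

    Hypothetical helper: probes each descriptor with fstat(), which fails
    with EBADF when the number does not refer to an open file.
    """
    bad = []
    for fd in fds:
        try:
            os.fstat(fd)
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise  # some other failure; do not hide it
            bad.append(fd)
    return bad


if __name__ == "__main__":
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    os.close(r1)  # simulate a channel descriptor closed behind our back
    print("bad descriptors:", find_bad_fds([r1, w1, r2, w2]))
    for fd in (w1, r2, w2):
        os.close(fd)
```

A probe like this only helps when it runs immediately after the failed call, before any new descriptor is opened and reuses the stale number.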
bugzilla-daemon at mindrot.org
2022-Jan-14  04:16 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|NEW                         |RESOLVED
--- Comment #11 from Damien Miller <djm at mindrot.org> ---
Closing for lack of followup.
OpenSSH HEAD has replaced the use of select() with poll(). Please try
HEAD or OpenSSH 8.9 when it is released as it might fix the problem
you're seeing.
If not, then I recommend setting the DEBUG_CHANNEL_POLL #define at the
start of channels.c and attaching the debug output. poll(2) is easier
to debug than select(2), because it will tell you which fd is bad via
POLLNVAL and we do log this information.
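The difference between the two calls can be seen in a small Python sketch. It assumes a POSIX system where Python's `select.poll` is available, and the function name `poll_for_invalid` is made up for illustration; it does not reflect how channels.c is written. Where select() rejects the whole call, poll() succeeds and flags each stale descriptor individually with POLLNVAL.

```python
import os
import select


def poll_for_invalid(fds):
    """Return the descriptors that poll() flags with POLLNVAL.

    Hypothetical helper for illustration only; it is not OpenSSH code.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    # Zero timeout: report the current state without blocking.
    return [fd for fd, events in poller.poll(0) if events & select.POLLNVAL]


if __name__ == "__main__":
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    os.close(r1)  # r1 is now a stale descriptor number
    print("invalid descriptors:", poll_for_invalid([r1, r2]))
    for fd in (w1, r2, w2):
        os.close(fd)
```

Because the stale descriptor is reported by number, the caller can log it or drop it from the watch set instead of treating the failure as fatal.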
bugzilla-daemon at mindrot.org
2022-Feb-25  02:58 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
--- Comment #12 from Damien Miller <djm at mindrot.org> ---
closing bugs resolved before openssh-8.9