bugzilla-daemon at mindrot.org
2021-Apr-24 01:20 UTC
[Bug 3304] New: SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Bug ID: 3304
Summary: SSH client MUX to multiple hosts causes select: Bad file descriptor
Product: Portable OpenSSH
Version: 8.5p1
Hardware: amd64
OS: Linux
Status: NEW
Severity: normal
Priority: P5
Component: ssh
Assignee: unassigned-bugs at mindrot.org
Reporter: openssh-bugzilla at erik.ca
Created attachment 3499
--> https://bugzilla.mindrot.org/attachment.cgi?id=3499&action=edit
OpenSSH client strace output
Hello,
We encountered an issue with the ssh client (still present in version
8.5p1) where it calls select() on a closed file descriptor. The call
fails and the control master socket is closed. The issue occurs when we
connect to multiple target hosts (~100 hosts) through an SSH bastion
server (using ProxyJump) and issue a command to each target host (e.g.
'id'). We consistently encounter the following error with one of the
*read* file descriptors on a MUX channel:
select: Bad file descriptor
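The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not OpenSSH code: it assumes a Linux or other POSIX host, and the helper name `select_on_closed_fd` is hypothetical. It passes a descriptor that has already been closed to select() and reports the resulting errno, which is the same EBADF condition behind the "select: Bad file descriptor" message.

```python
import errno
import os
import select


def select_on_closed_fd():
    """Return the errno raised when select() is given a closed descriptor.

    Hypothetical helper for illustration only; it is not part of OpenSSH.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # read_fd is now stale, like the closed MUX channel fd
    try:
        # The whole call fails; select() does not say which fd was bad.
        select.select([read_fd], [], [], 0)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    err = select_on_closed_fd()
    if err is not None:
        print(errno.errorcode.get(err), os.strerror(err))
```

Note that select() rejects the entire descriptor set, so a single stale descriptor prevents every other channel in the set from being serviced.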
Tested the following versions on Debian 10 (identical results):
OpenSSH 7.9p1 (latest Debian 10 package)
OpenSSH 8.5p1 (manual build from GitHub)
Client configuration:
# Bastion: Persistent Socket and SOCKS Proxy
Host my-bastion
    User myuser
    ProxyJump none
    ControlMaster auto
    ControlPersist 28800s
    ControlPath ~/.ssh/my-bastion.sock
    DynamicForward 127.0.0.1:1080
    ExitOnForwardFailure yes
    HostName my-bastion1.mydomain.com
# Jump via Bastion for those hosts
Host *.mydomain.com
    ProxyJump my-bastion
# Catch all
Host *
    User root
    SendEnv LANG LC_*
    AddKeysToAgent yes
    ForwardAgent yes
    TCPKeepAlive yes
    ServerAliveCountMax 3
    ServerAliveInterval 20
    AddressFamily inet
Build:
(See openssh-build.txt attachment)
Steps to reproduce:
# Create a connection to the bastion (debug level 3 logging), exit
# (socket is still present on client), strace the ssh pid attached to the
# bastion socket on client host:
ssh -vvv -E ssh.log my-bastion
exit
# myuser 14510 0.5 0.0 16256 2660 ? Ss 00:25 0:00 ssh: /home/myuser/.ssh/my-bastion.sock [mux]
strace -f -s 2048 -o strace.txt -p 14510
# separate terminal
ANSIBLE_SSH_ARGS= ansible -i my_target_hosts all -a id
When the client attempts to select() on the closed file descriptor for a
MUX channel, the control master socket ends up closed and unlinked. I
will attach files for:
* source locations of both the close() and select()
* ssh logs
* strace output
Let me know if you need any additional info.
Much appreciated,
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at mindrot.org
2021-Apr-24 01:22 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #1 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3500
--> https://bugzilla.mindrot.org/attachment.cgi?id=3500&action=edit
OpenSSH client log
bugzilla-daemon at mindrot.org
2021-Apr-24 01:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #2 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3501
--> https://bugzilla.mindrot.org/attachment.cgi?id=3501&action=edit
OpenSSH source files
bugzilla-daemon at mindrot.org
2021-Apr-24 01:24 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #3 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3502
--> https://bugzilla.mindrot.org/attachment.cgi?id=3502&action=edit
OpenSSH build steps
bugzilla-daemon at mindrot.org
2021-Apr-24 01:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
E B <openssh-bugzilla at erik.ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |openssh-bugzilla at erik.ca
bugzilla-daemon at mindrot.org
2021-Apr-30 05:14 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |djm at mindrot.org
--- Comment #4 from Damien Miller <djm at mindrot.org> ---
There isn't quite enough debug output there to figure out what is going
wrong and I'm not able to replicate this locally (w/ 40 concurrent jobs
each making 100 multiplexed connections).
Could you attach a complete client debug output (ssh -vvv ...) for both
the main multiplex process and the failing passenger process? Likewise,
more complete strace output would be helpful.
Please use OpenSSH git HEAD if possible, as I just added a bit more
debugging (commit f068930635) that might help figure out what is going
wrong.
bugzilla-daemon at mindrot.org
2021-May-04 01:21 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #5 from E B <openssh-bugzilla at erik.ca> ---
Thanks Damien, I will re-run the test with another build (using commit
f068930635) and will try to collect and provide additional logging.
bugzilla-daemon at mindrot.org
2021-May-11 19:23 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #6 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3515
--> https://bugzilla.mindrot.org/attachment.cgi?id=3515&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-11 19:26 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #7 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3516
--> https://bugzilla.mindrot.org/attachment.cgi?id=3516&action=edit
Full OpenSSH_8.6p1 ansible proc log
bugzilla-daemon at mindrot.org
2021-May-11 19:38 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #8 from E B <openssh-bugzilla at erik.ca> ---
Apologies for the late response. I am able to reproduce this issue on
every attempt with OpenSSH 8.6p1 (commit f068930635). I have attached
the full ssh log output for both the MUX process and the ansible / ssh
processes running through the MUX connection to the bastion host:
* Full OpenSSH_8.6p1 MUX proc log
* Full OpenSSH_8.6p1 ansible proc log (gzip)
I used the same steps outlined in the original comment, except that
extra logging was enabled on the ansible side:
ANSIBLE_SSH_ARGS="-vvv -E ./ssh.log" ansible -i my_target_hosts all -a id
Let me know whether you would also need the full strace output or
whether the logs above will suffice.
Thanks
bugzilla-daemon at mindrot.org
2021-May-11 19:50 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
E B <openssh-bugzilla at erik.ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #3515|0 |1
is obsolete| |
--- Comment #9 from E B <openssh-bugzilla at erik.ca> ---
Created attachment 3517
--> https://bugzilla.mindrot.org/attachment.cgi?id=3517&action=edit
Full OpenSSH_8.6p1 MUX proc log
bugzilla-daemon at mindrot.org
2021-May-12 21:45 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
--- Comment #10 from Damien Miller <djm at mindrot.org> ---
Created attachment 3518
--> https://bugzilla.mindrot.org/attachment.cgi?id=3518&action=edit
debug select failures
Unfortunately, it's hard to figure out what is going on there without
the actual bad file descriptor.
Sorry to be a bother, but are you able to reproduce using git HEAD with
this patch applied? It includes some extra debugging that will let us
determine the sequence of events, and will log which file descriptors
are bad after select fails.
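As an illustration of what "logging which file descriptors are bad after select fails" can look like in general, here is a minimal sketch. It is not the contents of attachment 3518, which is not reproduced in this thread; it assumes a POSIX host, and the name `find_bad_fds` is hypothetical. The idea is that once select() has returned EBADF, each candidate descriptor can be probed individually with fcntl(F_GETFD), which fails with EBADF only for descriptors that are not open.

```python
import errno
import fcntl
import os


def find_bad_fds(fds):
    """Return the descriptors in fds that are not open.

    Hypothetical helper for illustration; not taken from the OpenSSH patch.
    """
    bad = []
    for fd in fds:
        try:
            # F_GETFD only reads the descriptor flags, so it has no side effects.
            fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            bad.append(fd)
    return bad


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    print("bad descriptors:", find_bad_fds([read_fd, write_fd]))
    os.close(write_fd)
```

A probe like this only identifies the stale descriptor; finding where it was closed still needs the sequence-of-events logging mentioned above.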
bugzilla-daemon at mindrot.org
2022-Jan-14 04:16 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |WORKSFORME
Status|NEW |RESOLVED
--- Comment #11 from Damien Miller <djm at mindrot.org> ---
Closing for lack of followup.
OpenSSH HEAD has replaced the use of select() with poll(). Please try
HEAD or OpenSSH 8.9 when it is released as it might fix the problem
you're seeing.
If not, then I recommend setting the DEBUG_CHANNEL_POLL #define at the
start of channels.c and attaching the debug output. poll(2) is easier
to debug than select(2), because it will tell you which fd is bad via
POLLNVAL, and we do log this information.
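The difference between the two calls can be seen in a small standalone sketch. This is not OpenSSH's channels.c code: it assumes a platform where Python's `select.poll` is available (Linux and most Unix systems), and the helper name `poll_invalid_fds` is hypothetical. Where select() fails as a whole, poll() succeeds and flags each closed descriptor with POLLNVAL in its returned events.

```python
import os
import select


def poll_invalid_fds(fds, timeout_ms=0):
    """Return the descriptors that poll() flags with POLLNVAL.

    Hypothetical helper for illustration; not part of OpenSSH.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    # poll() itself succeeds; invalid fds are reported per descriptor.
    return [fd for fd, revents in poller.poll(timeout_ms)
            if revents & select.POLLNVAL]


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate a descriptor closed behind the event loop
    print("POLLNVAL on:", poll_invalid_fds([read_fd, write_fd]))
    os.close(write_fd)
```

Because the bad descriptor is named directly in the result, it can be logged without a separate probing pass.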
bugzilla-daemon at mindrot.org
2022-Feb-25 02:58 UTC
[Bug 3304] SSH client MUX to multiple hosts causes select: Bad file descriptor
https://bugzilla.mindrot.org/show_bug.cgi?id=3304
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
--- Comment #12 from Damien Miller <djm at mindrot.org> ---
closing bugs resolved before openssh-8.9