bugzilla-daemon at bugzilla.mindrot.org
2019-Sep-06 18:32 UTC
[Bug 3067] New: Fails to unlink ControlMaster socket early enough, confuses other clients
https://bugzilla.mindrot.org/show_bug.cgi?id=3067

            Bug ID: 3067
           Summary: Fails to unlink ControlMaster socket early enough,
                    confuses other clients
           Product: Portable OpenSSH
           Version: 7.9p1
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: ssh
          Assignee: unassigned-bugs at mindrot.org
          Reporter: leonerd at leonerd.org.uk

(from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877020)

TL;DR: ssh(1) must unlink the local socket _before_ attempting more network traffic; otherwise broken TCP sockets will stall the entire thing.

-

I make heavy use of the shared control sockets to multiplex multiple shells, sftp, and other commands down a single TCP connection to remote servers.

  ControlPath ~/var/run/ssh-master-%r@%h:%p.sock
  ControlPersist 1s
  ControlMaster auto

In this setup, under stable networking all works nicely. However, my machine is a laptop, and sometimes due to mobile data, wifi, ethernet cable swapping, or other issues my IP address and hence routing change. After such a change, all existing TCP sockets are unusable and must be closed and reopened.

Simply closing all ssh clients is insufficient here, because the client tries to perform a controlled shutdown of the TCP socket *first* and will only unlink(2) the control master socket from the local filesystem after it has done this. With the operations ordered this way, the client stalls trying to perform the controlled TCP shutdown over now-invalid networking, and never gets around to removing the local unix socket. New ssh clients then try to use that socket and similarly stall.

The correct order of operation ought to be that the control master local socket is unlinked *before* trying to send any traffic, thus restoring the user's "turn it off and on again" approach to fixing the problem - namely by just killing all their clients and making a new one.
---

Additionally, I should add: a workaround for this is to simply

  $ rm ~/var/run/ssh-master-*.sock

after closing the previous master clients, before starting them up afresh. With no existing unix socket in place, the first new client takes master control again and all resumes fine.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
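The `rm` workaround can be wrapped in a small helper so that an unmatched glob does not produce an error. This is a minimal sketch, assuming the ControlPath pattern quoted in the report; the function name `clean_master_sockets` and its optional directory argument are hypothetical and not part of OpenSSH.

```shell
# Hypothetical helper (not part of OpenSSH): remove leftover control sockets
# matching the ControlPath pattern from the report.
# Run it only after the old master clients have been closed.
clean_master_sockets() {
    dir=${1:-"$HOME/var/run"}
    for sock in "$dir"/ssh-master-*.sock; do
        # An unmatched glob stays literal; skip it. -L also catches dangling symlinks.
        [ -e "$sock" ] || [ -L "$sock" ] || continue
        rm -f -- "$sock" && printf 'removed %s\n' "$sock"
    done
}
```

Called with no argument it looks in `~/var/run`, as in the report; the next ssh client started afterwards would then become the new master.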
bugzilla-daemon at mindrot.org
2020-Jul-10 04:01 UTC
[Bug 3067] Fails to unlink ControlMaster socket early enough, confuses other clients
https://bugzilla.mindrot.org/show_bug.cgi?id=3067

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |djm at mindrot.org

--- Comment #1 from Damien Miller <djm at mindrot.org> ---
I think you have two options here:

First, you can explicitly shut down multiplexed sessions that have a stalled underlying TCP connection. E.g. I use

  for x in ~/.ssh/ctl-* ; do
      test -r "$x" && ssh -Fnone -Ostop -oControlPath="$x" dummy
  done

Second, you can set a protocol-level health check to automatically kill unresponsive sessions. E.g. adding the following to ~/.ssh/config

  ServerAliveInterval 2m
  ServerAliveCountMax 3

will terminate any unresponsive connection after six minutes.
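The first option can be turned into a reusable function that is easy to dry-run before it touches real masters. This is a rough sketch under stated assumptions: the name `stop_masters` and the `SSH_CMD` override are hypothetical additions for illustration; the ssh options themselves are the ones from the loop in the comment.

```shell
# Hypothetical wrapper around the stop loop from the comment.
# SSH_CMD is an assumption added here so the loop can be dry-run
# (SSH_CMD=echo) to see which control paths would be stopped.
stop_masters() {
    for x in "$@"; do
        # Skip unmatched globs and unreadable paths, as the original loop does.
        test -r "$x" || continue
        ${SSH_CMD:-ssh} -Fnone -Ostop -oControlPath="$x" dummy
    done
}
```

For example, `SSH_CMD=echo stop_masters ~/.ssh/ctl-*` prints the arguments that would be passed to ssh, and `stop_masters ~/.ssh/ctl-*` sends the actual stop requests.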
bugzilla-daemon at mindrot.org
2023-Oct-11 06:33 UTC
[Bug 3067] Fails to unlink ControlMaster socket early enough, confuses other clients
https://bugzilla.mindrot.org/show_bug.cgi?id=3067

Damien Miller <djm at mindrot.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|NEW                         |RESOLVED

--- Comment #2 from Damien Miller <djm at mindrot.org> ---
closing for lack of followup
bugzilla-daemon at mindrot.org
2023-Oct-11 09:52 UTC
[Bug 3067] Fails to unlink ControlMaster socket early enough, confuses other clients
https://bugzilla.mindrot.org/show_bug.cgi?id=3067

--- Comment #3 from Paul Evans <leonerd at leonerd.org.uk> ---
Ah - I didn't follow up because those both sound like other workarounds, of a similar nature to the `rm`-based one I already suggested. Those require the user to actively do something to reset the problem.

My suggested fix, of performing unlink() on the control socket before attempting shutdown() on the TCP connection, requires no further action on the part of the user. It just transparently works.

I therefore don't consider these workarounds to be a solution.
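The ordering argued for here can be illustrated with a shell stand-in. This is only a sketch of the idea, not OpenSSH code: `close_master` and its arguments are hypothetical, `rm` stands in for unlink(), and the teardown command passed by the caller stands in for the TCP shutdown that may stall.

```shell
# Illustration only: a shell stand-in for the proposed ordering, not OpenSSH code.
# close_master is a hypothetical name; $1 is the control socket path and the
# remaining arguments are the (possibly blocking) teardown command to run.
close_master() {
    sock=$1
    shift
    # Step 1: purely local, so it cannot stall on a dead network path.
    rm -f -- "$sock"
    # Step 2: the teardown that may block. The socket path is already gone,
    # so a newly started client would not find it.
    "$@"
}
```

With this ordering, even if the second step never returns, the stale path has already been removed from the filesystem.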