Hi,

I've noticed some ssh behaviour that I wish didn't happen. I was wondering if someone can explain how I can stop it from happening, or explain why it's unavoidable.

If I ssh-with-agent-forwarding from one host to a second host, and on the second host use something like nohup/screen/tmux/daemon, and from within that new process session, start a long-running command via ssh-without-agent-forwarding on a third host, I would expect to be able to (e.g.) detach from the screen session and log out of the second host, but my shell prompt on the first host doesn't come back and even Ctrl-C won't break the connection between ssh on the first host and sshd on the second host. I have to close the xterm window that the shell and ssh are running in. If I don't do that, the shell prompt doesn't come back until the long-running command on the third host has completed.

To see what I mean:

- on host1: Have ssh-agent running with an identity loaded
- on host1: "xterm &" (start an xterm or similar)
- on host1 in xterm: "ssh -A host2" (ssh-with-agent-forwarding to host2)
- on host2: "screen" (start a screen session)
- on host2 in screen: "ssh -a host3 sleep 60" (long-running cmd on host3)
- on host2 in screen: Ctrl-a d (detach from the screen session)
- on host2: Ctrl-d (log out of host2)
- on host1: wait a long time for the shell prompt to appear or close xterm

host1 ssh: OpenSSH_8.1p1, OpenSSL 1.1.1g 21 Apr 2020
host2 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019
host3 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019

In other words, I want the agent to be forwarded to host2, so that I can then ssh from there to host3, but I don't want the agent to be forwarded to host3 because it's not needed there. Note that my real command was rsync so both host2 and host3 were involved.

My hypothesis is that agent forwarding has something to do with why the connection between host1 and host2 isn't cleanly closed. Using lsof to compare sshd before and after starting the long-running command on host3, the only difference was this:

--- lsof.20786.sshd.before  2020-03-12 09:17:04.000000000 +1100
+++ lsof.20786.sshd.after   2020-03-12 09:18:32.000000000 +1100
@@ -71,5 +71,6 @@ sshd 20786 raf 7w FIFO
 sshd 20786 raf 8w FIFO 0,10 0t0 14325237 pipe
 sshd 20786 raf 9u unix 0xffff99a3a8d96000 0t0 14325238 /tmp/ssh-KBbJCuYltB/agent.20786 type=STREAM
 sshd 20786 raf 10u CHR 5,2 0t0 1119 /dev/ptmx
+sshd 20786 raf 11u unix 0xffff99a3e8d2cc00 0t0 14328304 /tmp/ssh-KBbJCuYltB/agent.20786 type=STREAM
 sshd 20786 raf 12u CHR 5,2 0t0 1119 /dev/ptmx
 sshd 20786 raf 13u CHR 5,2 0t0 1119 /dev/ptmx

i.e. a new connection to the agent socket, even though agent forwarding to host3 was disabled with -a.

When I first saw that, I added the -a option to the ssh command to host3 (I have agent forwarding on by config). To my surprise, it didn't change this behaviour: the second connection to the agent socket was still created, and I still had to close the xterm window to break the connection between host1 and host2.

Any suggestions?

cheers,
raf
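For reference, the before/after comparison above can be reproduced with something like this (a sketch; 20786 is the pid of the per-session sshd process on host2, as seen in the lsof output):

    # on host2, before starting the long-running command
    lsof -p 20786 > lsof.20786.sshd.before

    # ... in screen: ssh -a host3 sleep 60 ...

    # on host2, after starting it
    lsof -p 20786 > lsof.20786.sshd.after

    diff -u lsof.20786.sshd.before lsof.20786.sshd.after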
> Any suggestions?

On host2, run screen (or individual commands) without SSH_AUTH_SOCK, so e.g.

    $ SSH_AUTH_SOCK= rsync ...
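For example (the rsync source and destination below are placeholders), clearing the variable for the whole screen session or for a single command would look like:

    # start the whole screen session without access to the forwarded agent
    SSH_AUTH_SOCK= screen

    # or clear it for one command only
    SSH_AUTH_SOCK= rsync -a /some/dir/ host3:/some/dir/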
raf wrote:
> I've noticed some ssh behaviour that I wish didn't happen. I was wondering if someone can explain how I can stop it from happening, or explain why it's unavoidable.

The problem you are running into is due to using a live network connection from the remote system back to your local system's ssh-agent. If you weren't doing that then you would not run into this problem. If you want to continue to do this then you will need to work around the problem.

> If I ssh-with-agent-forwarding from one host to a second host, and on the second host use something like

That's the live network connection that you are setting up.

> nohup/screen/tmux/daemon, and from within that new

And those are long running environments. Meaning that if you break the connection to the ssh-agent then it is possible to create situations where things running in the above environments want to keep talking to the ssh-agent.

> process session, start a long-running command via ssh-without-agent-forwarding on a third host, I would expect to be able to (e.g.) detach from the screen session and log out of the second host, but my shell

In order to be able to detach from the session and log out, the environment must not keep any resources open. And by resources here I mean the file descriptor that is connected to the network socket that is connected to the ssh-agent. That's open. Meaning a non-zero reference count. Meaning that it does not get closed. Meaning that ssh is going to keep the connection open. "Because someone is using it."

> prompt on the first host doesn't come back and even Ctrl-C won't break the connection between ssh on the first host and sshd on the second host. I have to close the xterm window that the shell and ssh are running in.

You could also use "Enter ~ ." to forcibly close the connection too. That is a useful command sequence. The ~ is the escape character and is recognized at the beginning of a line. See the manual in the section under "ESCAPE CHARACTERS" for the full description.

> If I don't do that, the shell prompt doesn't come back until the long-running command on the third host has completed.

Correct. That is the correct behavior. The long running command on the remote host is holding the file open. Non-zero reference count. When the process exits then it closes the file. Which closes the network connection. Which allows ssh to exit.

> To see what I mean:
>
> - on host1: Have ssh-agent running with an identity loaded
> - on host1: "xterm &" (start an xterm or similar)

All good.

> - on host1 in xterm: "ssh -A host2" (ssh-with-agent-forwarding to host2)

At this point my personal opinion is that we should pause and think about why -A might be wanted here. I realize the option exists. I realize that many people use this option a lot. But personally I almost never use that option. I don't need to use that option. That option is just a door that opens to a room filled with a lot of security layer related questions. Which might be okay. Or might be a door I don't want to open.

Why are you using -A there? Do you really need to use it? That would be a good discussion point. Because I don't ever have a need for it. Using it means one must trust the remote system not to be malicious. (Which it mostly will not be. But it is possible.) But mostly because the live network connection it sets up is then required to stay available for the full lifecycle. As you have found out. It creates entanglements.
It's messy.

> - on host2: "screen" (start a screen session)

And this sets up a pitfall that might or might not be fallen into. In the screen environment, every shell started within it will have the environment variables from the ssh connection. You will probably see something like this example.

    rwp@madness:~$ env | grep SSH
    SSH_AUTH_SOCK=/tmp/ssh-YsdgP0Eexk/agent.14641
    SSH_CONNECTION=192.168.230.119 44148 192.168.230.123 22
    SSH_CLIENT=192.168.230.119 44148 22
    SSH_TTY=/dev/pts/4

The problem is the SSH_AUTH_SOCK which is setting up the connectivity to the ssh-agent on your originating client. If you avoid that then you avoid the problem. I daresay the easiest way to avoid it is to avoid the -A option. But if you must use it then when setting up screen you can remove it from the environment.

    env -u SSH_AUTH_SOCK screen

Here I am using 'env' to unset the variable from the environment. And also 'env' is an idiom for a canonical way to set or clear environment variables regardless of the command line shell that anyone might be using. Because bash, ksh, zsh, csh, and all of those have slightly different syntax. But invoking 'env' this way would be identical in all of them. Which makes it easiest for me to suggest using env in this way and knowing that it will work regardless of the user shell environment. Isn't that pretty nice? :-)

> - on host2 in screen: "ssh -a host3 sleep 60" (long-running cmd on host3)

And here you are using -a to *avoid* the ssh-agent that was set up with the -A in the originating invocation. Layers and layers! If the originating -A was removed then this -a could be removed. Simplify!

> - on host2 in screen: Ctrl-a d (detach from the screen session)

But it really can't! Because of the live long running network connection to the ssh-agent. "The cake is a lie!"

> - on host2: Ctrl-d (log out of host2)

This is not quite a "log out of host2". This is an "exit the command line shell running on host2". The difference is important in this specific case. Because the command line shell will exit. But the command line shell is running under sshd on the remote host. And that is talking to the ssh on the local host. And as described the remote sshd is going to need to keep running waiting for the live network connection to your ssh-agent to close.

> - on host1: wait a long time for the shell prompt to appear or close xterm

Right. As it should be doing. Due to the use of -A.

> In other words, I want the agent to be forwarded to host2, so that I can then ssh from there to host3, but I don't want the agent to be forwarded to host3 because it's not needed there. Note that my real command was rsync so both host2 and host3 were involved.

That is the critical point. I've written a bunch here already. With a lot of opinions embedded! :-) So I will more or less pause for a moment here for reflection. Because everything becomes interconnected and understanding those interconnections will make working with things all make sense. "Know how the system works. Know how to work the system." :-)

> My hypothesis is that agent forwarding has something to do with why the connection between host1 and host2 isn't cleanly closed.

And I believe your hypothesis to be a correct one.

> Any suggestions?

I am missing some details of your environment and dependencies so this is a potentially bad suggestion.
But if I absolutely needed to ssh from host1 to host2 and then absolutely needed to use host2 as a launching point to get to host3 and other places, then I would create a unique ssh key on host2 and start an ssh-agent running on host2 using that key. Then use that key to get to host3 and other hosts.

There is also a very convenient utility that I hesitate to mention because it also opens a door to a room filled with security questions. It might be fine. It might be unacceptable. Some will yell that they hate my dog because I suggest this. Others will go, well yes, I am using it too. Everything all depends.

I would run 'keychain' on host2 so that every time you log into host2 it reattaches your command line shell environment to an ssh-agent running on host2. Since it seems like you are really using host2 as your main home base of operations. Maybe your originating client is your mobile laptop or something. That's fine. You want to be able to suspend your laptop and then move to another WiFi network and resume and then reconnect. Maybe. That's all fine. I do that routinely. And if host2 is your main base of operations then it would be where I would be running the main ssh-agent that is used to log into the other hosts. Or maybe it is just a local base of operations for a single computer cluster or compute farm of machines all being administered together. Same thing.

You can read about keychain in the upstream docs. If you are not running Funtoo, ignore all of the Funtoo references, as keychain has almost certainly been packaged for your OS software distribution. On my system "apt-get install keychain" installs it. It's really just one #!/bin/sh shell script. Very portable. Even if it is not packaged for your OS you can almost certainly use a copy in your home directory as it is simply a shell script.

https://www.funtoo.org/Keychain

Then in my .profile I have this code to set it up. This would go in host2's ~/.profile. Or ~/.bash_profile if that is what you are using. Or ~/.zlogin or whatever. Make sure you know your shell's start up environment file and do the right thing. This is for bash or ksh.

    if (keychain --version) >/dev/null 2>&1; then
        keychain -q
        if [ -f $HOME/.keychain/$(hostname)-sh ]; then
            . $HOME/.keychain/$(hostname)-sh
        fi
    fi

A newly started ssh-agent on host2 would need ssh-add to be run at least once in order to load your ssh keys into the running agent. But then it will continue to run there even after you log out. However if you flush the keys from the agent, or host2 reboots, or whatever, then you would need to run ssh-add again after that point in order to load up ssh keys in that agent. Also you can run ssh-add -D to delete identities at any time, to prevent their further use until you add keys back into the agent.

As I read between the lines I think this would be a good solution for you. However that does make some assumptions. I am seeing lines and trying to interpolate between them. I am suggesting this in order to be helpful. But please understand the issues and then make your own decisions.

Hope this helps! :-)

Bob
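A rough sketch of that dedicated-key-on-host2 approach (the key file name, comment, and key type here are placeholder choices, not anything prescribed above):

    # on host2: create a key that lives only on host2
    ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_host2 -C 'host2 hopping key'

    # install its public half on host3 (and any other target hosts)
    ssh-copy-id -i ~/.ssh/id_ed25519_host2.pub host3

    # start an agent on host2 and load the key into it
    eval "$(ssh-agent)"
    ssh-add ~/.ssh/id_ed25519_host2

    # commands run on host2 (including inside screen) now authenticate to
    # host3 with the local agent, not the one forwarded from host1
    ssh host3 sleep 60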
On Wed, 3 Jun 2020, Bob Proulx wrote:
> You could also use "Enter ~ ." to forcibly close the connection too.

Or, when starting the connection from within a GNU screen tab, just press ^Ak to kill the tab.

> And also 'env' is an idiom for a canonical way to set or clear environment variables regardless of the command line shell that anyone might be using. Because bash, ksh, zsh, csh, and all of those have slightly different syntax. But invoking 'env' this way would be

unportable (e.g. it has no -u on some BSDs), whereas unset in the shells is pretty well understood.

bye,
//mirabilos
-- 
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund, mksh auf jedem System zu installieren.“
  -- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)
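The shell-builtin equivalent, for reference (reusing the screen example from earlier in the thread):

    # Bourne-family shells (sh, bash, ksh, zsh):
    unset SSH_AUTH_SOCK
    screen

    # csh/tcsh:
    unsetenv SSH_AUTH_SOCK
    screen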
Philipp Marek wrote:

> > Any suggestions?
>
> On host2, run screen (or individual commands) without SSH_AUTH_SOCK, so e.g.
>
>     $ SSH_AUTH_SOCK= rsync ...

Hi Philipp,

Thanks for trying to help, but that doesn't work. It prevents the ssh from host2 to host3 from having access to the agent, which it needs initially, but it only needs it long enough to authenticate the connection to host3. The attempt to ssh to host3 fails (because that ssh has no access to the key).

I would hope that, once that authentication to host3 has completed, that ssh process would close its connection to the agent, because it had been invoked with the -a option, and so the connection is no longer needed, i.e. it doesn't need to be forwarded to host3.

I see ssh's failure to close the connection to the agent, once it is no longer needed, as a possible buglet. I was hoping that someone could explain why it needs to keep that connection open. I'm assuming there might be a good reason for it. Or maybe it really is a buglet.

If this behaviour could be changed, so that ssh closes its connection to the agent socket when it is no longer needed, it would probably solve my problem automatically.

Does that sound reasonable?

cheers,
raf
Bob Proulx wrote:

> raf wrote:
> > I've noticed some ssh behaviour that I wish didn't happen. I was wondering if someone can explain how I can stop it from happening, or explain why it's unavoidable.
>
> The problem you are running into is due to using a live network connection from the remote system back to your local system's ssh-agent. If you weren't doing that then you would not run into this problem. If you want to continue to do this then you will need to work around the problem.
>
> > If I ssh-with-agent-forwarding from one host to a second host, and on the second host use something like
>
> That's the live network connection that you are setting up.
>
> > nohup/screen/tmux/daemon, and from within that new
>
> And those are long running environments. Meaning that if you break the connection to the ssh-agent then it is possible to create situations where things running in the above environments want to keep talking to the ssh-agent.

After the initial authentication from host2 to host3, there should be no need for further access to the agent (because that ssh was invoked with the -a option).

> > process session, start a long-running command via ssh-without-agent-forwarding on a third host, I would expect to be able to (e.g.) detach from the screen session and log out of the second host, but my shell
>
> In order to be able to detach from the session and log out, the environment must not keep any resources open. And by resources here I mean the file descriptor that is connected to the network socket that is connected to the ssh-agent. That's open. Meaning a non-zero reference count. Meaning that it does not get closed. Meaning that ssh is going to keep the connection open. "Because someone is using it."

My question is about why sshd on host2 keeps that file descriptor open, or why the ssh on host2 (to host3) keeps it open after authentication to host3 (after which it is no longer needed).

> > prompt on the first host doesn't come back and even Ctrl-C won't break the connection between ssh on the first host and sshd on the second host. I have to close the xterm window that the shell and ssh are running in.
>
> You could also use "Enter ~ ." to forcibly close the connection too. That is a useful command sequence. The ~ is the escape character and is recognized at the beginning of a line. See the manual in the section under "ESCAPE CHARACTERS" for the full description.

Thanks, but that doesn't work, at least not after logging out of host2 and waiting for the prompt on host1. Ah, but it does work after detaching from screen. Thanks. That's helpful, but I need to know in advance when the ssh connection from host1 to host2 isn't going to terminate cleanly. Then I can use the escape sequence. If I can't tell in advance, and I try to log out, it doesn't help. But thanks for the suggestion.

> > If I don't do that, the shell prompt doesn't come back until the long-running command on the third host has completed.
>
> Correct. That is the correct behavior. The long running command on the remote host is holding the file open. Non-zero reference count. When the process exits then it closes the file. Which closes the network connection. Which allows ssh to exit.

I think that if the ssh -a from host2 to host3 closed its connection to the agent once it had finished with it, then so would the sshd process on host2 (maybe), and that would allow ssh from host1 to host2 to exit normally. I'm only theorizing about that.
I haven't looked at the code yet.

> > To see what I mean:
> >
> > - on host1: Have ssh-agent running with an identity loaded
> > - on host1: "xterm &" (start an xterm or similar)
>
> All good.
>
> > - on host1 in xterm: "ssh -A host2" (ssh-with-agent-forwarding to host2)
>
> At this point my personal opinion is that we should pause and think about why -A might be wanted here. I realize the option exists. I

It is wanted there because I need to ssh from host2 to host3, and that connection needs to be authenticated, and I don't want keys on host2 (and all the other hosts I use).

> realize that many people use this option a lot. But personally I almost never use that option. I don't need to use that option. That option is just a door that opens to a room filled with a lot of security layer related questions. Which might be okay. Or might be a door I don't want to open.
>
> Why are you using -A there? Do you really need to use it? That would be a good discussion point. Because I don't ever have a need for it. Using it means one must trust the remote system not to be malicious. (Which it mostly will not be. But it is possible.) But mostly because the live network connection it sets up is then required to stay available for the full lifecycle. As you have found out. It creates entanglements. It's messy.

Can anyone explain exactly why that connection *is* required to stay available for the full lifecycle? As far as I can tell, it only needs to stay available until the authentication for the ssh connection to host3 is complete. After the authentication, there is no need for it that I am aware of (because the agent is not being forwarded to host3).

> > - on host2: "screen" (start a screen session)
>
> And this sets up a pitfall that might or might not be fallen into. In the screen environment, every shell started within it will have the environment variables from the ssh connection. You will probably see something like this example.
>
>     rwp@madness:~$ env | grep SSH
>     SSH_AUTH_SOCK=/tmp/ssh-YsdgP0Eexk/agent.14641
>     SSH_CONNECTION=192.168.230.119 44148 192.168.230.123 22
>     SSH_CLIENT=192.168.230.119 44148 22
>     SSH_TTY=/dev/pts/4
>
> The problem is the SSH_AUTH_SOCK which is setting up the connectivity to the ssh-agent on your originating client. If you avoid that then you avoid the problem. I daresay the easiest way to avoid it is to avoid the -A option. But if you must use it then when setting up screen you can remove it from the environment.
>
>     env -u SSH_AUTH_SOCK screen

No. That doesn't solve the problem. It prevents the ssh to host3 from having access to the agent, which it needs initially so as to be able to authenticate the connection. The subsequent use of ssh -a means that the access to the agent will not be needed after that point in time (as far as I can tell).

> Here I am using 'env' to unset the variable from the environment. And also 'env' is an idiom for a canonical way to set or clear environment variables regardless of the command line shell that anyone might be using. Because bash, ksh, zsh, csh, and all of those have slightly different syntax. But invoking 'env' this way would be identical in all of them. Which makes it easiest for me to suggest using env in this way and knowing that it will work regardless of the user shell environment. Isn't that pretty nice? :-)
>
> > - on host2 in screen: "ssh -a host3 sleep 60" (long-running cmd on host3)
>
> And here you are using -a to *avoid* the ssh-agent that was set up with the -A in the originating invocation. Layers and layers! If the originating -A was removed then this -a could be removed. Simplify!

Except that the originating -A is needed to be able to authenticate the connection from host2 to host3.

> > - on host2 in screen: Ctrl-a d (detach from the screen session)
>
> But it really can't! Because of the live long running network connection to the ssh-agent. "The cake is a lie!"

Yes it can. Detaching from screen works fine. It's only the logging out of host2 afterwards that is a problem.

> > - on host2: Ctrl-d (log out of host2)
>
> This is not quite a "log out of host2". This is an "exit the command line shell running on host2". The difference is important in this specific case. Because the command line shell will exit. But the command line shell is running under sshd on the remote host. And that is talking to the ssh on the local host. And as described the remote sshd is going to need to keep running waiting for the live network connection to your ssh-agent to close.

Yes, I know. But what I don't know is why that sshd process on host2 still has the additional connection to the agent socket that it created when the ssh to host3 started. I can't help thinking that if the ssh to host3 closed its connection when it no longer needed it, the sshd process would probably also close its corresponding connection (just guessing), and then there would no longer be any ssh channel that needed to stay open and prevent me from logging out of host2.

> > - on host1: wait a long time for the shell prompt to appear or close xterm
>
> Right. As it should be doing. Due to the use of -A.

No. If I use ssh -A to log into host2 and then log out again, this problem does not exist. The problem is not the -A option. The problem is only when an additional connection to the agent is created and not closed when it is no longer needed.

> > In other words, I want the agent to be forwarded to host2, so that I can then ssh from there to host3, but I don't want the agent to be forwarded to host3 because it's not needed there. Note that my real command was rsync so both host2 and host3 were involved.
>
> That is the critical point. I've written a bunch here already. With a lot of opinions embedded! :-) So I will more or less pause for a moment here for reflection. Because everything becomes interconnected and understanding those interconnections will make working with things all make sense. "Know how the system works. Know how to work the system." :-)
>
> > My hypothesis is that agent forwarding has something to do with why the connection between host1 and host2 isn't cleanly closed.
>
> And I believe your hypothesis to be a correct one.
>
> > Any suggestions?
>
> [...keychain snip...]
>
> Hope this helps! :-)

It doesn't help much but many thanks for trying.

> Bob

cheers,
raf
Hi,

raf wrote:
> If I ssh-with-agent-forwarding from one host to a second host, and on the second host use something like nohup/screen/tmux/daemon, and from within that new process session, start a long-running command via ssh-without-agent-forwarding on a third host, I would expect to be able to (e.g.) detach from the screen session and log out of the second host,

I do this every now and then and for me it works exactly as expected. I don't see the behavior you describe.

> host1 ssh: OpenSSH_8.1p1, OpenSSL 1.1.1g 21 Apr 2020
> host2 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019
> host3 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019

I never use debian, only systems with upstream OpenSSH-portable.

So I'd suggest to try to reproduce with a host2 (and possibly also host3) that runs vanilla OpenSSH-portable, without distribution patches.

You could even use host1 as host2 and host3 for testing.

Kind regards

//Peter
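A rough sketch of such a test setup (prefix, port, and key handling are placeholder choices; the INSTALL file in the openssh-portable source tree is the authoritative reference):

    # build vanilla OpenSSH-portable from its source tree into a private prefix
    ./configure --prefix="$HOME/openssh-test"
    make
    make install          # should also generate host keys under the prefix; see INSTALL

    # run a test sshd on a high port (sshd wants to be started via an absolute path)
    "$HOME/openssh-test/sbin/sshd" -p 2222

    # then repeat the agent-forwarding experiment against it
    ssh -A -p 2222 localhost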
Peter Stuge wrote:

> Hi,
>
> raf wrote:
> > If I ssh-with-agent-forwarding from one host to a second host, and on the second host use something like nohup/screen/tmux/daemon, and from within that new process session, start a long-running command via ssh-without-agent-forwarding on a third host, I would expect to be able to (e.g.) detach from the screen session and log out of the second host,
>
> I do this every now and then and for me it works exactly as expected. I don't see the behavior you describe.
>
> > host1 ssh: OpenSSH_8.1p1, OpenSSL 1.1.1g 21 Apr 2020
> > host2 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019
> > host3 ssh: OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u 20 Dec 2019
>
> I never use debian, only systems with upstream OpenSSH-portable.
>
> So I'd suggest to try to reproduce with a host2 (and possibly also host3) that runs vanilla OpenSSH-portable, without distribution patches.
>
> You could even use host1 as host2 and host3 for testing.
>
> Kind regards
>
> //Peter

Hi Peter,

Thanks for that. By using openssh-portable as sshd on host2, the problem disappears completely. Unsurprisingly, lsof shows sshd having no second connection to the agent socket after starting the second ssh from host2 to host3.

So it looks like something that the debian folks have changed. I'll try to find out what, or maybe just report it as a bug in the debian package and hope they fix it.

Thanks again.

cheers,
raf
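On a Debian system, that report could be filed with reportbug against the package that ships sshd (a hypothetical invocation):

    reportbug openssh-server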