Hong Cho
2019-Aug-20 02:56 UTC
[8.0p1] After the remote ssh command execution, the session does not terminate until an active disconnect
Hi, We recently upgraded OpenSSH from 7.5p1 to 8.0p1, and received a report that there was a behavioral differences where the ssh session did not disconnect after a remote background command execution. For example, "ssh user at example.com 'tail -f /var/xxx &' After looking at the debug logs and comparing the code, it seems there was a bug in the portable version during the adaptation of the 1.367 change (refactoring) of channels.c. The OpenBSD version looks correct, but for the portable version, the exit condition of channel_handle_efd_read() didn't get converted correctly, but somehow got copy-and-pasted from the rfd version. I couldn't find the change history for the portable version, so I can't when/how this happened. This should make the "force" condition the same as before the refactoring of the portable version of channels.c. --- /tmp/tmp.48415.39 2019-08-19 19:24:29.000000000 -0700 +++ crypto/openssh/channels.c 2019-08-19 19:19:39.000000000 -0700 @@ -2103,7 +2103,15 @@ channel_handle_efd_read(struct ssh *ssh, ssize_t len; int r, force; + force = c->detach_close; - force = c->isatty && c->detach_close && c->istate != CHAN_INPUT_CLOSED; if (c->efd == -1 || (!force && !FD_ISSET(c->efd, readset))) return 1; --- Thanks. Hong.
Damien Miller
2019-Aug-20 06:27 UTC
[8.0p1] After the remote ssh command execution, the session does not terminate until an active disconnect
On Tue, 20 Aug 2019, Hong Cho wrote:> Hi, > > We recently upgraded OpenSSH from 7.5p1 to 8.0p1, and received a report > that there was a behavioral differences where the ssh session did not > disconnect after a remote background command execution. For example, "ssh > user at example.com 'tail -f /var/xxx &' > > After looking at the debug logs and comparing the code, it seems there was > a bug in the portable version during the adaptation of the 1.367 change > (refactoring) of channels.c. > > The OpenBSD version looks correct, but for the portable version, the exit > condition of channel_handle_efd_read() didn't get converted correctly, but > somehow got copy-and-pasted from the rfd version. I couldn't find the > change history for the portable version, so I can't when/how this happened. > > This should make the "force" condition the same as before the refactoring > of the portable version of channels.c. > > --- /tmp/tmp.48415.39 2019-08-19 19:24:29.000000000 -0700 > +++ crypto/openssh/channels.c 2019-08-19 19:19:39.000000000 -0700 > @@ -2103,7 +2103,15 @@ channel_handle_efd_read(struct ssh *ssh, > ssize_t len; > int r, force; > > + force = c->detach_close; > - force = c->isatty && c->detach_close && c->istate != CHAN_INPUT_CLOSED;This change wasn't a refactor or a mistake, but an intentional bugfix for https://bugzilla.mindrot.org/show_bug.cgi?id=2071 Anyway, I don't see any change in behaviour for your testcase between OpenSSH 7.2 and 8.0 on Linux - they both wait for the tail process to terminate before dropping the connection. Applying your suggested change doesn't make any difference either. The behaviour at close has always been a tricky area and there are divergences between portable OpenSSH and OpenBSD for a long time, driven by how more SysV-ish systems like Linux differ from BSDs. BTW line-by-line change history can be extracted using "git blame" on a checkout of the portable OpenSSH git tree. -d
Hong Cho
2019-Aug-20 07:00 UTC
[8.0p1] After the remote ssh command execution, the session does not terminate until an active disconnect
Thank you for the reply. I changed the remote command to a generic example because in our case, we are using an internal binary that I didn't want to make public. I thought it wouldn't matter, but apparently, it does. I am sorry I can't share more about this tool. BTW, our environment is based on FreeBSD, not on Linux. In our case, because "isatty" is set to 0, it never closes. ------ debug2: channel 0: output open -> closed debug2: channel 0: read<=0 rfd 8 len 0 debug2: channel 0: read failed debug2: channel 0: Fssh_chan_shutdown_read (i0 o3 sock -1 wfd 8 efd 10 [read]) debug2: channel 0: input open -> drain debug3: channel_handle_efd_read: channel 0: efd 10 isatty 0 detach_close 1 istate 1 force 0 FD_ISSET 0 debug2: channel 0: ibuf_empty delayed efd 10/(0) debug2: notify_done: reading debug3: channel_handle_efd_read: channel 0: efd 10 isatty 0 detach_close 1 istate 1 force 0 FD_ISSET 0 debug2: channel 0: ibuf_empty delayed efd 10/(0) debug3: channel_handle_efd_read: channel 0: efd 10 isatty 0 detach_close 1 istate 1 force 0 FD_ISSET 0 debug3: receive packet: type 1 Received disconnect from example.com port 29940:11: disconnected by user Disconnected from user user example.com port 29940 ------ The "channel_handle_efd_read" message is added by me, to see what was going on. Before 8.0p1, we were using 7.5p1, and it used to "closing read-efd" after "input open -> drain". However, with 8.0p1, because "isatty" is 0, channel_handle_efd_read() just returns and the channel remains open ("ibuf_empty delayed efd"). Because of this behavioral change, our automatic tests have been blocked. If this is really a bug fix, does it mean that we have to somehow set "isatty" to 1? How can it be done on the command-line from the ssh client? Hong. On Tue, Aug 20, 2019 at 3:27 PM Damien Miller <djm at mindrot.org> wrote:> On Tue, 20 Aug 2019, Hong Cho wrote: > > > Hi, > > > > We recently upgraded OpenSSH from 7.5p1 to 8.0p1, and received a report > > that there was a behavioral differences where the ssh session did not > > disconnect after a remote background command execution. For example, "ssh > > user at example.com 'tail -f /var/xxx &' > > > > After looking at the debug logs and comparing the code, it seems there > was > > a bug in the portable version during the adaptation of the 1.367 change > > (refactoring) of channels.c. > > > > The OpenBSD version looks correct, but for the portable version, the exit > > condition of channel_handle_efd_read() didn't get converted correctly, > but > > somehow got copy-and-pasted from the rfd version. I couldn't find the > > change history for the portable version, so I can't when/how this > happened. > > > > This should make the "force" condition the same as before the refactoring > > of the portable version of channels.c. > > > > --- /tmp/tmp.48415.39 2019-08-19 19:24:29.000000000 -0700 > > +++ crypto/openssh/channels.c 2019-08-19 19:19:39.000000000 -0700 > > @@ -2103,7 +2103,15 @@ channel_handle_efd_read(struct ssh *ssh, > > ssize_t len; > > int r, force; > > > > + force = c->detach_close; > > - force = c->isatty && c->detach_close && c->istate != CHAN_INPUT_CLOSED; > > This change wasn't a refactor or a mistake, but an intentional bugfix for > https://bugzilla.mindrot.org/show_bug.cgi?id=2071 > > Anyway, I don't see any change in behaviour for your testcase between > OpenSSH 7.2 and 8.0 on Linux - they both wait for the tail process to > terminate before dropping the connection. Applying your suggested change > doesn't make any difference either. > > The behaviour at close has always been a tricky area and there are > divergences between portable OpenSSH and OpenBSD for a long time, driven > by how more SysV-ish systems like Linux differ from BSDs. > > BTW line-by-line change history can be extracted using "git blame" > on a checkout of the portable OpenSSH git tree. > > -d >