bugzilla-daemon at bugzilla.mindrot.org
2016-Apr-18 02:49 UTC
[Bug 2565] New: High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Bug ID: 2565
Summary: High baud rate gets sent, solaris closes pty
Product: Portable OpenSSH
Version: 7.1p2
Hardware: Sparc
OS: Solaris
Status: NEW
Severity: minor
Priority: P5
Component: sshd
Assignee: unassigned-bugs at mindrot.org
Reporter: thogard at abnormal.com
It seems that trying to set the tty baud rate in Solaris to reasonable
modern values is broken resulting in the pseudo tty output being
closed. This effects Solaris 11.3 back to at least Sol 9.
I can replicate it from OS X:
stty ospeed 57600
ssh solaris
Password: ....
(no output)
I can force it 9600 to fix the problem with code that verifies the
problem and fixes it:
case TTY_OP_ISPEED_PROTO2 (and TTY_OP_OSPEED_PROTO2)
...
baud = packet_get_int();
+debug("ibaud=%d",baud);
+baud=9600;
On some clients (such as QNX) stty can be used as a work around. "stty
ispeed 9600;stty ospeed 9600;ssh broken-host" will work.
Is it appropriate to try to fix this in sshd? If so, then should the
proper way to fix this be something along the line of if ( baud >
MAX_BAUD) baud=MAX_BAUD;?
I can't see a way of setting MAX_BAUD other than an config option hack
with a list for broken systems. Even very old Solaris has B57600 in
termios.h so that can't be used as an indicator.
--
You are receiving this mail because:
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Apr-18 10:59 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |djm at mindrot.org
--- Comment #1 from Damien Miller <djm at mindrot.org> ---
does setting ospeed actually do anything on a PTY? AFAIK it doesn't on
OS X, BSD or Linux.
If the terminal is being closed as a result of a failed cfsetispeed()
call then that indicates a kernel or (possibly) libc problem - ssh
doesn't exit when that fails, it just logs an error and continues.
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-20 03:48 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Darren Tucker <dtucker at zip.com.au> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dtucker at zip.com.au
--- Comment #2 from Darren Tucker <dtucker at zip.com.au> ---
(In reply to Damien Miller from comment #1)> does setting ospeed actually do anything on a PTY? AFAIK it doesn't
> on OS X, BSD or Linux.
>
> If the terminal is being closed as a result of a failed
> cfsetispeed() call then that indicates a kernel or (possibly) libc
> problem - ssh doesn't exit when that fails, it just logs an error
> and continues.
I can reproduce this with an OpenBSD client and Solaris 11 server. The
cfsetispeed succeeds but it looks like the data from the pty slave
never makes it back to the master.
truss'ing (with -v pollsys) sshd then hitting enter ends with
(lwp_sigmask calls elided):
21585: write(2, " [ d t u c k e r @ s o l".., 19) = 19
21583: pollsys(0x08047890, 2, 0x00000000, 0x00000000) (sleeping...)
21583: fd=4 ev=POLLRDNORM rev=0
21583: fd=6 ev=POLLRDNORM rev=0x814
21583: pollsys(0x08047000, 2, 0x00000000, 0x00000000) (sleeping...)
21585: read(0, 0x080473AC, 1) (sleeping...)
The pids are:
dtucker 21585 21583 0 11:40:29 pts/2 0:00 -bash
root 21580 21579 0 11:40:27 pts/1 0:00 sshd -d -p 2022 -R
dtucker 21583 21580 0 11:40:29 pts/1 0:00 sshd -d -p 2022 -R
so we've got the shell writing the prompt to stderr then waiting for
input on stdin, which are the pty. Looks reasonable.
The sshd post-auth privsep slave is blocked in poll (which is how
select is implemented on Solaris).
sshd's descriptors are:
$ sudo lsof -p 21583
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE
NAME
[...]
sshd 21583 root 0u VCHR 221,1 443220255
/dev/pts/1
sshd 21583 root 1u VCHR 221,1 443220255
/dev/pts/1
sshd 21583 root 2u VCHR 221,1 443220255
/dev/pts/1
sshd 21583 root 3r DOOR 0t0 46
/system/volatile/name_service_door (door to nscd[426])
(FA:->0xffffff00cf178700)
sshd 21583 root 4u FIFO 0xffffff00dadd16c0 0t0 184434
(fifofs) PIPE->0xffffff00dadd1750
sshd 21583 root 5u FIFO 0xffffff00dadd1750 0t0 184434
(fifofs) PIPE->0xffffff00dadd16c0
sshd 21583 root 6u IPv4 0xffffff00ce59a100 0t5578 TCP
sol11.dtucker.net:2022->client.dtucker.net:31563 (ESTABLISHED)
sshd 21583 root 7u VCHR 238,2 81265120
/devices/pseudo/clone at 0:ptm->ptm
sshd 21583 root 10u VCHR 238,2 81265120
/devices/pseudo/clone at 0:ptm->ptm
So, sshd is waiting to read from the TCP connection or a pipe, but not
the pty master. That seems odd.
For the record, sshd is blocking is in wait_until_can_do_something():
$ sudo gdb -q --args `pwd`/sshd -de -p 2022 -o
useprivilegeseparation=no -r
(gdb) set follow-fork child
(gdb) run
[...]
debug1: Setting controlling tty using TIOCSCTTY.
^C
Program received signal SIGINT, Interrupt.
0xfec78ec5 in __pollsys () from /lib/libc.so.1
(gdb) bt
#0 0xfec78ec5 in __pollsys () from /lib/libc.so.1
#1 0xfec664c6 in _pollsys () from /lib/libc.so.1
#2 0xfec233b8 in pselect () from /lib/libc.so.1
#3 0xfec236b9 in select () from /lib/libc.so.1
#4 0x08074bbb in wait_until_can_do_something (readsetp=0x80479f8,
writesetp=0x80479f4, maxfdp=0x80479f0, nallocp=0x80479ec,
max_time_ms=0)
at ../../serverloop.c:370
#5 0x08075ce7 in server_loop2 (authctxt=0x814be78) at
../../serverloop.c:863
#6 0x08081a3b in do_authenticated2 (authctxt=0x814be78)
at ../../session.c:2758
#7 0x0807be6c in do_authenticated (authctxt=0x814be78) at
../../session.c:271
#8 0x0806be26 in main (ac=7, av=0x81451a8) at ../../sshd.c:2324
--
You are receiving this mail because:
You are watching the assignee of the bug.
You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-20 03:56 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565 --- Comment #3 from Darren Tucker <dtucker at zip.com.au> --- Comparing the truss from a client with ospeed=9600: 5172: pollsys(0x08047890, 3, 0x00000000, 0x00000000) (sleeping...) 5172: fd=4 ev=POLLRDNORM rev=0 5172: fd=6 ev=POLLRDNORM rev=0x814 5172: fd=9 ev=POLLRDNORM rev=0x804 5174: read(0, 0x080473AC, 1) (sleeping...) $ sudo lsof -p 5172 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME [...] sshd 5172 root 4u FIFO 0xffffff00dadd16c0 0t0 184472 (fifofs) PIPE->0xffffff00dadd1750 sshd 5172 root 5u FIFO 0xffffff00dadd1750 0t0 184472 (fifofs) PIPE->0xffffff00dadd16c0 sshd 5172 root 6u IPv4 0xffffff00de346380 0t6014 TCP sol11.dtucker.net:2022->quoll.dtucker.net:30321 (ESTABLISHED) sshd 5172 root 7u VCHR 238,2 81265120 /devices/pseudo/clone at 0:ptm->ptm sshd 5172 root 9u VCHR 238,2 81265120 /devices/pseudo/clone at 0:ptm->ptm sshd 5172 root 10u VCHR 238,2 81265120 /devices/pseudo/clone at 0:ptm->ptm so sshd is polling an extra descriptor attached to the pty master. I don't know why yet though. -- You are receiving this mail because: You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Jul-20 04:41 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
--- Comment #4 from Darren Tucker <dtucker at zip.com.au> ---
Confirmed that commenting out only the calls to cfsetospeed and
cfsetispeed in ttymodes.c prevents the problem.
Going back through a complete strace of sshd I see this:
6212: pollsys(0x08047890, 3, 0x00000000, 0x00000000) = 2
6212: fd=4 ev=POLLRDNORM rev=0
6212: fd=6 ev=POLLOUT|POLLRDNORM rev=POLLOUT
6212: fd=9 ev=POLLRDNORM rev=POLLRDNORM
6212: clock_gettime(4, 0x08047954) = 0
6212: read(9, 0x0804391C, 16384) = 0
6212: close(9) = 0
6212: write(6, "\0\0\010 BA0 q zF8 {88F9".., 84) = 84
6212: getpid() = 6212 [6209]
6212: clock_gettime(4, 0x080478B4) = 0
6212: pollsys(0x08047890, 2, 0x00000000, 0x00000000) = 1
6212: fd=4 ev=POLLRDNORM rev=0
6212: fd=6 ev=POLLOUT|POLLRDNORM rev=POLLOUT
where fd#9 is our missing FD on the pty master, and this is the first
pollsys where fd#9 is included.
>From this we can infer from this that sshd does in fact set up the
descriptors correctly. pollsys says that fd#9 is readable, and when
sshd reads it the read returns zero, indicating end-of-file. sshd then
closes the descriptor and removes it from the set it's looking at,
hence why we didn't see it in the later calls.
If this sounds familiar it's because we've encountered the same thing
on AIX (but probably for different reasons, in that case it was passing
through zero-byte writes from the pty slave). There's already a
workaround hack for it (PTY_ZEROREAD) which seems to help in this case,
but I'd like to understand what the problem is (and what other
potential side effects are) before we go enabling that on Solaris.
Anyone from Oracle care to comment? Why does setting the baud rate on
a pty to 57600 cause it to change its read behaviour?
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Aug-02 14:46 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Tomas Kuthan <tomas.kuthan at oracle.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tomas.kuthan at oracle.com
Assignee|unassigned-bugs at mindrot.org |tomas.kuthan at oracle.com
--- Comment #5 from Tomas Kuthan <tomas.kuthan at oracle.com> ---
Thanks for reporting this issue.
It turns out, it is a long-standing Solaris kernel bug. It only
pertains to the one particular speed (57600 baud), other rates (such as
115200, 230400 and 460800) work fine, provided that the HW device
supports it.
I have a verified fix ready for this. We will fix it eventually in
Solaris.
I am taking ownership of the bug for now, but we could just as well
close it, as this is not a bug in OpenSSH.
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Aug-02 23:30 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565 --- Comment #6 from Darren Tucker <dtucker at zip.com.au> --- (In reply to Tomas Kuthan from comment #5)> Thanks for reporting this issue. > > It turns out, it is a long-standing Solaris kernel bug. It only > pertains to the one particular speed (57600 baud), other rates (such > as 115200, 230400 and 460800) work fine, provided that the HW device > supports it.That sounds fascinatingly specific. Are you able to share the details?> I have a verified fix ready for this. We will fix it eventually in > Solaris.Thanks!> I am taking ownership of the bug for now, but we could just as well > close it, as this is not a bug in OpenSSH.Given that there's already a workaround (ie "don't do that then") and a proper fix coming I think we should close the bug. -- You are receiving this mail because: You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2016-Aug-03 08:56 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Tomas Kuthan <tomas.kuthan at oracle.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |MOVED
--- Comment #7 from Tomas Kuthan <tomas.kuthan at oracle.com> ---
On Solaris, 16 original baud rates B0 through B38400 were represented
in the four least significant bits of termios.c_cflag. When additional
baud rates were added back in 1994, the adjacent bits were already
assigned for a different purpose, so bits 22 and 23 were used to flag
extended baudrates. Baud rates B57600 through B460800 were again
represented in the least significant bits with the extension bit set.
All the speed handling code was updated to take the extension bits into
account.
... all the code, but this one if, that was left behind. B57600 is
encoded with the extension bit set and with the 4 least significant
bits all zeros. The neglected if has mistaken B57600 for B0. B0 means
'hang up'; hung up it did.
I am closing the bug as not-an-OpenSSH-bug. (Feel free to change the
substatus.)
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
bugzilla-daemon at mindrot.org
2021-Apr-23 05:00 UTC
[Bug 2565] High baud rate gets sent, solaris closes pty
https://bugzilla.mindrot.org/show_bug.cgi?id=2565
Damien Miller <djm at mindrot.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED
--- Comment #8 from Damien Miller <djm at mindrot.org> ---
closing resolved bugs as of 8.6p1 release
--
You are receiving this mail because:
You are watching someone on the CC list of the bug.
Apparently Analagous Threads
- Two APC900 UPS on the same usbbus1
- [Bug 2636] New: Fix X11 forwarding, when ::1 is not configured
- [Bug 2376] New: Add compile time option to disable Curve25519
- [Bug 2299] New: Disable uid=0 resetting test on Solaris
- [Bug 2719] New: Notify user, when ssh transport process dies.