Hi,
Summary of my problem:
Remote X forwarding apparently fails at random for certain display
numbers.
At the end of this mail you will find a recipe for how to reproduce
this behaviour easily.
I use SuSE 10.2 with the following OpenSSH version:
OpenSSH_4.4p1, OpenSSL 0.9.8d 28 Sep 2006
Clients (Linux and Windows (Cygwin)) connect to the server with
X-Forwarding enabled ("-X" or "-Y").
The ssh server allocates local ports from 6010 upward for the X
connections of these clients (default setup).
This setup has worked reliably for years, BUT sometimes (every few
weeks) I receive "can't connect" errors after successfully opening an
ssh connection and trying to run a remote X program (e.g. xev).
For example: after connecting to the server (ssh -X ...), the DISPLAY
environment setting is "localhost:18". See the following output:
<snip>
jackdaw:~ # netstat -lpn| grep 60
tcp 0 0 127.0.0.1:6016 0.0.0.0:* LISTEN 24607/sshd: jens at no
tcp 0 0 127.0.0.1:6017 0.0.0.0:* LISTEN 25900/sshd: michael
tcp 0 0 127.0.0.1:6019 0.0.0.0:* LISTEN 18030/sshd: lars at no
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 519/sshd: steffen at n
tcp 0 0 127.0.0.1:6011 0.0.0.0:* LISTEN 12190/sshd: ansgar@
tcp 0 0 127.0.0.1:6012 0.0.0.0:* LISTEN 25795/sshd: norbert
tcp 0 0 127.0.0.1:6013 0.0.0.0:* LISTEN 13587/sshd: henning
tcp 0 0 127.0.0.1:6014 0.0.0.0:* LISTEN 14594/sshd: diana at n
tcp 0 0 127.0.0.1:6015 0.0.0.0:* LISTEN 15447/sshd: axel at no
tcp 0 0 ::1:6016 :::* LISTEN 24607/sshd: jens at no
tcp 0 0 ::1:6017 :::* LISTEN 25900/sshd: michael
tcp 0 0 ::1:6018 :::* LISTEN 26589/sshd: lars at no
tcp 0 0 ::1:6019 :::* LISTEN 18030/sshd: lars at no
tcp 0 0 ::1:6010 :::* LISTEN 519/sshd: steffen at n
tcp 0 0 ::1:6011 :::* LISTEN 12190/sshd: ansgar@
tcp 0 0 ::1:6012 :::* LISTEN 25795/sshd: norbert
tcp 0 0 ::1:6013 :::* LISTEN 13587/sshd: henning
tcp 0 0 ::1:6014 :::* LISTEN 14594/sshd: diana at n
tcp 0 0 ::1:6015 :::* LISTEN 15447/sshd: axel at no
</snip>
For some reason, port 6018 on 127.0.0.1 is not held by sshd, although it
should be (compare the "::1:6018" entry in the listing).
Further investigation led to the following:
<snip>
jackdaw:~ # netstat -pn | grep ":6016"
tcp 0 0 127.0.0.1:6016 127.0.0.1:6039 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6038 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6037 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6047 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:24990 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6045 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6044 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6040 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6023 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6022 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6018 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6017 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6017 127.0.0.1:6016 ESTABLISHED 14353/kdeinit Runni
tcp 0 0 127.0.0.1:6018 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:6022 127.0.0.1:6016 ESTABLISHED 14367/ksmserver [kd
tcp 0 0 127.0.0.1:6023 127.0.0.1:6016 ESTABLISHED 14368/kwin [kdeinit
tcp 0 0 127.0.0.1:6024 127.0.0.1:6016 ESTABLISHED 14370/kdesktop [kde
tcp 0 0 127.0.0.1:6025 127.0.0.1:6016 ESTABLISHED 14372/kicker [kdein
tcp 0 0 127.0.0.1:6037 127.0.0.1:6016 ESTABLISHED 14380/amarokapp
tcp 0 0 127.0.0.1:6038 127.0.0.1:6016 ESTABLISHED 14382/kerry [kdeini
tcp 0 0 127.0.0.1:6039 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:6040 127.0.0.1:6016 ESTABLISHED 14358/klauncher [kd
tcp 0 0 127.0.0.1:6044 127.0.0.1:6016 ESTABLISHED 14392/knotify [kdei
tcp 0 0 127.0.0.1:6045 127.0.0.1:6016 ESTABLISHED 14396/konqueror [kd
tcp 0 0 127.0.0.1:6047 127.0.0.1:6016 ESTABLISHED 14407/klipper [kdei
tcp 0 0 127.0.0.1:6058 127.0.0.1:6016 ESTABLISHED 14436/beagled
tcp 0 0 127.0.0.1:6068 127.0.0.1:6016 ESTABLISHED 14450/firefox-bin
tcp 0 0 127.0.0.1:6016 127.0.0.1:6025 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6024 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6068 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6058 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:17119 127.0.0.1:6016 ESTABLISHED 14487/beagled-helpe
tcp 0 0 127.0.0.1:17103 127.0.0.1:6016 ESTABLISHED 14450/firefox-bin
tcp 0 0 127.0.0.1:6016 127.0.0.1:17119 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:17103 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:24990 127.0.0.1:6016 ESTABLISHED 16717/sunbird-bin
</snip>
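The same check can be done mechanically. Below is a minimal Python sketch
that scans "netstat -pn" output for established connections whose local
(source) port falls into the X forwarding range and that do not belong to
sshd. The function name find_port_squatters and the default range
6010-6100 are assumptions made for illustration, and the parser assumes
that each netstat record sits on a single line.

```python
def find_port_squatters(netstat_text, low=6010, high=6100):
    """Return sorted (port, owner) pairs for non-sshd sockets whose
    local port lies in [low, high].

    Hypothetical helper: expects one "netstat -pn" record per line.
    """
    hits = []
    for line in netstat_text.splitlines():
        # Columns: proto, recv-q, send-q, local, foreign, state, pid/program
        fields = line.split(None, 6)
        if len(fields) < 7 or fields[5] != "ESTABLISHED":
            continue
        local, owner = fields[3], fields[6].strip()
        port = int(local.rsplit(":", 1)[1])
        program = owner.split("/", 1)[-1]
        if low <= port <= high and not program.startswith("sshd"):
            hits.append((port, owner))
    return sorted(hits)


# A few records taken from the listing in this mail.
sample = """\
tcp 0 0 127.0.0.1:6016 127.0.0.1:6018 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6017 127.0.0.1:6016 ESTABLISHED 14353/kdeinit Runni
tcp 0 0 127.0.0.1:6018 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:17103 127.0.0.1:6016 ESTABLISHED 14450/firefox-bin
"""

for port, owner in find_port_squatters(sample):
    print(port, owner)
```

For the sample records this reports ports 6017 and 6018 as held by
kdeinit and kded, which matches the display numbers that stopped working.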
It seems that the user (in this example: "jens") was connected with
DISPLAY "localhost:16" via "ssh -X ...". Every X program within that
session opened its own connection to port 6016, each from a separate
local source port.
Now comes the problem:
The source ports that were used lie within the range of the ports that
sshd uses for new X forwarding connections as well. This leads to
problems for users trying to connect to the server later.
After the user on port 6016 disconnected and reconnected, the problem
was gone: his programs used a different (random?) port range for their
connections, and new sessions could be created without problems again.
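The collision described above can be demonstrated in isolation. The
following Python sketch (the function name is hypothetical, and it uses
kernel-chosen ports instead of 6016/6018 so that it does not disturb a
running sshd) opens a local connection and then tries to bind a new
socket to the port that the client side is using as its source port.

```python
import errno
import socket


def source_port_blocks_listener(host="127.0.0.1"):
    """Return True if a port in use as the source port of an outgoing
    connection cannot be bound by a new socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = None
    try:
        # Stand-in for the sshd X forwarding listener (e.g. port 6016).
        server.bind((host, 0))
        server.listen(1)
        # Stand-in for an X client such as kded connecting to it.
        client.connect(server.getsockname())
        conn, _ = server.accept()
        used_port = client.getsockname()[1]
        # Stand-in for a later sshd trying to listen on that same port.
        try:
            probe.bind((host, used_port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        for sock in (probe, conn, client, server):
            if sock is not None:
                sock.close()


print(source_port_blocks_listener())
```

On a Linux host this is expected to report that the bind is refused with
"address already in use", which is the situation a new sshd session runs
into when an X client already occupies 127.0.0.1:6018 as a source port.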
Maybe the real root of the problem is that the ssh server does not
check whether a port is already in use when it creates the DISPLAY
setting for a new connection.
In this case, it should have noticed that ports 6017 and 6018 were
already in use and should have announced a "localhost:19" DISPLAY
setting to the next new X forwarding session (skipping the unusable
"localhost:17" and "localhost:18").
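To make the proposed behaviour concrete, here is a minimal Python sketch
of a display allocation loop that skips any display number whose port
cannot be bound on every requested address. This only illustrates the
suggestion; it is not taken from the sshd source. The names
allocate_display and X11_BASE_PORT are assumptions, and the sketch
defaults to IPv4 only, although the listing shows sshd listening on ::1
as well.

```python
import socket

X11_BASE_PORT = 6000  # display N is served on TCP port 6000 + N


def allocate_display(start=10, tries=1000,
                     addresses=((socket.AF_INET, "127.0.0.1"),)):
    """Return (display, listeners) for the first display number whose
    port could be bound on ALL given addresses.

    Hypothetical helper sketching the behaviour proposed in this mail.
    """
    for display in range(start, start + tries):
        port = X11_BASE_PORT + display
        if port > 65535:
            break
        listeners = []
        try:
            for family, host in addresses:
                sock = socket.socket(family, socket.SOCK_STREAM)
                listeners.append(sock)
                sock.bind((host, port))
                sock.listen(5)
        except OSError:
            # Port busy on at least one address: release everything
            # bound so far and move on to the next display number.
            for sock in listeners:
                sock.close()
            continue
        return display, listeners
    raise RuntimeError("no free display number found")


if __name__ == "__main__":
    display, listeners = allocate_display()
    print("DISPLAY=localhost:%d" % display)
    for sock in listeners:
        sock.close()
```

The important design point is that a failure on any one address makes the
whole display number unusable, so a session is never announced with a
DISPLAY whose IPv4 port is held by another process.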
How to reproduce the problem (using netcat):
1) set up an ssh server with X forwarding enabled
2) check open X forwarded sessions with "netstat -lpn | grep ':60'"
3) run "netcat -l -p 6010" (use the lowest free port number greater than
or equal to 6010); this blocks that specific port
4) connect to the server and run an X program, e.g.: "ssh -X $HOST xeyes"
Result: sshd cannot use the (blocked) port, so the client cannot run X
programs.
Is there a workaround to tell the server that it must not use ports
within a specific range (in this case maybe 6000-6100) for local X
connections?
Maybe it would be good to use dynamic port numbers for new processes
only if they are far away from the port range that the ssh daemon needs
for new connections?
Or are there any other solutions?
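One thing that may be worth checking, assuming a Linux server: the range
from which the kernel picks source ports for outgoing connections is
exposed in /proc/sys/net/ipv4/ip_local_port_range. The Python sketch
below reads that file and reports whether the range overlaps the ports
sshd needs. The helper names and the 6010-6100 range are assumptions for
illustration, and the sketch only diagnoses an overlap, it does not
change any setting.

```python
X11_FORWARD_RANGE = (6010, 6100)  # assumption: range discussed in this mail


def parse_port_range(text):
    """Parse the two whitespace-separated integers of the proc file."""
    low, high = (int(part) for part in text.split())
    return low, high


def ranges_overlap(a, b):
    """Return True if the inclusive ranges a and b share any port."""
    return a[0] <= b[1] and b[0] <= a[1]


def read_ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the kernel's local (source) port range on Linux."""
    with open(path) as handle:
        return parse_port_range(handle.read())


if __name__ == "__main__":
    try:
        ephemeral = read_ephemeral_range()
    except FileNotFoundError:
        print("ip_local_port_range not available on this system")
    else:
        print("ephemeral range:", ephemeral)
        print("overlaps X forwarding ports:",
              ranges_overlap(ephemeral, X11_FORWARD_RANGE))
```

If the two ranges overlap, local clients can end up with source ports
that sshd later wants to listen on, as in the listing above.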
Thanks for your hard work,
Lars