Hi,
Summary of my problem:
Remote X forwarding apparently fails at random for different display
numbers.
At the end of this mail you will find a recipe for how to reproduce
this behaviour easily.
I use SuSE 10.2 with the following openssh version:
OpenSSH_4.4p1, OpenSSL 0.9.8d 28 Sep 2006
Clients (Linux and Windows (Cygwin)) connect to the server with
X-Forwarding enabled ("-X" or "-Y").
The ssh server allocates local ports from 6010 upwards for the X
connections of these clients (default setup).
This setup has worked very reliably for years, BUT every few weeks I
receive "can't connect" errors after successfully opening an ssh
connection and trying to run a remote X program (e.g. xev).
For example: after connecting to the server (ssh -X ...), the DISPLAY
environment setting is "localhost:18". See the following output:
<snip>
jackdaw:~ # netstat -lpn| grep 60
tcp 0 0 127.0.0.1:6016 0.0.0.0:* LISTEN 24607/sshd: jens at no
tcp 0 0 127.0.0.1:6017 0.0.0.0:* LISTEN 25900/sshd: michael
tcp 0 0 127.0.0.1:6019 0.0.0.0:* LISTEN 18030/sshd: lars at no
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 519/sshd: steffen at n
tcp 0 0 127.0.0.1:6011 0.0.0.0:* LISTEN 12190/sshd: ansgar@
tcp 0 0 127.0.0.1:6012 0.0.0.0:* LISTEN 25795/sshd: norbert
tcp 0 0 127.0.0.1:6013 0.0.0.0:* LISTEN 13587/sshd: henning
tcp 0 0 127.0.0.1:6014 0.0.0.0:* LISTEN 14594/sshd: diana at n
tcp 0 0 127.0.0.1:6015 0.0.0.0:* LISTEN 15447/sshd: axel at no
tcp 0 0 ::1:6016 :::* LISTEN 24607/sshd: jens at no
tcp 0 0 ::1:6017 :::* LISTEN 25900/sshd: michael
tcp 0 0 ::1:6018 :::* LISTEN 26589/sshd: lars at no
tcp 0 0 ::1:6019 :::* LISTEN 18030/sshd: lars at no
tcp 0 0 ::1:6010 :::* LISTEN 519/sshd: steffen at n
tcp 0 0 ::1:6011 :::* LISTEN 12190/sshd: ansgar@
tcp 0 0 ::1:6012 :::* LISTEN 25795/sshd: norbert
tcp 0 0 ::1:6013 :::* LISTEN 13587/sshd: henning
tcp 0 0 ::1:6014 :::* LISTEN 14594/sshd: diana at n
tcp 0 0 ::1:6015 :::* LISTEN 15447/sshd: axel at no
</snip>
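For some reason, port 6018 on 127.0.0.1 is not held by sshd, although it
should be: "::1:6018" is listening in the output above, and the DISPLAY
"localhost:18" corresponds to TCP port 6018 (6000 + display number).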
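Such a mismatch between the two loopback addresses can be spotted
mechanically. The following is a minimal Python sketch, assuming the
"netstat -lpn" output has been joined to one line per socket. The sample
lines are copied from the listing above; the function name and the
6010-6100 range are made up for illustration.

```python
from collections import defaultdict

# A few lines from the listing above, one socket per line.
SAMPLE = """\
tcp 0 0 127.0.0.1:6017 0.0.0.0:* LISTEN 25900/sshd: michael
tcp 0 0 127.0.0.1:6019 0.0.0.0:* LISTEN 18030/sshd: lars at no
tcp 0 0 ::1:6017 :::* LISTEN 25900/sshd: michael
tcp 0 0 ::1:6018 :::* LISTEN 26589/sshd: lars at no
tcp 0 0 ::1:6019 :::* LISTEN 18030/sshd: lars at no
"""


def half_bound_ports(netstat_text, lo=6010, hi=6100):
    """Return {port: families} for ports listening on only one address family.

    Hypothetical helper: it only looks at the fourth column (local address)
    of lines whose state column is LISTEN.
    """
    families = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        # Split at the last colon so that "::1:6018" yields host "::1".
        host, _, port = fields[3].rpartition(":")
        if not port.isdigit() or not lo <= int(port) <= hi:
            continue
        families[int(port)].add("ipv6" if ":" in host else "ipv4")
    return {p: f for p, f in sorted(families.items()) if len(f) == 1}


print(half_bound_ports(SAMPLE))  # {6018: {'ipv6'}}
```

For the sample, only port 6018 is reported, which matches the DISPLAY
"localhost:18" that could not be used.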
Further investigation led to the following:
<snip>
jackdaw:~ # netstat -pn | grep ":6016"
tcp 0 0 127.0.0.1:6016 127.0.0.1:6039 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6038 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6037 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6047 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:24990 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6045 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6044 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6040 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6023 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6022 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6018 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6017 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6017 127.0.0.1:6016 ESTABLISHED 14353/kdeinit Runni
tcp 0 0 127.0.0.1:6018 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:6022 127.0.0.1:6016 ESTABLISHED 14367/ksmserver [kd
tcp 0 0 127.0.0.1:6023 127.0.0.1:6016 ESTABLISHED 14368/kwin [kdeinit
tcp 0 0 127.0.0.1:6024 127.0.0.1:6016 ESTABLISHED 14370/kdesktop [kde
tcp 0 0 127.0.0.1:6025 127.0.0.1:6016 ESTABLISHED 14372/kicker [kdein
tcp 0 0 127.0.0.1:6037 127.0.0.1:6016 ESTABLISHED 14380/amarokapp
tcp 0 0 127.0.0.1:6038 127.0.0.1:6016 ESTABLISHED 14382/kerry [kdeini
tcp 0 0 127.0.0.1:6039 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:6040 127.0.0.1:6016 ESTABLISHED 14358/klauncher [kd
tcp 0 0 127.0.0.1:6044 127.0.0.1:6016 ESTABLISHED 14392/knotify [kdei
tcp 0 0 127.0.0.1:6045 127.0.0.1:6016 ESTABLISHED 14396/konqueror [kd
tcp 0 0 127.0.0.1:6047 127.0.0.1:6016 ESTABLISHED 14407/klipper [kdei
tcp 0 0 127.0.0.1:6058 127.0.0.1:6016 ESTABLISHED 14436/beagled
tcp 0 0 127.0.0.1:6068 127.0.0.1:6016 ESTABLISHED 14450/firefox-bin
tcp 0 0 127.0.0.1:6016 127.0.0.1:6025 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6024 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6068 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:6058 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:17119 127.0.0.1:6016 ESTABLISHED 14487/beagled-helpe
tcp 0 0 127.0.0.1:17103 127.0.0.1:6016 ESTABLISHED 14450/firefox-bin
tcp 0 0 127.0.0.1:6016 127.0.0.1:17119 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6016 127.0.0.1:17103 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:24990 127.0.0.1:6016 ESTABLISHED 16717/sunbird-bin
</snip>
It seems that the user (in this example: "jens") was connected to
localhost:16 via "ssh -X ...". Every X program within his session opened
its own connection to port 6016, each from its own local source port.
Now comes the problem:
The source ports used by these programs fall within the range that is
also used for new X forwarding connections. This causes problems for
users who try to connect to the server later.
After the user on port 6016 disconnected and reconnected, the problem
was gone: his programs used a different (random?) port range for their
connections, and new sessions could be created without problems.
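The overlap described above can be listed automatically. This is a small
Python sketch, assuming "netstat -pn" output with one connection per
line. The sample lines are taken from the output above; the function
name and the 6010-6100 range are assumptions made for illustration.

```python
# Four connections from the output above, one per line.
SAMPLE = """\
tcp 0 0 127.0.0.1:6016 127.0.0.1:6018 ESTABLISHED 14279/sshd: jens at no
tcp 0 0 127.0.0.1:6017 127.0.0.1:6016 ESTABLISHED 14353/kdeinit Runni
tcp 0 0 127.0.0.1:6018 127.0.0.1:6016 ESTABLISHED 14360/kded [kdeinit
tcp 0 0 127.0.0.1:24990 127.0.0.1:6016 ESTABLISHED 16717/sunbird-bin
"""


def clients_in_x11_range(netstat_text, lo=6010, hi=6100):
    """List (source_port, program) of non-sshd connections using [lo, hi].

    Hypothetical helper: the sshd side of each connection is skipped, so
    only the X clients whose source port sits in the range are reported.
    """
    hits = []
    for line in netstat_text.splitlines():
        # Keep the "PID/Program name" column (which may contain spaces) whole.
        fields = line.split(None, 6)
        if len(fields) < 7 or fields[5] != "ESTABLISHED":
            continue
        port = fields[3].rpartition(":")[2]
        program = fields[6].partition("/")[2]
        if not port.isdigit() or program.startswith("sshd"):
            continue
        if lo <= int(port) <= hi:
            hits.append((int(port), program))
    return sorted(hits)


for port, program in clients_in_x11_range(SAMPLE):
    print(port, program)
```

For the sample this reports ports 6017 and 6018, held by kdeinit and kded
as source ports of their connections to localhost:16.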
Maybe the real root of the problem is that the ssh server does not
check whether a port is already in use when it creates the DISPLAY
setting for a new connection.
In this case, it should have noticed that ports 6017 and 6018 were
already in use and should have announced a "localhost:19" DISPLAY
setting to the next new X forwarding session (skipping the unusable
"localhost:17" and "localhost:18").
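The behaviour suggested here could look roughly like the following
Python sketch. It is not the sshd code (which is written in C), only an
illustration of "skip a display unless its port is free on every
loopback address". The function names, the default offset of 10, the
base port 6000 and the limit of 1000 displays are assumptions chosen to
match the port numbers seen above.

```python
import socket

LOOPBACKS = (
    (socket.AF_INET, "127.0.0.1"),
    (socket.AF_INET6, "::1"),
)


def can_bind(family, host, port):
    """Return True if a TCP socket can currently be bound to host:port."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as probe:
            probe.bind((host, port))
        return True
    except OSError:
        # Port busy, or the address family is not available on this host.
        return False


def pick_display(first=10, count=1000, base_port=6000, loopbacks=LOOPBACKS):
    """Return the first display whose port is free on ALL given addresses.

    The probe sockets are closed again, so this is only a check. A real
    daemon would keep the sockets open to avoid a race with other processes.
    """
    for display in range(first, first + count):
        port = base_port + display
        if all(can_bind(family, host, port) for family, host in loopbacks):
            return display
    return None


if __name__ == "__main__":
    print("first usable display:", pick_display())
```

With ports 6017 and 6018 busy on 127.0.0.1 and 6010-6016 held by other
sessions, such a check would skip to display 19, as described above.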
How to reproduce the problem (using netcat):
1) set up an ssh server with X forwarding enabled
2) check open X forwarded sessions with "netstat -lpn | grep ':60'"
3) run "netcat -l -p 6010" (use the lowest free port number greater than
or equal to 6010) - this blocks that port
4) connect to the server and run an X program, e.g.: "ssh -X $HOST xeyes"
Result: sshd cannot use the (blocked) port, so the client cannot run X
programs.
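Step 3 can also be done without looking up the lowest free port by hand.
The following Python sketch is a stand-in for the netcat call, assuming
that blocking the port on 127.0.0.1 only is enough to trigger the
problem. The function name and the upper limit 6100 are hypothetical.

```python
import socket


def hold_lowest_free_port(first=6010, last=6100, host="127.0.0.1"):
    """Bind and listen on the lowest free port in [first, last].

    Returns (socket, port). The port stays blocked for as long as the
    returned socket is kept open.
    """
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            # Port already taken (e.g. by an sshd X forwarding listener).
            sock.close()
    raise RuntimeError("no free port between %d and %d" % (first, last))
```

Calling "sock, port = hold_lowest_free_port()" in an interactive Python
session on the server and leaving that session open during step 4 has
the same effect as the netcat call; "sock.close()" releases the port.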
Is there a workaround that tells the server not to forward local X
connections to ports within a specific range (in this case maybe
6000-6100)?
Maybe it would be good to use only dynamic port numbers for new
processes that are far away from the port range needed by the ssh
daemon for new connections?
Or are there any other solutions?
thanks for your hard work,
Lars
Hi to all of you,
maybe my previous mail
(http://permalink.gmane.org/gmane.network.openssh.devel/13345) was not
clear enough, so I will try to summarize it more concisely:
If I use X-Forwarding, then the ssh-daemon offers DISPLAY settings that
cannot be used. This results in "cannot connect ..." errors.
From my point of view, the ssh-daemon should check whether (for example)
port 6014 is available before it offers the DISPLAY "localhost:14".
This not-checking is especially ugly, as the ssh-daemon itself occupied
the respective port during another X-Forwarding session.
Result: for now there is no way for me to use X-Forwarding safely. The
only thing I can do is to check regularly (by cron) whether there is an
X-Forwarding session that occupies crucial ports (between 6000 and
6100). If this happens, I have to ask the user to log out and start
his session again. Otherwise all the other users would complain that
they cannot connect to the X-Server.
How could I avoid this ugly situation?
Maybe I just do not really get the point?
regards,
Lars
On Mon, Feb 05, 2007 at 12:47:11PM +0100, Lars Kruse wrote:
> Hi to all of you,
>
> maybe my previous mail
> (http://permalink.gmane.org/gmane.network.openssh.devel/13345) was not
> clear enough, so I will try to summarize it more concisely:
I missed the original post but just went and reviewed it.
> If I use X-Forwarding, then the ssh-daemon offers DISPLAY settings,
> that can not be used. Thus resulting in "cannot connect ..." errors.
>
> From my point of view, the ssh-daemon should check, if (for example)
> port 6014 is available before it offers the DISPLAY "localhost:14".
>
> This not-checking is especially ugly, as the ssh-daemon itself occupied
> the respective port during another X-Forwarding session.
It does check that it can bind to the port, though (see
x11_create_display_inet()). I suspect the root of your problem is some
funkiness with IPv6. Note that some of the listening sockets in your
original post are listening on ::1 and some on 127.0.0.1.
Do you have X11UseLocalhost set in sshd_config? If so, what does
"localhost" resolve to?
If you can afford to do so you could try running without the ipv6 stack
loaded.
--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
Hi,
> Do you have X11UseLocalhost set in sshd_config? If so, what does
> "localhost" resolve to?
The setting "X11UseLocalhost" is not defined in our sshd_config, so it
should have the default value "yes".
jackdaw:~ # grep localhost /etc/hosts
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
So "localhost" should resolve to the ipv4 address.
> If you can afford to do so you could try running without the ipv6
> stack loaded.
Good idea!
I tried it ("AddressFamily inet") successfully: now busy ports are
skipped (as expected).
Maybe the ipv4 port should be checked in x11_create_display_inet, too
(if "AddressFamily" is "any")?
As I do not speak "C" fluently, I am unable to suggest something -
sorry ...
Thanks a lot for your suggestion!
regards,
Lars
Hi to all of you,
I would like to summarize the current state of the problem as described
in http://permalink.gmane.org/gmane.network.openssh.devel/13345.
If the openssh server is running in ipv4/ipv6 mode ("AddressFamily
any"), then pseudo-random "unable-to-connect-to-display" errors occur
for clients connecting via ssh for X-forwarded remote sessions.
For now the only workaround is to disable ipv6 support for openssh
daemons used for X-forwarding.
From my point of view, there are two ways to solve the root of this
problem:
1) improved "is this port usable on all interfaces?" detection
ipv4/ipv6 mixed openssh daemons should behave like pure ipv4 daemons:
unusable DISPLAY settings must never be offered to clients
2) avoid randomly allocating critical ports
the openssh daemon must never allocate ports for running X-sessions
that are in the range used for new X-forwarding connections (maybe
6000..6100).
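Regarding option 2: on Linux, the source port of an outgoing connection
is normally picked by the kernel from the range given in
/proc/sys/net/ipv4/ip_local_port_range, so a first step is to see
whether that range overlaps the X forwarding ports at all. A minimal
Python sketch of this check follows. The range 6000..6100 is the guess
from above, and the function names are made up for illustration.

```python
from pathlib import Path

X11_RANGE = (6000, 6100)  # assumed range, taken from the text above


def read_ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the kernel's (low, high) ephemeral port range (Linux only)."""
    low, high = Path(path).read_text().split()
    return int(low), int(high)


def overlaps(a, b):
    """True if the inclusive port ranges a and b share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    try:
        ephemeral = read_ephemeral_range()
    except OSError:
        print("ip_local_port_range is not readable (not a Linux system?)")
    else:
        verdict = "overlaps" if overlaps(ephemeral, X11_RANGE) else "is clear of"
        print("ephemeral range", ephemeral, verdict, X11_RANGE)
```

If the two ranges overlap, X clients can end up with source ports such
as 6017 or 6018, as seen in the netstat output of the original report.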
From my point of view, this issue is highly irritating, as it is very
hard to track down the source of this seemingly random
"unable-to-connect-to-display" problem. If the short-term workaround
described above were not available, our current X-session setup would
have to be replaced by a more reliable but less preferable solution.
So I am very glad that you helped me to find this workaround ...
But how can this issue be solved without losing ipv6 compatibility?
thanks and regards,
Lars