Goldburt, Dan
2006-Sep-07 19:55 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Hello,

I need to make many (>50) ssh connections from Linux to Cygwin at the same time. Using Windows 2000 Server (OpenSSH_4.3p2, OpenSSL 0.9.8b and updated cygwin) and Linux RHEL4 (OpenSSH_3.9p1, OpenSSL 0.9.7a).

It's been difficult to optimize many simultaneous connections. Here were some issues:

1. On Windows XP/Professional, Microsoft intentionally cripples the TCP/IP stack. Official word (http://support.microsoft.com/kb/Q127144) is that the backlog queue limit on a listen socket is 5 (200 when Server), so you can't accept() more than 5 new connections concurrently.

2. Using a master connection that is shared, the sshd_config variable MaxStartups has no effect. This is because we are not opening lots of ssh connections, but are opening multiple sessions within a single connection. The parameter that needs to be changed is MAX_SESSIONS, which is hardcoded in session.c at 10. Request: add "MAX_SESSIONS" as a configuration parameter in sshd_config.

(Also, you should mention in the INSTALL documentation that by default, compiled binaries are quite a bit larger than usual. Do you use strip --strip-debug?)

Finally, I'm able to make many connections most of the time. But then, sshd errors:

fcntl(223, F_GETFL, 0): Bad file descriptor

and sometimes:

[sig] bash 3720 _cygtls::handle_threadlist_exception

and then loops spitting out "select: Bad file descriptor" and taking up 100% CPU. I have not done a stack trace or increased sshd debug output because the error comes up when about 100 connections are made, so it would be difficult to track down. If this isn't enough information to go on, I will post it.

Is this because on Cygwin, the "fd_set" arrays, used with select(), can contain file descriptors (FD) from 0 to 63 (the fd_set array is 8 bytes long), while on Linux the range is 0 to 1023? From:
http://www.ipflow.utc.fr/blog/?p=34
and
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=105321853321894&w=3

I also found a post http://www.cygwin.com/ml/cygwin/2005-06/msg00511.html where Corinna said "Using a master/slave connection requires the ability to exchange file descriptors over AF_UNIX sockets. That's not possible in Cygwin." I assume this has been addressed with USE_PIPES, since I AM able to use multiplexed connections most of the time?

Lastly, if security is not the biggest concern, should I even use ssh? I just need to be able to execute many remote shell commands in a short interval and return the output.

Thanks,
Dan
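The fd_set question in this post comes down to a compile-time constant, FD_SETSIZE, which can be inspected directly on each platform. Here is a minimal sketch, assuming a POSIX <sys/select.h>; the helper names fd_fits_in_fd_set, checked_fd_set and print_select_limits are made up for illustration and are not part of OpenSSH or Cygwin.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>

/* Returns 1 if fd is a legal index into an fd_set on this platform. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < (int)FD_SETSIZE;
}

/* FD_SET with a range check (hypothetical helper).
 * Returns 0 on success, -1 if fd cannot be represented in the set. */
int checked_fd_set(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}

/* Print the compile-time limits so two platforms can be compared. */
void print_select_limits(void)
{
    printf("FD_SETSIZE = %d, sizeof(fd_set) = %zu bytes\n",
           (int)FD_SETSIZE, sizeof(fd_set));
}
```

Calling print_select_limits() from a small test program built on both the Cygwin and the Linux host would show whether the 64 versus 1024 figures quoted above apply to the installed headers.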
Goldburt, Dan
2006-Sep-08 15:27 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Here is the sshd log: debug1: server_input_channel_open: ctype session rchan 1 win 131072 max 32768 debug1: input_session_request debug1: channel 1: new [server-session] debug1: session_new: session 1 debug1: session_open: channel 1 debug1: session_open: session 1: link with channel 1 debug1: server_input_channel_open: confirm session debug1: server_input_channel_req: channel 1 request pty-req reply 0 debug1: session_by_channel: session 1 channel 1 debug1: session_input_channel_req: session 1 req pty-req debug1: Allocating pty. debug1: session_pty_req: session 1 alloc /dev/tty6 debug3: tty_parse_modes: SSH2 n_bytes 256 debug3: tty_parse_modes: ospeed 38400 debug3: tty_parse_modes: ispeed 38400 debug3: tty_parse_modes: 1 3 debug3: tty_parse_modes: 2 28 debug3: tty_parse_modes: 3 127 debug3: tty_parse_modes: 4 21 debug3: tty_parse_modes: 5 4 debug3: tty_parse_modes: 6 0 debug3: tty_parse_modes: 7 0 debug3: tty_parse_modes: 8 17 debug3: tty_parse_modes: 9 19 debug3: tty_parse_modes: 10 26 debug3: tty_parse_modes: 12 18 debug3: tty_parse_modes: 13 23 debug3: tty_parse_modes: 14 22 debug3: tty_parse_modes: 18 15 debug3: tty_parse_modes: 30 0 debug3: tty_parse_modes: 31 0 debug3: tty_parse_modes: 32 0 debug3: tty_parse_modes: 33 0 debug3: tty_parse_modes: 34 0 debug3: tty_parse_modes: 35 0 debug3: tty_parse_modes: 36 1 debug3: tty_parse_modes: 37 0 debug3: tty_parse_modes: 38 1 debug3: tty_parse_modes: 39 0 debug3: tty_parse_modes: 40 0 debug3: tty_parse_modes: 41 0 debug3: tty_parse_modes: 50 1 debug3: tty_parse_modes: 51 1 debug1: Ignoring unsupported tty mode opcode 52 (0x34) debug3: tty_parse_modes: 53 1 debug3: tty_parse_modes: 54 1 debug3: tty_parse_modes: 55 1 debug3: tty_parse_modes: 56 0 debug3: tty_parse_modes: 57 0 debug3: tty_parse_modes: 58 0 debug3: tty_parse_modes: 59 1 debug3: tty_parse_modes: 60 1 debug3: tty_parse_modes: 61 1 debug1: Ignoring unsupported tty mode opcode 62 (0x3e) debug3: tty_parse_modes: 70 1 debug3: tty_parse_modes: 71 0 debug3: 
tty_parse_modes: 72 1 debug3: tty_parse_modes: 73 0 debug3: tty_parse_modes: 74 0 debug3: tty_parse_modes: 75 0 debug3: tty_parse_modes: 90 1 debug3: tty_parse_modes: 91 1 debug3: tty_parse_modes: 92 0 debug3: tty_parse_modes: 93 0 debug1: server_input_channel_req: channel 1 request shell reply 0 debug1: session_by_channel: session 1 channel 1 debug1: session_input_channel_req: session 1 req shell debug2: channel 1: rfd 10 isatty debug2: fd 10 setting O_NONBLOCK debug2: fd 9 setting O_NONBLOCK debug2: channel 1: read<=0 rfd 10 len 0 debug2: channel 1: read failed debug2: channel 1: close_read debug2: channel 1: input open -> drain debug2: channel 1: ibuf empty debug2: channel 1: send eof debug2: channel 1: input drain -> closed debug1: Received SIGCHLD. debug1: session_by_pid: pid 2636 debug1: session_exit_message: session 1 channel 1 pid 2636 debug2: channel 1: request exit-status confirm 0 debug1: session_exit_message: release channel 1 debug2: channel 1: write failed debug2: channel 1: close_write debug2: channel 1: output open -> closed debug1: session_pty_cleanup: session 1 release /dev/tty6 debug2: channel 1: send close debug3: channel 1: will not send data after close debug2: notify_done: reading debug3: channel 1: will not send data after close debug2: channel 1: rcvd close debug3: channel 1: will not send data after close debug2: channel 1: is dead debug2: channel 1: gc: notify user debug1: session_by_channel: session 1 channel 1 debug1: session_close_by_channel: channel 1 child 0 debug1: session_close: session 1 pid 0 debug2: channel 1: gc: user detached debug2: channel 1: is dead debug2: channel 1: garbage collecting debug1: channel 1: free: server-session, nchannels 2 debug3: channel 1: status: The following connections are open: #0 server-session (t4 r0 i0/0 o0/0 fd 7/6 cfd -1) #1 server-session (t4 r1 i3/0 o3/0 fd -1/-1 cfd -1) debug3: channel 1: close_fds r -1 w -1 e -1 c -1 debug1: server_input_channel_open: ctype session rchan 1 win 131072 max 
32768 debug1: input_session_request debug1: channel 1: new [server-session] debug1: session_new: session 1 debug1: session_open: channel 1 debug1: session_open: session 1: link with channel 1 debug1: server_input_channel_open: confirm session debug1: server_input_channel_req: channel 1 request exec reply 0 debug1: session_by_channel: session 1 channel 1 debug1: session_input_channel_req: session 1 req exec debug2: fd 11 setting O_NONBLOCK debug2: fd 10 setting O_NONBLOCK debug2: fd 13 setting O_NONBLOCK debug1: server_input_channel_open: ctype session rchan 2 win 131072 max 32768 debug1: input_session_request debug1: channel 2: new [server-session] debug1: session_new: session 2 debug1: session_open: channel 2 debug1: session_open: session 2: link with channel 2 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 3 win 131072 max 32768 debug1: input_session_request debug1: channel 3: new [server-session] debug1: session_new: session 3 debug1: session_open: channel 3 debug1: session_open: session 3: link with channel 3 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 4 win 131072 max 32768 debug1: input_session_request debug1: channel 4: new [server-session] debug1: session_new: session 4 debug1: session_open: channel 4 debug1: session_open: session 4: link with channel 4 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 5 win 131072 max 32768 debug1: input_session_request debug1: channel 5: new [server-session] debug1: session_new: session 5 debug1: session_open: channel 5 debug1: session_open: session 5: link with channel 5 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 6 win 131072 max 32768 debug1: input_session_request debug1: channel 6: new [server-session] debug1: session_new: session 6 debug1: session_open: channel 6 debug1: 
session_open: session 6: link with channel 6 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 7 win 131072 max 32768 debug1: input_session_request debug1: channel 7: new [server-session] debug1: session_new: session 7 debug1: session_open: channel 7 debug1: session_open: session 7: link with channel 7 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 8 win 131072 max 32768 debug1: input_session_request debug1: channel 8: new [server-session] debug1: session_new: session 8 debug1: session_open: channel 8 debug1: session_open: session 8: link with channel 8 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 9 win 131072 max 32768 debug1: input_session_request debug1: channel 9: new [server-session] debug1: session_new: session 9 debug1: session_open: channel 9 debug1: session_open: session 9: link with channel 9 debug1: server_input_channel_open: confirm session debug1: server_input_channel_open: ctype session rchan 10 win 131072 max 32768 debug1: input_session_request debug2: channel: expanding 20 debug1: channel 10: new [server-session] debug1: session_new: session 10 debug1: session_open: channel 10 debug1: session_open: session 10: link with channel 10 debug1: server_input_channel_open: confirm session debug1: server_input_channel_req: channel 2 request exec reply 0 debug1: session_by_channel: session 2 channel 2 debug1: session_input_channel_req: session 2 req exec debug2: fd 14 setting O_NONBLOCK debug2: fd 12 setting O_NONBLOCK debug2: fd 16 setting O_NONBLOCK debug1: server_input_channel_req: channel 3 request exec reply 0 debug1: session_by_channel: session 3 channel 3 debug1: session_input_channel_req: session 3 req exec debug2: fd 17 setting O_NONBLOCK debug2: fd 15 setting O_NONBLOCK debug2: fd 19 setting O_NONBLOCK debug1: server_input_channel_req: channel 4 request exec reply 0 debug1: 
session_by_channel: session 4 channel 4 debug1: session_input_channel_req: session 4 req exec debug2: fd 20 setting O_NONBLOCK debug2: fd 18 setting O_NONBLOCK debug2: fd 22 setting O_NONBLOCK debug1: server_input_channel_req: channel 5 request exec reply 0 debug1: session_by_channel: session 5 channel 5 debug1: session_input_channel_req: session 5 req exec debug1: Received SIGCHLD. debug1: Received SIGCHLD. debug2: fd 23 setting O_NONBLOCK debug2: fd 21 setting O_NONBLOCK debug2: fd 25 setting O_NONBLOCK debug1: server_input_channel_req: channel 6 request exec reply 0 debug1: session_by_channel: session 6 channel 6 debug1: session_input_channel_req: session 6 req exec debug1: Received SIGCHLD. debug1: Received SIGCHLD. debug2: fd 26 setting O_NONBLOCK debug2: fd 24 setting O_NONBLOCK debug2: fd 28 setting O_NONBLOCK debug1: server_input_channel_req: channel 7 request exec reply 0 debug1: session_by_channel: session 7 channel 7 debug1: session_input_channel_req: session 7 req exec debug2: fd 29 setting O_NONBLOCK debug2: fd 27 setting O_NONBLOCK fcntl(31, F_GETFL, 0): Bad file descriptor select: Bad file descriptor debug1: session_by_pid: pid 3792 debug1: session_exit_message: session 1 channel 1 pid 3792 debug2: channel 1: request exit-status confirm 0 debug1: session_exit_message: release channel 1 debug2: channel 1: write failed debug2: channel 1: close_write debug2: channel 1: output open -> closed debug1: session_by_pid: pid 576 debug1: session_exit_message: session 2 channel 2 pid 576 debug2: channel 2: request exit-status confirm 0 debug1: session_exit_message: release channel 2 debug2: channel 2: write failed debug2: channel 2: close_write debug2: channel 2: output open -> closed debug1: session_by_pid: pid 3520 debug1: session_exit_message: session 3 channel 3 pid 3520 debug2: channel 3: request exit-status confirm 0 debug1: session_exit_message: release channel 3 debug2: channel 3: write failed debug2: channel 3: close_write debug2: channel 3: output open 
-> closed debug1: session_by_pid: pid 3972 debug1: session_exit_message: session 4 channel 4 pid 3972 debug2: channel 4: request exit-status confirm 0 debug1: session_exit_message: release channel 4 debug2: channel 4: write failed debug2: channel 4: close_write debug2: channel 4: output open -> closed select: Bad file descriptor select: Bad file descriptor . . (50 times) . . select: Bad file descriptor select: Bad file descriptor debug1: Received SIGCHLD. debug1: session_by_pid: pid 3592 debug1: session_exit_message: session 5 channel 5 pid 3592 debug2: channel 5: request exit-status confirm 0 debug1: session_exit_message: release channel 5 debug2: channel 5: write failed debug2: channel 5: close_write debug2: channel 5: output open -> closed select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor . . (9000 times) . . select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor select: Bad file descriptor Exiting on signal 15 debug1: do_cleanup debug1: session_pty_cleanup: session 0 release /dev/tty5
Goldburt, Dan
2006-Sep-08 16:16 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
> debug1: server_input_channel_req: channel 7 request exec reply 0
> debug1: session_by_channel: session 7 channel 7
> debug1: session_input_channel_req: session 7 req exec
> debug2: fd 29 setting O_NONBLOCK
> debug2: fd 27 setting O_NONBLOCK
> fcntl(31, F_GETFL, 0): Bad file descriptor
> select: Bad file descriptor

Hi,

Ok so I think I tracked this down a bit more. What gave it away was that it had trouble opening fd number 31. I mentioned that I increased MAX_SESSIONS in session.c (line 107). Before I did this, sshd was only getting up to fd 30 (10 sessions * 3 fds per session: stdin, stdout, stderr).

So I'm wondering if there is another dependent variable that I need to change? Perhaps somewhere where memory is allocated for the fds?

Thanks,
Dan
Darren Tucker
2006-Sep-08 16:20 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Goldburt, Dan wrote:
> Hello,
>
> I need to make many (>50) ssh connections from linux to cygwin at the
> same time. Using Windows 2000 Server (OpenSSH_4.3p2, OpenSSL 0.9.8b
> and updated cygwin) and Linux RHEL4 (OpenSSH_3.9p1, OpenSSL 0.9.7a).

It's not pretty but you could run multiple sshd's on several ports.

> It's been difficult to optimize many simultaneous connections. Here
> were some issues:
> 1. On Windows XP/Professional, Microsoft
> intentionally cripples the TCP/IP stack. Official word
> (http://support.microsoft.com/kb/Q127144) is that the backlog queue
> limit on a listen socket is 5 (200 when Server), so you can't
> accept() more than 5 new connections concurrently.

> 2. Using a master connection that is shared, the sshd_config variable
> MaxStartups has no effect. This is because we are not opening lots
> of ssh connections, but are opening multiple sessions within a single
> connection. The parameter that needs to be changed is MAX_SESSIONS,
> which is hardcoded in session.c at 10. Request: add "MAX_SESSIONS"
> as a configuration parameter in sshd_config.

Maybe. It's certainly too late for the upcoming release, though.

> (also, you should
> mention in INSTALL documentation that by default, compiled binaries
> are quite a bit larger than usual. Do you use strip --strip-debug?

We use whatever "install -s" uses on your platform. If you're using the bundled install-sh script then it just calls "strip".

> Finally, I'm able to make many connections most of the time. But
> then, sshd errors:
>
> fcntl(223, F_GETFL, 0): Bad file descriptor and sometimes: [sig] bash
> 3720 _cygtls::handle_threadlist_exception and then loops spitting out
> "select: Bad file descriptor" and taking up 100% CPU. I have not done
> a stack trace or increased sshd debug output because the error comes
> up when about 100 connections are made, so it would be difficult to
> track down. If this isn't enough information to go on, I will post
> it.

Now this I'm not sure about. You'll have the stdout and stderr descriptors in the select's readset, which for FD_SETSIZE=64 puts the limit at around 30 connections or so (assuming you're not port forwarding or something too). What did you bump MAX_SESSIONS to? It might be overrunning the fd_set.

To make this work, you would probably need to break the select into FD_SETSIZE chunks somehow.

> Is this because on Cygwin, the "fd_set" arrays, used with select(),
> can contain file descriptors (FD) from 0 to 63 (the fd_set array is
> 8-byte long). On Linux, this is 0 to 1023? From:
> http://www.ipflow.utc.fr/blog/?p=34 and
> http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=105321853321894&w=3
> I also found a post
> http://www.cygwin.com/ml/cygwin/2005-06/msg00511.html where Corinna
> said "Using a master/slave connection requires the ability to
> exchange file descriptors over AF_UNIX sockets. That's not possible
> in Cygwin." I assume this has been addressed with USE_PIPES, since I
> AM able to use multiplexed connections most of the time?

That refers to the multiplexing (ControlMaster/ControlPath) functionality in the client, not the server side.

> Lastly, if security is not the biggest concern, should I even use
> ssh? I just need to be able to execute many remote shell commands in
> a short interval and return the output.

That's a local policy decision, but you're probably not going to get an unbiased opinion on this list :-)

--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
Goldburt, Dan
2006-Sep-08 16:42 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Tucker, Darren wrote:
> You'll have the stdout and stderr descriptors in the select's readset,
> which for FD_SETSIZE=64 puts the limit at around 30 connections or so
> (assuming you're not port forwarding or something too). What did you
> bump MAX_SESSIONS to? It might be overrunning the fd_set.

I set MAX_SESSIONS to 128. I think I'm definitely overrunning the fd_set. Running the test again with a just started sshd instance, I get the error fcntl(31, F_GETFL, 0). So the limit seems to be 30 fds, or 10 connections (3 fds per connection). Where is FD_SETSIZE set in cygwin? Any chance this can be bumped up?

> To make this work, you would probably need to break the select into
> FD_SETSIZE chunks somehow.

I'm in over my head! How do I do that? Is this something that can be changed in the sshd code? As far as I can tell, sshd calls fcntl(fd, F_GETFL, 0), not select() directly. To be honest I have little idea as to how file descriptors work, and what the fd_set that I am over-running is. I'm going to do some research.
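For readers with the same question about what an fd_set is: it is a bitmask indexed by descriptor number, and select() takes the highest descriptor plus one as its first argument. What follows is a minimal sketch of a single-descriptor wait, assuming plain POSIX select(); wait_readable is a made-up helper name and is not taken from the sshd source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait until fd is readable or timeout_ms elapses (hypothetical helper).
 * Returns 1 if readable, 0 on timeout, -1 on error with errno set. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readset;
    struct timeval tv;
    int ret;

    assert(timeout_ms >= 0);

    /* A descriptor outside 0..FD_SETSIZE-1 has no bit in the mask. */
    if (fd < 0 || fd >= (int)FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&readset);          /* clear every bit */
    FD_SET(fd, &readset);       /* set the bit for this descriptor */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor number plus one. */
    ret = select(fd + 1, &readset, NULL, NULL, &tv);
    if (ret < 0)
        return -1;              /* e.g. EBADF if fd is not open */

    return FD_ISSET(fd, &readset) ? 1 : 0;
}
```

The explicit range check is the design point here: FD_SET on a descriptor at or above FD_SETSIZE writes outside the mask, so the sketch refuses it instead of passing it on to select().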
Goldburt, Dan
2006-Sep-11 12:22 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Hi,

Ok, so I'm thinking about taking Darren's suggestion:

> It's not pretty but you could run multiple sshd's on several ports.

But before I do, I was hoping to get some help in optimizing the fix.

1. What does the cygwin limitation bound my max sessions to? Is it:

a) 30
> You'll have the stdout and stderr descriptors in the select's readset,
> which for FD_SETSIZE=64 puts the limit at around 30 connections or so
> (assuming you're not port forwarding or something too).

b) 20
> Thinking about it, that's wrong (I was thinking of poll). Since select
> uses bitmasks it doesn't matter how many are in each of the readset and
> writeset so the limit would be around 20 concurrent.

or c) 10
> I think I'm definitely overrunning the fd_set. Running the
> test again with a just started sshd instance, I get the
> error fcntl(31, F_GETFL, 0). So the limit seems to be 30
> fds, or 10 connections (3 fd per connection).

2. I need to make sure if I do still accidentally overrun the fd_set, I will not crash sshd. Right now it goes into an infinite loop spitting out "select: Bad file descriptor" and taking up 100% CPU. Surely this is a bug that needs to be patched?

3. Any chance I can overcome the limitation from inside sshd? How do I implement the following:

> To make this work, you would probably need to break the
> select into FD_SETSIZE chunks somehow.
Darren Tucker
2006-Sep-11 22:21 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Goldburt, Dan wrote:
> Hi,
>
> Ok, so I'm thinking about taking Darren's suggestion:
>> It's not pretty but you could run multiple sshd's on several ports.
>
> But before I do, I was hoping to get some help in optimizing the fix.
>
> 1. What does the cygwin limitation bound my max sessions to?
> Is it:
> a) 30
>> You'll have the stdout and stderr descriptors in the select's readset,
>> which for FD_SETSIZE=64 puts the limit at around 30 connections or so
>> (assuming you're not port forwarding or something too).
> b) 20
>> Thinking about it, that's wrong (I was thinking of poll). Since select
>> uses bitmasks it doesn't matter how many are in each of the readset and
>> writeset so the limit would be around 20 concurrent.
> or c) 10
>> I think I'm definitely overrunning the fd_set. Running the
>> test again with a just started sshd instance, I get the
>> error fcntl(31, F_GETFL, 0). So the limit seems to be 30
>> fds, or 10 connections (3 fd per connection).

I'm not sure, actually. You seem to be hitting some limit at 31 descriptors before the fd_set one, which should be at 64 (3 per session = ~20 concurrent). What does "ulimit -n" report the descriptor limit as, and do you have some local processes using some of them?

> 2. I need to make sure if I do still accidentally overrun the fd_set, I
> will not crash sshd. Right now it goes into an infinite loop spitting
> out "select: Bad file descriptor" and taking up 100% CPU. Surely this is
> a bug that needs to be patched?

Maybe, but it only occurs with modified code, right?

> 3. Any chance I can overcome the limitation from inside sshd? How do I
> implement the following:
>> To make this work, you would probably need to break the
>> select into FD_SETSIZE chunks somehow.

I was thinking of overloading select and associated macros in the compat library but it's probably not trivial. Damien said that the fd_sets were dynamically allocated but I'm not sure how that helps in the case where there's more than FD_SETSIZE descriptors.

--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev at mindrot.org
http://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
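The two limits discussed in this reply can be read from inside a process: "ulimit -n" corresponds to the soft RLIMIT_NOFILE value returned by getrlimit(), and the select() ceiling is FD_SETSIZE. The sketch below assumes a POSIX system; the function names and the idea of a fixed "reserved" count for the connection's own descriptors are illustrative assumptions, not values taken from sshd.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Soft per-process descriptor limit, i.e. what "ulimit -n" reports.
 * Returns -1 if the limit cannot be read. */
long soft_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}

/* Number of descriptors usable with select(): the smaller of the
 * process limit and FD_SETSIZE. */
long select_fd_budget(void)
{
    long lim = soft_fd_limit();

    if (lim < 0 || lim > (long)FD_SETSIZE)
        return (long)FD_SETSIZE;
    return lim;
}

/* Rough session estimate: (budget - reserved) / fds per session.
 * reserved_fds is a guess at descriptors used outside the sessions. */
int estimate_max_sessions(int fd_budget, int reserved_fds, int fds_per_session)
{
    assert(reserved_fds >= 0);
    if (fds_per_session <= 0 || fd_budget <= reserved_fds)
        return 0;
    return (fd_budget - reserved_fds) / fds_per_session;
}
```

With a budget of 64, 3 descriptors per session and an assumed 4 reserved, estimate_max_sessions(64, 4, 3) gives 20, which matches the "~20 concurrent" figure in the reply.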
Goldburt, Dan
2006-Sep-12 14:38 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Darren Tucker wrote:
> Goldburt, Dan wrote:
> > 1. What does the cygwin limitation bound my max sessions to?
>
> I'm not sure, actually. You seem to be hitting some limit at 31
> descriptors before the fd_set one, which should be at 64 (3 per session
> = ~20 concurrent). What does "ulimit -n" report the descriptor limit
> as, and do you have some local processes using some of them?

ulimit -n on cygwin reports 256 open files max. Is there a per-process limit? I'm thinking specifically about setdtablesize() (see http://sourceware.org/ml/cygwin/2000-09/msg00286.html)

> > 2. I need to make sure if I do still accidentally overrun the fd_set, I
> > will not crash sshd. Right now it goes into an infinite loop spitting
> > out "select: Bad file descriptor" and taking up 100% CPU. Surely this is
> > a bug that needs to be patched?
>
> Maybe, but it only occurs with modified code, right?

Not necessarily. The only modification was to increase MAX_SESSIONS per connection from 10 to 128. But even without the change, let's say I have one multiplexed connection that is hosting 10 sessions. I can also simultaneously open 15 regular ssh connections to have 25 sessions opening up, and there is a good chance I will overrun the fd_set.

> > 3. Any chance I can overcome the limitation from inside sshd? How do I
> > implement the following:
> >> To make this work, you would probably need to break the
> >> select into FD_SETSIZE chunks somehow.
>
> I was thinking of overloading select and associated macros in the compat
> library but it's probably not trivial.

Kudos to anyone who attempts that. It would make for a very robust solution, IMHO. Even better would be for this change to be made in the Cygwin select() code (that is what sshd is using, correct? Also, there seems to be some confusion whether winsock's select() implementation is being used in cygwin or not - see http://sourceware.org/ml/cygwin/1999-12/msg00149.html).

> Damien said that the fd_sets were dynamically allocated but I'm not sure
> how that helps in the case where there's more than FD_SETSIZE descriptors.

I'm not sure either. What does he mean by dynamically allocated? I see in serverloop.c (lines 638 - 642):

> max_fd = MAX(connection_in, connection_out);
> max_fd = MAX(max_fd, fdin);
> max_fd = MAX(max_fd, fdout);
> max_fd = MAX(max_fd, fderr);
> max_fd = MAX(max_fd, notify_pipe[0]);

which dynamically increments the number of file descriptors to select on. The next lines (655 and 645) use this value:

> /* Sleep in select() until we can do something. */
> wait_until_can_do_something(&readset, &writeset, &max_fd,
>     &nalloc, max_time_milliseconds);

and this is where (I think/haven't proved) the bug lives. As soon as max_fd exceeds FD_SETSIZE, we try to select on a fd outside of the cygwin supported array size. This accounts for the "select: Bad file descriptor" error, but more importantly, since the fd we are actually interested in lies outside of the selectable range, the returned bitmask will never change and we will never break out of the "sleep in select" loop (see http://sourceware.org/ml/cygwin/1999-11/msg00451.html).

I think this is a serious bug; can somebody please vet my analysis? As far as I can tell, max_fd should be bounded to FD_SETSIZE under cygwin.
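The bound proposed above can be expressed as a check made while max_fd is being computed, before any descriptor reaches FD_SET or select(). This is only a sketch of the idea under the post's unproven assumption that an out-of-range descriptor is the trigger; safe_max_fd is a hypothetical helper, not the serverloop.c code, and it treats negative values as "unused" slots.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Compute the highest descriptor in fds[0..n-1], skipping negative
 * entries (treated as unused). Returns 0 and stores the maximum in
 * *max_fd_out (-1 if every slot is unused), or returns -1 if any
 * descriptor would not fit in a fixed-size fd_set. */
int safe_max_fd(const int *fds, size_t n, int *max_fd_out)
{
    int max_fd = -1;
    size_t i;

    assert(fds != NULL && max_fd_out != NULL);

    for (i = 0; i < n; i++) {
        if (fds[i] < 0)
            continue;                   /* slot not in use */
        if (fds[i] >= (int)FD_SETSIZE)
            return -1;                  /* cannot be represented */
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    *max_fd_out = max_fd;
    return 0;
}
```

A caller would pass the same descriptors that feed the MAX() chain quoted above and refuse the new session when the function returns -1, rather than entering the select loop with a descriptor it cannot watch.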
Darren Tucker
2006-Sep-12 14:49 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
On Tue, Sep 12, 2006 at 10:38:36AM -0400, Goldburt, Dan wrote:
> Not necessarily. The only modification was to increase MAX_SESSIONS per
> connection from 10 to 128. But even without the change, let's say I have
> one multiplexed connection that is hosting 10 sessions. I can also
> simultaneously open 15 regular ssh connections to have 25 sessions
> opening up, and there is a good chance I will overrun the fd_set.

No, the file descriptor table (and thus fd_set) is per-process and each SSH connection has its own sshd process handling it.

(It's a bit late here to go into the rest of your mail, sorry.)

--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
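The per-process point can be illustrated with a small POSIX sketch: a forked child closes its copies of two descriptors, and the parent's copies stay open. This is only an illustration of per-process descriptor tables, not sshd code, and the function name `fd_survives_child_close` is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if the parent's descriptor is still open
 * after a forked child closed its own copy, 0 if not, -1 on error.
 */
int fd_survives_child_close(void)
{
    int pipefd[2];
    int status;
    int still_open;
    pid_t pid;

    if (pipe(pipefd) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: this only changes the child's descriptor table. */
        close(pipefd[0]);
        close(pipefd[1]);
        _exit(0);
    }

    /* Parent: wait so the child's close() calls have already happened. */
    if (waitpid(pid, &status, 0) == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* F_GETFD fails with EBADF only if the descriptor is not open here. */
    still_open = (fcntl(pipefd[0], F_GETFD) != -1);

    close(pipefd[0]);
    close(pipefd[1]);
    return still_open;
}
```

By the same reasoning, descriptors held by the sshd processes serving separate connections do not count against the fd_set of the process serving the multiplexed connection.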
Goldburt, Dan
2006-Sep-13 22:17 UTC
Multiple (multiplexed) simultaneous ssh connections - Cygwin bug?
Hi,

To recap, I'm establishing one master ssh connection and am opening many sessions through that one master connection. Often I get "select: Bad file descriptor" errors and the server thrashes at 100% CPU. The symptoms are very similar to those in this post:
http://sourceware.org/ml/cygwin/2001-09/msg01217.html
But the solution there doesn't work for OpenSSH.

Initially I believed that the number of file descriptors being opened was overrunning the fd_set. But it doesn't seem like I'm overrunning FD_SETSIZE. On the server, ulimit -n returns 256, and I also tried the following.

Darren Tucker wrote:
> BTW did you try bumping FD_SETSIZE when configuring OpenSSH with your
> increased MAX_SESSIONS?
> eg: ./configure --with-cflags=-DFD_SETSIZE=256

I'm getting select() errors randomly, sometimes selecting up to file descriptor 31, sometimes up to 38, sometimes up to 191, but sometimes I don't get any errors (even for over 80 simultaneous sessions). But once it happens, every subsequent select will fail (looping and thrashing the server).

This seems to me to be a serious bug. Yes, I did increase MAX_SESSIONS from 10 to 128, but that just made it easier to generate the error. You can also reproduce it if you install sshd to listen on several ports (start it with multiple -p arguments), and on each port establish a multiplexed connection with many sessions. In any case, shouldn't OpenSSH somehow handle the EBADF?

The offending code is in serverloop.c, line 332:

    ret = select((*maxfdp)+1, *readsetp, *writesetp, NULL, tvp);

I tried setting *maxfdp to FD_SETSIZE (as suggested in the post above), but then I would get EBADF every single time. I also tried setting *maxfdp to something small like 30, but then select would always come back with 0 because the fd it was interested in was greater than 30.

The unix manpage defines "EBADF: One or more of the file descriptor sets specified a file descriptor that is not a valid open file descriptor." (http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+FD_SET).

I modified the code to handle the EBADF error. An example of what I'm printing right now is "select: EBADF (bad file descriptor), maxfdp=38 FD_SETSIZE=256 readsetp=4 writesetp=4". Here is the code following the select:

    ret = select(..);
    if (ret == -1) {
        memset(*readsetp, 0, *nallocp);
        memset(*writesetp, 0, *nallocp);
        if (errno == EBADF) {
            error("select: EBADF (bad file descriptor), maxfdp=%d FD_SETSIZE=%d readsetp=%d writesetp=%d",
                (*maxfdp), FD_SETSIZE, sizeof(*readsetp), sizeof(*writesetp));
            fatal("Bad file descriptor loop");
        } else if (errno != EINTR) {
            error("select: %.100s", strerror(errno));
        }
    }
    .
    .

How can I print something to best debug the problem? Anybody have a best guess why I'm getting EBADF? Was a file descriptor unexpectedly closed but we are still trying to select on it?