Damien> 8.9 switch from select() to poll() and included a couple of bugs that > could cause weird problems. IMO you should try to port to what's on > the top of the V_9_0 git branch, which is 9.0 + one more poll()- > related fix.I just tried it with the head (commit bedb93415b) of the master branch. Unfortunately, I'm still seeing the issue. debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 debug1: adjusted rlen: 36438016, len: 32768 for 0x5609805b0ed0 debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 debug1: adjusted rlen: 36470784, len: 32768 for 0x5609805b0ed0 debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 debug1: adjusted rlen: 36503552, len: 32768 for 0x5609805b0ed0 Chris
rsbecker at nexbridge.com
2022-May-19 16:11 UTC
Difference in buffer behaviour between 8.89 and 8.9?
On May 19, 2022 11:23 AM Chris wrote:>Damien > >> 8.9 switch from select() to poll() and included a couple of bugs that >> could cause weird problems. IMO you should try to port to what's on >> the top of the V_9_0 git branch, which is 9.0 + one more poll()- >> related fix. > >I just tried it with the head (commit bedb93415b) of the master branch. >Unfortunately, I'm still seeing the issue. > >debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 >debug1: adjusted rlen: 36438016, len: 32768 for 0x5609805b0ed0 >debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 >debug1: adjusted rlen: 36470784, len: 32768 for 0x5609805b0ed0 >debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 >debug1: adjusted rlen: 36503552, len: 32768 for 0x5609805b0ed0I am watching this situation also. We are unable to get OpenSSH to work on NonStop since select was replaced with poll - mostly because poll is no longer on the platform. It is emulated only, so we are getting an "Illegal Function Argument" reported to the remote, which reports it back. Hopefully your resolution will also work for us. --Randall
I think I have a fix, or at least a better band aid, for people in my situation. If, at some point, you decide to increase the size of CHAN_SES_WINDOW_DEFAULT this may be helpful. Basically, the changes to channels.h increases the size of SSH's receive buffer, which triggers this behaviour. The changes in sshbuf.c tests to see if the buffer is larger than ssh buffer that seems to be handling the incoming packets. The assumption is that if it's larger then it's the receive buffer. I then aggressively grow that buffer (4MB at a time) but no more buf->max_size. The debug statement is so you can see what's going on but that should be removed for production. If you comment out the section that grows the window aggressively you can see the stall as the buffer grows 32k at a time. I have some concerns about growing the buffer it such large chunks. Also, I'd ideally make it a function of the size of c->local_window_max but I can't seem to get an extern working. Chris This patch is against the head of the master branch of openssh-portable. diff --git a/channels.h b/channels.h index 828c1b61b..ae897680c 100644 --- a/channels.h +++ b/channels.h @@ -210,7 +210,7 @@ struct Channel { /* default window/packet sizes for tcp/x11-fwd-channel */ #define CHAN_SES_PACKET_DEFAULT (32*1024) -#define CHAN_SES_WINDOW_DEFAULT (64*CHAN_SES_PACKET_DEFAULT) +#define CHAN_SES_WINDOW_DEFAULT (4096*CHAN_SES_PACKET_DEFAULT) #define CHAN_TCP_PACKET_DEFAULT (32*1024) #define CHAN_TCP_WINDOW_DEFAULT (64*CHAN_TCP_PACKET_DEFAULT) #define CHAN_X11_PACKET_DEFAULT (16*1024) diff --git a/sshbuf.c b/sshbuf.c index 840b899b1..b45720e1f 100644 --- a/sshbuf.c +++ b/sshbuf.c @@ -27,6 +27,7 @@ #include "ssherr.h" #include "sshbuf.h" #include "misc.h" +#include "log.h" static inline int sshbuf_check_sanity(const struct sshbuf *buf) @@ -327,9 +328,27 @@ sshbuf_allocate(struct sshbuf *buf, size_t len) */ need = len + buf->size - buf->alloc; rlen = ROUNDUP(buf->alloc + need, SSHBUF_SIZE_INC); + /* I think the receive buffer is the one that is + * growing slowly. This buffer quickly ends up being larger + * than the typical packet buffer (usually 32.25KB) so if + * we explicitly test for growth needs larger than that we + * can accelerate the growth of this buffer and reduce load + * the CPU and improve throughput. Ideally we would use + * (local_window_max - rlen) as need but we don't have access + * to that here */ + if (rlen > 34*1024) { + need = 4 * 1024 * 1024; + rlen = ROUNDUP(buf->alloc + need, SSHBUF_SIZE_INC); + if (rlen > buf->max_size) + rlen = buf->max_size; + } SSHBUF_DBG(("need %zu initial rlen %zu", need, rlen)); if (rlen > buf->max_size) rlen = buf->alloc + need; + /* be sure to include log.h if you uncomment the debug + * this debug helped identify the buffer growth issue in v8.9 + * see the git log about it. search for sshbuf_read */ + debug("adjusted rlen: %zu, len: %lu for %p", rlen, len, buf); SSHBUF_DBG(("adjusted rlen %zu", rlen)); if ((dp = recallocarray(buf->d, buf->alloc, rlen, 1)) == NULL) { SSHBUF_DBG(("realloc fail")); @@ -401,4 +420,3 @@ sshbuf_consume_end(struct sshbuf *buf, size_t len) SSHBUF_TELL("done"); return 0; } On 5/19/22 11:23 AM, rapier wrote:> Damien > >> 8.9 switch from select() to poll() and included a couple of bugs that >> could cause weird problems. IMO you should try to port to what's on >> the top of the V_9_0 git branch, which is 9.0 + one more poll()- >> related fix. > > I just tried it with the head (commit bedb93415b) of the master branch. > Unfortunately, I'm still seeing the issue. > > debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 > debug1: adjusted rlen: 36438016, len: 32768 for 0x5609805b0ed0 > debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 > debug1: adjusted rlen: 36470784, len: 32768 for 0x5609805b0ed0 > debug1: adjusted rlen: 33024, len: 32788 for 0x5609805a3eb0 > debug1: adjusted rlen: 36503552, len: 32768 for 0x5609805b0ed0 > > Chris