Doug Graham
2019-Sep-16 03:24 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
> ssh has to set NONBLOCK otherwise it can, well, block - there's
> no way for ssh to know a priori how much data it can write to a fd.

I don't know anything about how ssh is structured, but I think it must
be a bit more complicated than that. Ssh only sets O_NONBLOCK on an
fd if isatty(fd) returns false, so it's able to function with blocking input
and output if the relevant descriptor refers to a tty (probably the usual
case).

On Sun, Sep 15, 2019 at 10:20 PM Damien Miller <djm at mindrot.org> wrote:
>
> On Sun, 15 Sep 2019, Doug Graham wrote:
>
> > The quick summary is that we invoke git from a parallel invocation of
> > "make". Git invokes ssh to pull stuff from a remote repo. Ssh sets
> > O_NONBLOCK on stdout and stderr if they do not refer to a tty. During
> > our build, stderr refers to a pipe that other jobs run by make (and
> > make itself) may also write to, and since this is a parallel build,
> > they may write to that pipe while ssh has it in non-blocking mode.
> >
> > Make occasionally gets an unexpected EAGAIN error and fails the build
> > with the error message "make: write error".
> >
> > We have a workaround, but it seems to me that this could cause
> > problems with other background uses of ssh too. Should ssh really be
> > setting O_NONBLOCK if it is running non-interactively?
>
> ssh has to set NONBLOCK otherwise it can, well, block - there's
> no way for ssh to know a priori how much data it can write to a fd.
>
> -d
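
The failure reported above can be reproduced in miniature. The following is only a sketch of the mechanism, not ssh's or make's actual code: the names `set_nonblock_unless_tty`, `make_fd` and `ssh_fd` are made up for illustration, and `os.dup()` stands in for the descriptor that ssh would inherit from its parent.

```python
import errno
import fcntl
import os


def set_nonblock_unless_tty(fd):
    """Set O_NONBLOCK on fd only when it is not a tty, as described above."""
    if os.isatty(fd):
        return False
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    return True


def other_writer_gets_eagain():
    """Return True if a writer that never asked for non-blocking I/O sees EAGAIN."""
    read_end, make_fd = os.pipe()  # make_fd: hypothetical stand-in for make's stderr
    ssh_fd = os.dup(make_fd)       # ssh_fd: the same pipe as inherited by ssh
    try:
        if not set_nonblock_unless_tty(ssh_fd):
            return False
        chunk = b"x" * 4096
        try:
            while True:            # nobody reads, so the pipe eventually fills
                os.write(make_fd, chunk)
        except BlockingIOError as exc:
            # The write fails instead of blocking, although only ssh_fd
            # was ever switched to non-blocking mode.
            return exc.errno == errno.EAGAIN
    finally:
        for fd in (read_end, make_fd, ssh_fd):
            os.close(fd)
```

In the real build the pipe presumably only fills up when its reader falls behind, which would be consistent with make hitting the error only occasionally.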
Damien Miller
2019-Sep-16 03:36 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
AFAIK failing to set nonblock on ttys is a blurry mixture of legacy
behaviour, hack and bug.

It's legacy behaviour because it dates back to ssh 1.x when the ssh
process interacted with at most a single tty. This is no longer the
case due to connection multiplexing.

It's a bug for the same reason - a ssh blocked on tty writes that
is not able to service IO to other multiplexed connections is not
behaving well.

It's a hack because it may provide something approximating
human-friendly behaviour when ssh is only interacting with a single tty.
In this case, if the tty is behind then stalling on writes provides
some backpressure to the peer. I'm not sure whether this has been
properly analysed or even whether it makes sense these days.

In any case, disabling nonblock for non-ttys is not what we want to do.

-d
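
The multiplexing concern can be illustrated with a small sketch. This is not how ssh is implemented; `pump_once` and the `pending` mapping are hypothetical names, and the only assumption is that every descriptor in `pending` is already in non-blocking mode. Output is queued per descriptor and written only to descriptors that can accept it, so one stalled reader does not hold up the others.

```python
import os
import select


def pump_once(pending, timeout=0.0):
    """Write what each fd accepts right now, without ever blocking.

    `pending` maps a non-blocking fd to a bytearray of queued output.
    Returns the set of fds that accepted some data.
    """
    waiting = [fd for fd, buf in pending.items() if buf]
    if not waiting:
        return set()
    _, writable, _ = select.select([], waiting, [], timeout)
    progressed = set()
    for fd in writable:
        try:
            written = os.write(fd, pending[fd])
        except BlockingIOError:
            continue  # the fd filled up in the meantime; retry on a later pass
        del pending[fd][:written]  # keep only the unwritten tail queued
        if written:
            progressed.add(fd)
    return progressed
```

With blocking descriptors the same loop would stall inside `os.write()` on the first full pipe, which is the behaviour described above as a bug for a multiplexing client.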
Doug Graham
2019-Sep-16 03:58 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
> In any case, disabling nonblock for non-ttys is not what we want to do

Ok, well I don't know what the solution is then. I think the real problem is
probably that O_NONBLOCK applies to an "open file description" rather than
to a descriptor, but that behavior is apparently mandated by POSIX. I think
that when ssh sets O_NONBLOCK, that should have no effect on its parent
process or any other process. So I guess the bug is in POSIX.
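
The point about open file descriptions can be checked directly. The sketch below is an illustration only: it assumes a Unix system where `os.fork()` is available, and the forked child is a stand-in for ssh, not ssh itself. The child sets O_NONBLOCK on its inherited copy of the pipe, and the parent then inspects its own descriptor.

```python
import os


def child_change_visible_in_parent():
    """Return True if O_NONBLOCK set by a child shows up on the parent's fd."""
    read_end, write_end = os.pipe()
    assert os.get_blocking(write_end)  # a new pipe starts out in blocking mode
    pid = os.fork()
    if pid == 0:
        # Child (stand-in for ssh): change the flag on its inherited copy.
        os.set_blocking(write_end, False)
        os._exit(0)
    os.waitpid(pid, 0)
    # The parent never touched the flag, yet its descriptor refers to the
    # same open file description, so it now reports non-blocking mode.
    visible = not os.get_blocking(write_end)
    os.close(read_end)
    os.close(write_end)
    return visible
```

File status flags such as O_NONBLOCK are shared this way, whereas descriptor flags such as FD_CLOEXEC belong to each descriptor separately.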