Doug Graham
2019-Sep-15 15:15 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
The quick summary is that we invoke git from a parallel invocation of "make". Git invokes ssh to pull stuff from a remote repo. Ssh sets O_NONBLOCK on stdout and stderr if they do not refer to a tty. During our build, stderr refers to a pipe that other jobs run by make (and make itself) may also write to, and since this is a parallel build, they may write to that pipe while ssh has it in non-blocking mode.

Make occasionally gets an unexpected EAGAIN error and fails the build with the error message "make: write error".

We have a workaround, but it seems to me that this could cause problems with other background uses of ssh too. Should ssh really be setting O_NONBLOCK if it is running non-interactively?

For more details, please see the thread on the git mailing list at https://www.spinics.net/lists/git/msg365902.html.

Thanks,
Doug
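The reason one process can affect another here is that O_NONBLOCK is a file status flag. It lives on the open file description, which is shared by every descriptor duplicated or inherited from it, rather than on the individual descriptor. The following is a minimal sketch of that behaviour using dup() within a single process; the helper names are hypothetical and nothing here is taken from ssh or make.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK is set on fd, 0 if it is not, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Sets O_NONBLOCK on fd. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical demo: duplicate the write end of a pipe, set O_NONBLOCK
 * through the original descriptor, and check whether the duplicate sees it.
 * Returns 1 if the flag was clear before and set afterwards on the
 * duplicate, 0 if not, -1 on error.
 */
int flag_is_shared_with_dup(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int other = dup(p[1]);          /* stands in for another writer */
    if (other == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    int before = is_nonblocking(other);
    int rc = set_nonblocking(p[1]); /* what one writer does to "its" fd */
    int after = is_nonblocking(other);

    close(p[0]);
    close(p[1]);
    close(other);

    if (rc == -1 || before == -1 || after == -1)
        return -1;
    return before == 0 && after == 1;
}
```

Descriptors inherited across fork() share the open file description in the same way, which is the situation between make, its jobs, and ssh in the build described above.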
Damien Miller
2019-Sep-16 02:20 UTC
On Sun, 15 Sep 2019, Doug Graham wrote:

> The quick summary is that we invoke git from a parallel invocation of
> "make". Git invokes ssh to pull stuff from a remote repo. Ssh sets
> O_NONBLOCK on stdout and stderr if they do not refer to a tty. During
> our build, stderr refers to a pipe that other jobs run by make (and
> make itself) may also write to, and since this is a parallel build,
> they may write to that pipe while ssh has it in non-blocking mode.
>
> Make occasionally gets an unexpected EAGAIN error and fails the build
> with the error message "make: write error".
>
> We have a workaround, but it seems to me that this could cause
> problems with other background uses of ssh too. Should ssh really be
> setting O_NONBLOCK if it is running non-interactively?

ssh has to set NONBLOCK otherwise it can, well, block - there's
no way for ssh to know a priori how much data it can write to a fd.

-d
Darren Tucker
2019-Sep-16 03:16 UTC
On Mon, 16 Sep 2019 at 01:18, Doug Graham <edougra at gmail.com> wrote:
[...]
> Make occasionally gets an unexpected EAGAIN error and fails the build
> with the error message "make: write error".

So the make process gets an EAGAIN on the write syscall and doesn't
retry? That sounds like a bug in whatever make you're using, since
that could potentially occur in other circumstances too.

(Same goes for EWOULDBLOCK, as well as EINTR if you don't have
restartable syscalls).

-- 
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
Doug Graham
2019-Sep-16 03:24 UTC
> ssh has to set NONBLOCK otherwise it can, well, block - there's
> no way for ssh to know a priori how much data it can write to a fd.

I don't know anything about how ssh is structured, but I think it must be a bit more complicated than that. Ssh only sets O_NONBLOCK on an fd if isatty(fd) returns false, so it's able to function with blocking input and output if the relevant descriptor refers to a tty (probably the usual case).
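The behaviour described in this message can be sketched as follows. This is not taken from the ssh source; set_nonblock_unless_tty is a hypothetical name, and the sketch only illustrates the "non-blocking unless it is a tty" rule as stated above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper illustrating the rule described above:
 * leave terminals in blocking mode, switch everything else to
 * non-blocking mode.
 * Returns 1 if O_NONBLOCK was set, 0 if fd is a tty and was left
 * alone, -1 on error.
 */
int set_nonblock_unless_tty(int fd)
{
    if (isatty(fd))
        return 0;                   /* interactive: keep blocking I/O */

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 1;                       /* pipe, file, socket, ... */
}
```

Under this rule a build that redirects stderr into a pipe always takes the second path, which is why the problem only shows up in non-interactive use.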
Doug Graham
2019-Sep-16 03:41 UTC
> So the make process gets an EAGAIN on the write syscall and doesn't
> retry? That sounds like a bug in whatever make you're using, since
> that could potentially occur in other circumstances too.

What other circumstances? EAGAIN means that something put the device into non-blocking mode, and normally, that should only happen if the program calling write had itself previously set O_NONBLOCK. I don't think programs that don't set O_NONBLOCK are required to handle EAGAIN or short counts. They *may* need to deal with EINTR but signals don't come out of nowhere either and many programs run in an environment where EINTR is also unexpected.

We are using GNU make 3.81 but newer versions of gmake do the same thing:

    void
    close_stdout (void)
    {
      int prev_fail = ferror (stdout);
      int fclose_fail = fclose (stdout);

      if (prev_fail || fclose_fail)
        {
          if (fclose_fail)
            error (NILF, _("write error: %s"), strerror (errno));
          else
            error (NILF, _("write error"));
          exit (EXIT_FAILURE);
        }
    }