Doug Graham
2019-Sep-16 14:06 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
> Case in point; EAGAIN can come if you give your fd to another process
> and continue using it yourself.

> Short counts; It is documented behavior that read() and write() may
> return short counts. It is not documented why, so you can not make
> any assumptions.

You might be right about short counts, but if you're right about EAGAIN, there are bugs everywhere. My first attempt at working around my "make: write error" failure was to pipe make into cat or tee, e.g. "make | tee make.log". But that caused both cat and tee to fail with EAGAIN. So they have the same "bug" as make.

Also note that make is just calling printf normally, and then just before exiting it calls ferror(stdout) to see if any error occurred when it previously wrote to stdout. ferror() is returning true. So now the bug has moved into the C library.

Also note that EAGAIN is not a transient error like EINTR that will probably go away on a retry. Retrying the write in a tight loop would probably just burn some extra CPU and then fail anyway. You'd have to call select() first, or put a delay in the loop. Are you suggesting that every program that writes to stdout should implement such contortions?

Not to mention that if the error occurred inside stdio or some other C library routine, I don't think the calling program has any way of telling how much output was sent successfully and how much should be retried. I could write

    if (printf("hello world") < 0 && errno == EAGAIN)
        <what here?>

but can I safely assume that none of my string was written before the error occurred?
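For a program that calls write() directly, the "wait until writable, then retry" approach mentioned above might look something like the minimal sketch below. write_all is a hypothetical helper name, not code from make, cat, tee or ssh, and the sketch uses poll() where the message mentions select(). Unlike printf(), write() reports how many bytes were accepted, so the loop knows where to resume.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd, even if someone else has
 * set O_NONBLOCK on the underlying file description.
 * Returns 0 on success, -1 on error (errno set by write() or poll()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n > 0) {
            /* Short count: advance past what was accepted and continue. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted before writing anything: just retry */

        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Pipe is full: sleep until it is writable instead of spinning. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1; /* real error, or an unexpected zero-length write */
    }
    return 0;
}
```

This only helps code that owns the write() call. It does nothing for output that goes through printf() and friends, where the caller cannot tell how much of the buffer was flushed before the error.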
Alex Bligh
2019-Sep-16 21:12 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
> On 16 Sep 2019, at 16:06, Doug Graham <edougra at gmail.com> wrote:
>
>> Case in point; EAGAIN can come if you give your fd to another process
>> and continue using it yourself.
>
>> Short counts; It is documented behavior that read() and write() may
>> return short counts. It is not documented why, so you can not make
>> any assumptions.
>
> You might be right about short counts but if you're right about
> EAGAIN, there are bugs everywhere. My first attempt at working around
> my "make: write error" failure was to pipe make into cat or tee, eg:
> "make | tee make.log". But that caused both cat and tee to fail with
> EAGAIN. So they have the same "bug" as make. Also note that make is
> just calling printf normally and then just before exiting, it calls
> ferror(stdout) to see if any error occurred when it previously wrote
> to stdout. ferror() is returning true. So now the bug has moved into
> the C library.

Dumb question: shouldn't whatever is calling fork() (here 'make' I believe) be dup()'ing the FDs just in case the called program does something odd with them like set O_NONBLOCK? That's what I've always done before fork(), and I believe what the venerable Mr Stevens recommends.

-- 
Alex Bligh
Doug Graham
2019-Sep-16 23:41 UTC
ssh client is setting O_NONBLOCK on a pipe shared with other processes
> Dumb question: shouldn't whatever is calling fork() (here 'make' I believe)
> be dup()'ing the FDs just in case the called program does something odd
> with them like set O_NONBLOCK?

Doesn't work. That's where I think POSIX is downright weird. Ssh actually does dup descriptors 0, 1, and 2, and then only sets O_NONBLOCK on the new descriptors:

    if (stdin_null_flag) {
        in = open(_PATH_DEVNULL, O_RDONLY);
    } else {
        in = dup(STDIN_FILENO);
    }
    out = dup(STDOUT_FILENO);
    err = dup(STDERR_FILENO);

    /* enable nonblocking unless tty */
    if (!isatty(in))
        set_nonblock(in);
    if (!isatty(out))
        set_nonblock(out);
    if (!isatty(err))
        set_nonblock(err);

The problem, I've learned, is that the original descriptor and the new copy of it refer to the same POSIX "file description", so setting O_NONBLOCK on one affects the other as well. I find this quite counter-intuitive, and it's why I'm starting to believe that the bug is in POSIX. I also find it odd that a child process can affect this flag in the parent, but that again is because both the parent's and the child's descriptors refer to the same POSIX "file description".
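The shared "file description" behaviour is easy to check in isolation. The sketch below is a standalone illustration, not ssh code; dup_shares_nonblock is a hypothetical name. It creates a pipe, dup()s the write end, sets O_NONBLOCK through the copy only, and then reads the flags back through the original descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if O_NONBLOCK set through a dup()'d copy
 * shows up on the original descriptor, 0 if it does not, -1 on error. */
static int dup_shares_nonblock(void)
{
    int fds[2];
    int copy, before, copy_flags, after;
    int result = -1;

    if (pipe(fds) < 0)
        return -1;

    copy = dup(fds[1]);
    if (copy < 0)
        goto out;

    before = fcntl(fds[1], F_GETFL);
    copy_flags = fcntl(copy, F_GETFL);
    if (before < 0 || copy_flags < 0)
        goto out_copy;

    /* A freshly created pipe() end starts out blocking. */
    assert(!(before & O_NONBLOCK));

    /* Set the flag on the copy only, as ssh does with its dup()s. */
    if (fcntl(copy, F_SETFL, copy_flags | O_NONBLOCK) < 0)
        goto out_copy;

    /* Read the flags back through the ORIGINAL descriptor. */
    after = fcntl(fds[1], F_GETFL);
    if (after < 0)
        goto out_copy;

    result = (after & O_NONBLOCK) ? 1 : 0;

out_copy:
    close(copy);
out:
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On a POSIX system this is expected to return 1: file status flags such as O_NONBLOCK belong to the open file description, which dup() and fork() share, whereas only descriptor flags such as FD_CLOEXEC are per descriptor.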