Jonathan Kamens
2002-May-28 07:26 UTC
rsync 2.5.4 (probably 2.5.5 too) server handles SIGPIPE very poorly
(I am not on the rsync mailing list, so if you send a response to this message to the list, please be sure to CC me.) I first reported this bug go Red Hat in <URL:https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=65350>. If you run rsync with a subshell through ssh.com's ssh and sshd and then kill the client with ctrl-C, the rsync server process running on the remote machine grows without bound and eventually completely hoses the machine. After some debugging, I determined this to be the problem (as reported in the Red Hat bug ticket noted above): When the client rsync and ssh exit, and thus sshd on the other end exits, and then the server rsync tries to write to the client, it gets SIGPIPE. Alas, SIGPIPE is being ignored, because, quoting a comment in the rsync source code, "Ignore SIGPIPE; we consistently check error codes and will see the EPIPE." The comment is wrong; it does *not* see the SIGPIPE. What happens is that as a result of the SIGPIPE, exit_cleanup gets called. That's a macro which calls _exit_cleanup. That calls log_exit That calls rwrite. Rwrite tries to write an error to stderr, but that fails because of (you guessed it!) SIGPIPE, and so rwrite calls exit_cleanup. Presto, an infinite loop is born. The most straightforward fix I came up with is to modify rwrite so that it doesn't actually try to write the error if the FILE to which the error has been written is showing a true error condition (checked with ferror). I will attach a patch. I suspect that the rsync server running under openssh doesn't have this problem because openssh sshd causes it it to get a SIGHUP whereas the ssh.com version of sshd does not. That's just a guess though. Here's the patch I submitted: *** log.c~ Mon Feb 18 14:51:12 2002 --- log.c Tue May 28 10:05:30 2002 *************** *** 276,284 **** if (!f) exit_cleanup(RERR_MESSAGEIO); ! if (fwrite(buf, len, 1, f) != 1) exit_cleanup(RERR_MESSAGEIO); ! if (buf[len-1] == '\r' || buf[len-1] == '\n') fflush(f); } --- 276,286 ---- if (!f) exit_cleanup(RERR_MESSAGEIO); ! if (! ferror(f)) { ! if (fwrite(buf, len, 1, f) != 1) exit_cleanup(RERR_MESSAGEIO); ! if (buf[len-1] == '\r' || buf[len-1] == '\n') fflush(f); ! } } This strikes me as a pretty serious bug; I hope you will consider releasing a patch release with this fix (or an equivalent one, if you don't like how I fixed it) soon. Thank you, Jonathan Kamens
Colin Walters
2002-Jun-02 23:34 UTC
rsync 2.5.4 (probably 2.5.5 too) server handles SIGPIPE very poorly
[resent by mbp] On Tue, 2002-05-28 at 10:22, Jonathan Kamens wrote:> If you run rsync with a subshell through ssh.com's ssh and sshd and > then kill the client with ctrl-C, the rsync server process running on > the remote machine grows without bound and eventually completely hoses > the machine.I feel some responsiblity for this, because it was my patch (to solve another, but similar problem) which caused rsync to ignore SIGPIPE.> When the client rsync and ssh exit, and thus sshd on the other end > exits, and then the server rsync tries to write to the client, it > gets SIGPIPE. Alas, SIGPIPE is being ignored, because, quoting a > comment in the rsync source code, "Ignore SIGPIPE; we consistently > check error codes and will see the EPIPE." > > The comment is wrong; it does *not* see the SIGPIPE.I assume you mean it doesn't see the EPIPE. SIGPIPE is ignored, so we won't see it.> What happens > is that as a result of the SIGPIPEYou mean as the result of the EPIPE return code.> exit_cleanup gets called. > That's a macro which calls _exit_cleanup. That calls log_exit That > calls rwrite. Rwrite tries to write an error to stderr, but that > fails because of (you guessed it!) SIGPIPE, and so rwrite calls > exit_cleanup. Presto, an infinite loop is born.This analysis appears to be correct.> The most straightforward fix I came up with is to modify rwrite so > that it doesn't actually try to write the error if the FILE to which > the error has been written is showing a true error condition > (checked with ferror). I will attach a patch.I think a better solution is to create a new function exit_cleanup_nolog, which will prevent exit_cleanup from logging anything. The attached patch implements that, and changes that section of log.c to use it. There are probably other places that should be changed to use it as well.> I suspect that the rsync server running under openssh doesn't have > this problem because openssh sshd causes it it to get a SIGHUP > whereas the ssh.com version of sshd does not.You should be using OpenSSH anyways; the ssh.com version of ssh is proprietary software. Could you test the attached patch? -------------- next part -------------- A non-text attachment was scrubbed... Name: rsync-log.patch Type: text/x-patch Size: 3563 bytes Desc: not available Url : http://lists.samba.org/archive/rsync/attachments/20020602/5823f446/rsync-log.bin
Jonathan Kamens
2002-Jun-03 03:55 UTC
rsync 2.5.4 (probably 2.5.5 too) server handles SIGPIPE very poorly
> From: Colin Walters <walters@gnu.org> > Date: 02 Jun 2002 23:51:50 -0400 > > > The most straightforward fix I came up with is to modify rwrite so > > that it doesn't actually try to write the error if the FILE to which > > the error has been written is showing a true error condition > > (checked with ferror). I will attach a patch. > > I think a better solution is to create a new function > exit_cleanup_nolog, which will prevent exit_cleanup from logging > anything. The attached patch implements that, and changes that section > of log.c to use it. There are probably other places that should be > changed to use it as well.I don't agree that this is a better solution, especially given your comment in the last sentence above. What do you think is wrong with my solution? It solves the problem exactly where it is likely to be a problem and it transparently solves all instances of the problem, whereas with your solution, everybody calling exit_cleanup forever has to worry about whether they're calling the right one (and you may not have even solved the problem for all the existing code, let alone for new code written later). Why put that burden on the programmer forever when you can just make rwrite handle the problem correctly in all cases? I have been running with the patch I sent ever since I sent it, and it works like a charm.> You should be using OpenSSH anyways; the ssh.com version of ssh is > proprietary software.If I had a choice about this, I would. My employer has chosen to use ssh.com's version rather than OpenSSH. I do not have the influence to convince the relevant decision-makers to switch to OpenSSH. And while it's true that ssh.com's version is in some respects prioprietary, they give away the source code for free and allow it to be used for free on Linux.> Could you test the attached patch?I'd rather not do that until you explain why it is that you think your solution is better than mine. Since rsync has already hosed one of our production machines at work several times as a result of this bug, I'm somewhat reluctant to test something that may still manifest the bug in some cases, especially when the solution I've already got works great. Jonathan Kamens