Charles C. Fu 傅建文
2002-May-22 19:04 UTC
rsync: race condition can cause loss of diagnostic output
[This is a copy of the contents of Debian bug report #147842.] Package: rsync Version: 2.5.5-0.2 Severity: normal Cause ----- - rsync forks a child which in turn forks a grandchild in main.c:do_recv(). - Diagnostics written by the grandchild need to be read by the child using read_error_fd() to be handled properly (with the end result being that they are seen by the user running rsync). - select() is used to call read_error_fd() only if there is stuff waiting to be read by it. - The select()/read_error_fd() combo is checked when the child reads and writes. - read_error_fd() only reads a single message from io_error_fd. The end result is that if multiple messages have accumulated, waiting and unread, on io_error_fd at the time the write_int() and io_flush() are called at the end of do_recv(), then only the first is handled by read_error_fd(). The rest get discarded. ---------------------------------------------------------------------- Test Case --------- A manifestation of this race condition is that using the combo of options '-v --delete-after' does not always report all files deleted. Using strace shows that the grandchild correctly writes messages about all deletions, but the child does not necessarily read all these deletion messages. Here is a simple test case. Example: $ rm -rf test $ mkdir -p test/1/2/3 test/a/b/c $ rsync -a --delete-after --delete test/ /tmp/test $ rm -rf test/a $ rsync -av --delete-after --delete test/ /tmp/test building file list ... done ./ deleting directory a/b/c deleting directory a/b wrote 99 bytes read 20 bytes 238.00 bytes/sec total size is 0 speedup is 0.00 Above, the directory /tmp/test/a was also deleted but is not listed. On my system, this is the most common result from running the test case. However, since it is a race condition, sometimes more and sometimes fewer deletions manage to get read by the child and then listed depending upon other load on the system, exact option flags used (e.g., '-vv' instead of just '-v'), .... Note: The test case relies upon --delete-after to show the bug. Without --delete-after, the deletion messages are written before the fork of the grandchild, so the bug does not manifest. ---------------------------------------------------------------------- Fix --- Looking at the code, I would suggest that main.c:do_recv() be modified as follows: - Have the grandchild close error_pipe[1] before its final sleep loop. (Why doesn't the grandchild just exit instead of sleeping? Is it only to try to avoid having to deal with a SIGCHLD in the middle of an I/O operation in the child?) - In the child, prior to the kill(), loop and call read_error_fd() until that descriptor is closed. However, I have not written code to test these suggested modifications nor am I sufficiently well-versed with the rsync code to know whether or not my suggested modifications are the best fix. ---------------------------------------------------------------------- I am not a member of the rsync mailing list, so please email me directly if further details are needed. -- System Information Debian Release: 3.0 Kernel Version: Linux thanatar 2.4.17-686-smp #2 SMP Sat Dec 22 22:00:42 EST 2001 i686 unknown Versions of the packages rsync depends on: ii libc6 2.2.5-4 GNU C Library: Shared libraries and Timezone ii libpopt0 1.6.2-7 lib for parsing cmdline parameters -- Charles C. Fu ,-- Vice President ___ __ __. . ,-/-- ccwf@bacchus.com Bacchus, Inc. (_,(_,|/|/ / http://www.bacchus.com/ Tel/Fax 310-455-2396 ----'
Dick Streefland
2002-May-28 08:16 UTC
rsync: race condition can cause loss of diagnostic output
On Wednesday 2002-05-22 19:00, Charles C. Fu ?????? wrote: | [This is a copy of the contents of Debian bug report #147842.] This looks like the same race condition that I reported to the rsync mailing list on 2002-05-16. The following patch fixes my problem: --- rsync-2.5.5/log.c.orig Mon Feb 18 20:51:12 2002 +++ rsync-2.5.5/log.c Wed May 15 16:12:18 2002 @@ -116,7 +116,12 @@ int n = write(log_error_fd, el->buf+el->written, el->len - el->written); /* don't check for an error if the best way of handling the error is to ignore it */ - if (n == -1) break; + if (n == -1) { + if (errno == EAGAIN) { + msleep(20); + } + break; + } if (n > 0) { el->written += n; } -- Dick Streefland //// Altium Software BV dick.streefland@altium.nl (@ @) http://www.altium.com --------------------------------oOO--(_)--OOo---------------------------
Dick Streefland
2002-May-28 10:10 UTC
rsync: race condition can cause loss of diagnostic output
On Wednesday 2002-05-22 19:00, Charles C. Fu ?????? wrote: | [This is a copy of the contents of Debian bug report #147842.] Sorry, but the patch in my earlier reply was incorrect. An essential continue statement was missing. I also found out that it was not a complete solution to the problem. Here is a revised patch: --- rsync-2.5.5/io.c.orig Fri Mar 22 06:14:44 2002 +++ rsync-2.5.5/io.c Tue May 28 18:49:52 2002 @@ -214,6 +214,7 @@ if (io_error_fd != -1 && FD_ISSET(io_error_fd, &fds)) { read_error_fd(); + continue; } if (!FD_ISSET(fd, &fds)) continue; @@ -435,6 +436,7 @@ if (io_error_fd != -1 && FD_ISSET(io_error_fd, &r_fds)) { read_error_fd(); + continue; } if (FD_ISSET(fd, &w_fds)) { --- rsync-2.5.5/log.c.orig Mon Feb 18 20:51:12 2002 +++ rsync-2.5.5/log.c Tue May 28 18:51:01 2002 @@ -116,7 +116,14 @@ int n = write(log_error_fd, el->buf+el->written, el->len - el->written); /* don't check for an error if the best way of handling the error is to ignore it */ - if (n == -1) break; + if (n == -1) { + if (errno == EAGAIN) + { + msleep(20); + continue; + } + break; + } if (n > 0) { el->written += n; } -- Dick Streefland //// Altium Software BV dick.streefland@altium.nl (@ @) http://www.altium.com --------------------------------oOO--(_)--OOo---------------------------