Hello, some time ago I reported a bug, where we saw indeterministic behaviour of rsync (all versions since 2.5), when having the same file appearing in multiple sources. Sometimes the file in the first source was copied, other times the file was copied from one of the other sources. The attached mstest.tgz contains a test to reproduce the behaviour under darwin and solaris. The bug did *not* show up in gnu linux versions of rsync, which will be explained below: rsync uses the "qsort" system call to compose the entire file list from all files of all sources. qsort is known to be unstable, meaning that is does not guarantee the former order, if items to sort have the same value. Our test case triggers a situation where this unstabilibity shows up. Why does it not happen in gnu linux versions? Reading man pages showed us that glibc has an "optimization" in qsort: if memory is not low it uses mergesort instead, which is a stable sort algorithm. fix: Since in our scenario using rsync we rely on deterministic behaviour, we patched rsync to use mergesort always for composing the file list. For systems without a mergesort system call (most os's except freebsd/darwin) we use the freebsd implementation of mergesort and put it in the source tree of rsync. patches (relative to 2.6.2) and source are attached. I want to share this with the public and propose to change rsync to use mergesort instead of qsort. if this is not mainstream since mergesort has worse memory complexity, I propose to give users a command line switch to decide, whether they want to use the feature (prefer reliability for some scenario over performance) or not. Hope this will be heared. Thanks, Dirk. -------------- next part -------------- A non-text attachment was scrubbed... Name: mstest.tgz Type: application/x-gzip Size: 818 bytes Desc: not available Url : http://lists.samba.org/archive/rsync/attachments/20040825/12c22f05/mstest.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: patches.tgz Type: application/x-gzip Size: 4096 bytes Desc: not available Url : http://lists.samba.org/archive/rsync/attachments/20040825/12c22f05/patches.bin
Paul Slootman
2004-Aug-25 08:04 UTC
bugfix: indeterministic file choice from multiple sources
On Wed 25 Aug 2004, Dirk Pape wrote:> some time ago I reported a bug, where we saw indeterministic behaviour of > rsync (all versions since 2.5), when having the same file appearing in > multiple sources. Sometimes the file in the first source was copied, other > times the file was copied from one of the other sources.[...]> Since in our scenario using rsync we rely on deterministic behaviour, weWhat I'm wondering is why it's a problem that the same file is "randomly" copied from one of a number of sources, if indeed it is the same file. The resulting destination will be the same, right? Paul Slootman
Wayne Davison
2004-Aug-25 16:04 UTC
bugfix: indeterministic file choice from multiple sources
On Wed, Aug 25, 2004 at 08:44:15AM +0200, Dirk Pape wrote:> Since in our scenario using rsync we rely on deterministic behaviourWhat would you think of a tie-break for identical names that depended on some other attribute of the files? Such as newest file wins? That would be quite easy to add and would be deterministic, but perhaps not in the way you want. ..wayne..
Hello Wayne, --Am Mittwoch, 25. August 2004 9:03 Uhr -0700 schrieb Wayne Davison <wayned@samba.org>:> What would you think of a tie-break for identical names that depended on > some other attribute of the files? Such as newest file wins? That > would be quite easy to add and would be deterministic, but perhaps not > in the way you want.it would definetly break our scenario here. Although it is a nice feature I would not want to be forced to cope with it. Dirk.
Hello, --Am Mittwoch, 25. August 2004 8:44 Uhr +0200 schrieb Dirk Pape <pape@inf.fu-berlin.de>:> Since in our scenario using rsync we rely on deterministic behaviour, we > patched rsync to use mergesort always for composing the file list. For > systems without a mergesort system call (most os's except freebsd/darwin) > we use the freebsd implementation of mergesort and put it in the source > tree of rsync. patches (relative to 2.6.2) and source are attached. > > I want to share this with the public and propose to change rsync to use > mergesort instead of qsort. if this is not mainstream since mergesort has > worse memory complexity, I propose to give users a command line switch to > decide, whether they want to use the feature (prefer reliability for some > scenario over performance) or not. > > Hope this will be heared.will my feature request be considered for one of the next rsync releases? It is easy to be incorporated into the source and if given as a command line switch defauting to old behaviour it will not do any harm. Dirk.