On a typical embedded Linux device, with no MMU, there is no fork() or it returns ENOSYS. The nearest replacements are vfork() (which is only useful before exec*()), or to create threads with pthread_create(). rsync would be a very useful program on such devices, and I was a bit disappointed to build it, only to find the compile went fine but it failed at runtime due to ENOSYS. Is there any likelihood of changing the rsync code to use threads instead of processes? Would it be a lot of work? I notice it has been attempted before, but the attempt seems to have been abandoned. Thanks, -- Jamie
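For readers unfamiliar with the vfork() restriction mentioned above, here is a minimal sketch of the one pattern it does support: create a child and immediately exec a new program. The helper name `spawn_and_wait` is made up for illustration and is not part of rsync. It shows why vfork() cannot stand in for rsync's forked receiver, which keeps running rsync's own code instead of calling exec.

```c
#define _DEFAULT_SOURCE         /* so that glibc declares vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv and return its
 * exit status, or -1 on failure.  After vfork() the child borrows the
 * parent's memory, so it may only call an exec function or _exit(). */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork itself failed */
    if (pid == 0) {
        execvp(argv[0], argv);  /* replaces the child image on success */
        _exit(127);             /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `{"/bin/sh", "-c", "exit 3", NULL}` should return 3 on a system that has /bin/sh.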
On Thu, Nov 24, 2005 at 03:22:23AM +0000, Jamie Lokier wrote:
> Is there any likelihood of changing the rsync code to use threads
> instead of processes?

It would be a fairly large task, so it's not one that I'm planning to tackle, at least not directly. Sometime in the future it would be nice to implement an improved rsync protocol that did not require either a forked duplicate or an extra thread (I implemented one such protocol in a testbed program a while back), but I don't know when I'll finally get around to that.

..wayne..
On Thu, Nov 24, 2005 at 03:22:23AM +0000, Jamie Lokier wrote:
> Is there any likelihood of changing the rsync code to use threads
> instead of processes?

I just thought about this a bit more, and it didn't seem as large a task as I had originally assumed it would be. I also remembered that I'm going to need something like this (*) to effect my desired changes of decreasing memory use and improving speed, so I've coded up a first-cut version:

http://opencoder.net/threaded-receiver.diff

This applies to the very latest CVS source (which is also available in a just-updated "nightly" tar file). The code does not attempt to discern what kind of threading is available, so that will probably need to be improved for systems that don't use pthreads.h. If anyone tries this out, please let me know how it goes and any problems you run into.

..wayne..

* Pertaining to upcoming improvements to rsync: I want to be able to dynamically change the file list as rsync runs (both to save memory and to allow for a more incremental transfer protocol). To do that, I need the receiving side to have a single, shared file list. In the long run it might be better to change the receiver into a state-machine that is triggered by the arrival of socket data, but in the near term it is much easier to change it to use a separate thread of execution because this means that I don't need to rework the procedural logic into some event-triggered logic. Most of the work of separating global variables that need to differ between the receiver and the generator will carry over to a state-machine version too, if we decide to switch over in the future.
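The "single, shared file list" idea in the footnote can be sketched roughly as follows. This is not the code from the patch: the struct, the function names, and the fixed-size list are all assumptions made for illustration. A generator (here, the calling thread) appends entries while a receiver thread started with pthread_create() consumes them, with a mutex and condition variable guarding the shared list.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_FILES 16            /* illustrative limit, not from rsync */

/* Hypothetical shared file list: one copy, visible to both threads. */
struct file_list {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    const char *names[MAX_FILES];
    size_t count;               /* entries appended by the generator */
    size_t received;            /* entries the receiver has handled */
    int done;                   /* generator will append nothing more */
};

static void *receiver_main(void *arg)
{
    struct file_list *fl = arg;

    pthread_mutex_lock(&fl->lock);
    for (;;) {
        while (fl->received == fl->count && !fl->done)
            pthread_cond_wait(&fl->changed, &fl->lock);
        if (fl->received == fl->count)
            break;              /* list drained and generator is done */
        fl->received++;         /* stand-in for receiving file data */
    }
    pthread_mutex_unlock(&fl->lock);
    return NULL;
}

/* Appends up to MAX_FILES names while the receiver thread runs, then
 * returns how many entries the receiver processed (0 on failure). */
size_t run_transfer(const char **names, size_t n)
{
    struct file_list fl = { .count = 0 };
    pthread_t receiver;
    size_t i;

    pthread_mutex_init(&fl.lock, NULL);
    pthread_cond_init(&fl.changed, NULL);
    if (pthread_create(&receiver, NULL, receiver_main, &fl) != 0)
        return 0;
    for (i = 0; i < n && i < MAX_FILES; i++) {
        pthread_mutex_lock(&fl.lock);
        fl.names[fl.count++] = names[i];    /* list grows as we run */
        pthread_cond_signal(&fl.changed);
        pthread_mutex_unlock(&fl.lock);
    }
    pthread_mutex_lock(&fl.lock);
    fl.done = 1;
    pthread_cond_signal(&fl.changed);
    pthread_mutex_unlock(&fl.lock);
    pthread_join(receiver, NULL);
    pthread_cond_destroy(&fl.changed);
    pthread_mutex_destroy(&fl.lock);
    return fl.received;
}
```

With fork() each process would hold its own copy of the list, so an entry added on one side would never be seen by the other; that is the property the thread-based design changes.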
Nelson H. F. Beebe
2005-Dec-09 00:43 UTC
Any change of rsync using threads instead of fork?
List traffic today asks about changing rsync to use lightweight threads instead of heavyweight fork. Before rushing into building a threads version of rsync, please READ this recent article (its author is also the co-author of the well-known, and widely used, gc (garbage collecting version of C malloc and C++ new) library: see http://www.hpl.hp.com/personal/Hans_Boehm/gc/index.html ): @String{j-SIGPLAN = "ACM SIG{\-}PLAN Notices"} @Article{Boehm:2005:TCI, author = "Hans-J. Boehm", title = "Threads cannot be implemented as a library", journal = j-SIGPLAN, volume = "40", number = "6", pages = "261--268", month = jun, year = "2005", CODEN = "SINODQ", DOI = "http://doi.acm.org/10.1145/1065010.1065042", ISSN = "0362-1340", bibdate = "Tue Jun 21 17:04:05 MDT 2005", bibsource = "http://portal.acm.org/", acknowledgement = ack-nhfb, abstract = "In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. 
The issues we raise are not specific to that context.", remark = "This is an important paper: it shows that current languages cannot be reliable for threaded programming without language changes that prevent compiler optimizations from foiling synchronization methods and memory barriers. The article's author and others are collaborating on a proposal for changes to the C++ language to remedy this, but that still leaves threads unreliable in C code, even with POSIX threads.", }

The gist of the paper is that your threaded program may appear to work, and even work apparently correctly most of the time. However, there are going to be nasty race conditions that from time to time cause unpredictable, and unreproducible, behavior. Most of us would rather not have to deal with that mess in production software (e.g., at my site, rsync is a critical component in nightly updates of about 100 systems running about 20 different flavors of Unix serving thousands of users).

-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu -
- 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
Nelson H. F. Beebe wrote:
> List traffic today asks about changing rsync to use lightweight
> threads instead of heavyweight fork.
>
> Before rushing into building a threads version of rsync, please READ
> this recent article

You didn't post a link directly to the article, just to the gateway page. The ACM requires a fee to read the article (or a pre-paid account). I considered paying the fee, if it's not much, but it requires user registration before it will even say what the fee is; I don't purchase from sites which won't state the price up front.

From the abstract:
> We provide specific arguments that a pure library approach, in which
> the compiler is designed independently of threading issues, cannot
> guarantee correctness of the resulting code.

That's correct. Any implementor of a sound thread library, and of compilers used with such libraries, knows that there are memory aliasing optimisations which defeat essential synchronisation barriers in functions like `pthread_mutex_lock'. That is a cause of unsafe code. This is not new knowledge, but perhaps needed a paper all the same.

_Real_ systems which implement the POSIX threads specification (pthreads) are not implemented in that way. They all require, in some way, something special of their compiler, even if that's just a guarantee that calling (at least some marked) external functions may read and write all program data which can be reached from multiple execution paths. Programs which call longjmp() and setcontext() have similar issues. Therefore unix compilers, and their optimisations, have to take into account those sorts of things.

If a system claims it supports "POSIX threads" (and if it really does), then you can rely on this. That doesn't mean there aren't implementations with bugs, but the ones which really do conform (and generally vendors do put the effort in) are quite safe to use.

A few auxiliary points:

1. The request is not to use "lightweight" threads for no reason, or for some vague notion of efficiency. It is because some C environments _cannot do_ fork. The only way to implement rsync on those is either using threads or state machines.

2. A thread-safe version of rsync does not mean that you have to use threads on all platforms, only that it's an option. Indeed, if rsync is to remain portable, it must continue to be compilable without threads too. Maybe it should default to using fork, except when fork is not available, and when testing.

3. I have been involved in the design of pthreads libraries on GNU/Linux. I can assure you the various synchronisation primitives do what they say they do, at least on a distro where the appropriate C compiler is used with the appropriate library, and that it is possible to build correct, safe code on real platforms using those primitives.

> (its author is also the co-author of the
> well-known, and widely used, gc (garbage collecting version of C
> malloc and C++ new) library

Yes, it is widely used. Did you know Boehm's gc library is compiler-unsafe to a far greater degree than most thread libraries? Still, it is widely used anyway :)

-- Jamie
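As a small illustration of the claim that the pthreads primitives "do what they say they do" on a conforming system, the sketch below has several threads increment one shared counter, with every access bracketed by pthread_mutex_lock() and pthread_mutex_unlock(). The names `count_with_threads` and `counter_worker` are invented for this example. On a conforming implementation the lock and unlock calls act as the synchronisation barriers discussed above, so no increments should be lost.

```c
#include <pthread.h>

struct counter {
    pthread_mutex_t lock;
    long value;
    long iterations;            /* increments each worker performs */
};

static void *counter_worker(void *arg)
{
    struct counter *c = arg;
    long i;

    for (i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);   /* compiler must not cache value */
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs up to 8 workers and returns the final counter value. */
long count_with_threads(int nthreads, long iterations)
{
    enum { MAX_THREADS = 8 };
    pthread_t tids[MAX_THREADS];
    struct counter c;
    int i, started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    c.value = 0;
    c.iterations = iterations;
    pthread_mutex_init(&c.lock, NULL);
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, counter_worker, &c) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

Removing the lock and unlock calls turns this into exactly the kind of racy program the paper warns about: it may still appear to work most of the time.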
Wayne Davison wrote:
> On Thu, Nov 24, 2005 at 03:22:23AM +0000, Jamie Lokier wrote:
> > Is there any likelihood of changing the rsync code to use threads
> > instead of processes?
>
> I just thought about this a bit more, and it didn't seem as large a task
> as I had originally assumed it would be. I also remembered that I'm
> going to need something like this (*) to effect my desired changes of
> decreasing memory use and improving speed, so I've coded up a first-cut
> version:
>
> http://opencoder.net/threaded-receiver.diff
>
> This applies to the very latest CVS source (which is also available in a
> just-updated "nightly" tar file). The code does not attempt to discern
> what kind of threading is available, so that will probably need to be
> improved for systems that don't use pthreads.h. If anyone tries this
> out, please let me know how it goes and any problems you run into.

Thank you! I don't know if I'll try it now, as I'm working on an alternate solution to our project which I need to be confident will work, but if I do I'll let you know how it went.

-- Jamie
On Fri, Dec 09, 2005 at 07:30:13PM +0000, Jamie Lokier wrote:
> Indeed, if rsync is to remain portable, it must continue to be
> compilable without threads too.

Just for the sake of completeness, I'll also add one more option: the GNU pth library. I configured it with both --enable-syscall-soft and --enable-pthread, and it worked fine with my latest threaded test version.

I'm still thinking that the future of rsync will probably be to change the receiver over to using a state-machine that gets triggered based on incoming socket data, but in the meantime this threading code will allow me to run some experimental protocol tests and (if tested enough to ensure that it has no strange side-effects) should also allow some systems that don't support a forked-receiver process to use the rsync codebase plus the patch.

..wayne..
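To make the "state-machine triggered by incoming socket data" idea concrete, here is a toy sketch. It is not rsync's protocol: the framing (a one-byte length followed by that many payload bytes) and all names are assumptions chosen only to keep the example short. The point is that the receiver keeps its position in an explicit state struct, so it can be fed whatever bytes happen to arrive and resume later, needing neither a forked process nor a thread.

```c
#include <stddef.h>

enum recv_state { WANT_LENGTH, WANT_PAYLOAD };

/* Everything the receiver must remember between chunks of input. */
struct receiver {
    enum recv_state state;
    size_t remaining;           /* payload bytes still expected */
    size_t messages;            /* complete messages seen so far */
};

void receiver_init(struct receiver *r)
{
    r->state = WANT_LENGTH;
    r->remaining = 0;
    r->messages = 0;
}

/* Consume len bytes that just arrived (e.g. from one read() call).
 * A message may be split across any number of calls. */
void receiver_feed(struct receiver *r, const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        if (r->state == WANT_LENGTH) {
            r->remaining = buf[i++];
            if (r->remaining == 0)
                r->messages++;          /* empty message is complete */
            else
                r->state = WANT_PAYLOAD;
        } else {
            size_t avail = len - i;
            size_t take = avail < r->remaining ? avail : r->remaining;

            i += take;                  /* real code would process the bytes */
            r->remaining -= take;
            if (r->remaining == 0) {
                r->messages++;
                r->state = WANT_LENGTH;
            }
        }
    }
}
```

In a real program `receiver_feed` would be called from a select() or poll() loop whenever the socket becomes readable; the procedural "read a length, then read the payload" logic has been turned inside out, which is the rework Wayne describes wanting to postpone.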
Nelson H. F. Beebe dixit:
>List traffic today asks about changing rsync to use lightweight
>threads instead of heavyweight fork.

It's not lightweight on all operating systems.

>Before rushing into building a threads version of rsync

If at all, make it a compile-time option to add thread support, and a run-time option to disable it even if compiled in.

>cause unpredictable, and unreproducible, behavior. Most of us would
>rather not have to deal with that mess in production software (e.g.,
>at my site, rsync is a critical component in nightly updates of about
>100 systems running about 20 different flavors of Unix serving
>thousands of users).

Even whole operating system source repositories are distributed with it. And threads are a mess (except on Win32).

bye,
//mirabile
--
I believe no one can invent an algorithm. One just happens to hit upon it when God enlightens him. Or only God invents algorithms, we merely copy them. If you don't believe in God, just consider God as Nature if you won't deny existence. -- Coywolf Qi Hunt
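One way the suggested pair of switches could look is sketched below. The macro `USE_THREADS`, the `no_threads_flag` parameter, and the function names are all hypothetical; rsync has no such options as far as this thread shows. Thread support is compiled in only when the macro is nonzero, and even then a run-time flag can force the fork() path.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef USE_THREADS             /* hypothetical compile-time switch */
#define USE_THREADS 1
#endif
#if USE_THREADS
#include <pthread.h>
#endif

enum worker_mode { MODE_FORK, MODE_THREAD };
typedef int (*worker_fn)(void *);

/* Run-time choice: threads only if compiled in and not disabled. */
enum worker_mode choose_mode(int no_threads_flag)
{
#if USE_THREADS
    return no_threads_flag ? MODE_FORK : MODE_THREAD;
#else
    (void)no_threads_flag;
    return MODE_FORK;
#endif
}

#if USE_THREADS
struct thread_call { worker_fn fn; void *arg; int result; };

static void *thread_trampoline(void *p)
{
    struct thread_call *c = p;
    c->result = c->fn(c->arg);
    return NULL;
}
#endif

/* Runs fn(arg) in a thread or a forked child, waits for it, and
 * returns its result (0..255), or -1 if the worker could not run. */
int run_worker(enum worker_mode mode, worker_fn fn, void *arg)
{
    int status;
    pid_t pid;

    if (mode == MODE_THREAD) {
#if USE_THREADS
        struct thread_call call = { fn, arg, -1 };
        pthread_t tid;

        if (pthread_create(&tid, NULL, thread_trampoline, &call) != 0)
            return -1;
        pthread_join(tid, NULL);
        return call.result;
#else
        return -1;              /* thread support not compiled in */
#endif
    }
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(arg) & 0xff);  /* child reports result via exit status */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example worker: returns the int that arg points to. */
int sample_worker(void *arg)
{
    return *(int *)arg;
}
```

On a no-MMU target the fork() branch would itself fail with ENOSYS, so there the default would have to be the thread path.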
So far I am not having luck with the threads version:

  rsync: mkstemp "/mnt/storage/bin/" failed: Success (0)
  ./rsync: io.c: 334: push_redo_num: Assertion `am_receiver()' failed.

is typical. Or SIGSEGV.

There is something very fishy going on, and I suspect it isn't the rsync code, but something in uclinux/uclibc, but it's a right pain to debug.

(In case people just joined: I requested the threads version because fork() isn't available at all on uclinux).

-- Jamie
On Sat, Dec 17, 2005 at 11:22:58PM +0000, Jamie Lokier wrote:
> rsync: mkstemp "/mnt/storage/bin/" failed: Success (0)

That makes me wonder if the thread handling is not properly giving each thread its own errno.

> ./rsync: io.c: 334: push_redo_num: Assertion `am_receiver()' failed.

That should only occur (that I know of) if pthread_self() isn't returning the right value.

Perhaps your file-I/O routines are not thread-safe? I'd suggest downloading GNU pth and trying out rsync using that. Be sure to configure *pth* with "--enable-syscall-soft --enable-pthread", build it, but there's no need to install it; if you tweak rsync's one include of pthread.h (that should be in rsync.h in the latest patch) to include pth's pthread.h file, and then tweak rsync's Makefile to link against the pth/.libs/libpthread.a library (instead of using -lpthread), that should get you going using the pth library's threading support. The nice thing about doing this is that none of the system's libraries need to be thread-safe for this scenario because the software does its own context switching when various read/write/select function calls happen.

..wayne..
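Both suspicions above can be probed with a small standalone test, sketched here under the assumption of a kernel-thread pthreads implementation. The function name `check_thread_basics` and the bit values it returns are invented for this example. It starts one thread, records where that thread's errno lives and what pthread_self() returns there, and compares both against the main thread.

```c
#include <errno.h>
#include <pthread.h>

struct probe {
    pthread_t self;             /* pthread_self() as seen in the thread */
    int *errno_addr;            /* where the thread's errno is stored */
};

static void *probe_main(void *arg)
{
    struct probe *p = arg;

    p->self = pthread_self();
    p->errno_addr = &errno;
    return NULL;
}

/* Returns 0 if all looks sane, -1 if no thread could be started,
 * otherwise a bitmask: 1 = errno storage is shared between threads,
 * 2 = pthread_self() does not identify the new thread correctly. */
int check_thread_basics(void)
{
    struct probe p;
    pthread_t tid;
    int problems = 0;

    if (pthread_create(&tid, NULL, probe_main, &p) != 0)
        return -1;
    pthread_join(tid, NULL);
    if (p.errno_addr == &errno)
        problems |= 1;
    if (!pthread_equal(p.self, tid) || pthread_equal(p.self, pthread_self()))
        problems |= 2;
    return problems;
}
```

Treat bit 1 as a hint only: a user-level thread library that saves and restores errno at each context switch could use a single address and still behave correctly.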
Wayne Davison wrote:
> On Sat, Dec 17, 2005 at 11:22:58PM +0000, Jamie Lokier wrote:
> > rsync: mkstemp "/mnt/storage/bin/" failed: Success (0)
>
> That makes me wonder if the thread handling is not properly giving
> each thread its own errno.

I agree, that seems likely. Unfortunately I don't have great debugging tools, and figuring out why is proving tricky.

> > ./rsync: io.c: 334: push_redo_num: Assertion `am_receiver()' failed.
>
> That should only occur (that I know of) if pthread_self() isn't
> returning the right value.
>
> Perhaps your file-I/O routines are not thread-safe? I'd suggest
> downloading GNU pth and trying out rsync using that. Be sure to
> configure *pth* with "--enable-syscall-soft --enable-pthread", build it,

That's an excellent suggestion, and I've just tried it.

It's necessary to include Pth's <pthread.h> _before_ uclibc system header files, like <sys/types.h>, because the latter defines types like pthread_t unconditionally (even when not linking with "real" threads). Otherwise there are lots of redefinition warnings and errors. So I put "-include pthread.h" in the Makefile, to include Pth's <pthread.h> before anything. That makes it compile fine.

Running: it's fine for connecting to a server and listing. Fetching files works better than before, with a couple of files arriving. However, then I see Illegal Instruction. More to debug. (It does not help that I have an older cross-compiler, which is more "official" for the kit I'm using, and which might work with these libraries better, but which crashes when compiling the large functions in generator.c and receiver.c.)

There is another quirk, which might apply to all Linux systems: libc contains utimes(), so HAVE_UTIMES is defined. But it returns ENOSYS, so rsync complains and fails to set times. This is because the kernel (ARM linux 2.4.26) does not have utimes(). In fact, utimes() was added in approximately Linux 2.6.0 on many architectures (it varies). Undefining HAVE_UTIMES fixes this.

It might be better for rsync to fall back to utime() when utimes() returns ENOSYS. If you do decide to depend on ENOSYS in a portable way, consider setting SIGSYS to SIG_IGN too.

Thanks for your time!

-- Jamie
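The fallback suggested above might look something like the sketch below. It is not rsync's code: `set_file_times` and `ignore_sigsys` are names made up for illustration. The helper tries utimes() first and retries with utime() only when the failure is ENOSYS, so any other error is reported unchanged.

```c
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <utime.h>

/* Hypothetical helper: set access and modification times on path.
 * Returns 0 on success, -1 with errno set on failure. */
int set_file_times(const char *path, time_t atime, time_t mtime)
{
    struct timeval tv[2];
    struct utimbuf ub;

    tv[0].tv_sec = atime;
    tv[0].tv_usec = 0;
    tv[1].tv_sec = mtime;
    tv[1].tv_usec = 0;
    if (utimes(path, tv) == 0)
        return 0;
    if (errno != ENOSYS)
        return -1;              /* a real error, e.g. ENOENT or EPERM */

    /* libc has utimes() but the kernel lacks the system call. */
    ub.actime = atime;
    ub.modtime = mtime;
    return utime(path, &ub);
}

/* On systems that raise SIGSYS for a missing system call, ignoring
 * it lets the call fail with ENOSYS instead of killing the process. */
void ignore_sigsys(void)
{
    signal(SIGSYS, SIG_IGN);
}
```

Calling `ignore_sigsys()` once at startup, before any call that might hit a missing system call, matches the suggestion in the message; the run-time check also avoids having to undefine HAVE_UTIMES by hand for one particular kernel.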