the rsync in centos 4 (a recompile of rhel4) is version 2.6.3, and under certain circumstances it will segfault when run in daemon mode. i have tracked it down to the nss code in libc. so this could be a general libc bug, but it is possible that rsync is doing things that don't help matters any.

the problem manifests when you have chroot mode on (which seems to be the default), have files on the server owned by someone other than root, and request an rsync with -o enabled. this causes the server process to try to call getpwuid for the file uid and things segfault.

the underlying issue seems to be that after rsync forks the child process, it reads the system /etc/nsswitch.conf, then later on it chroots and when it tries to getpwuid, libc can't find the libnss files it needs and the code segfaults.

has this been seen before? i haven't found anything while google'ing around and looking in the archives. i have found a number of workarounds (disable chroot mode, chown everything to root, don't enable -o).

i am wondering if rsync could chroot earlier in the process and maybe avoid this problem? i haven't done any experiments yet with libc to see if things could be helped by that.

anything else i can provide to be of help here?
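i should have run my test first. this code segfaults:

    #include <sys/types.h>
    #include <pwd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        struct passwd *p;

        /* must be run as root, otherwise chroot() fails */
        chroot("/tmp");
        chdir("/");
        /* the first nss lookup happens inside the chroot */
        p = getpwuid(666);
        if (p) {
            printf("%s\n", p->pw_name);
        }
        exit(0);
    }

so i guess that rsync can't help in any way. time to start looking at libc...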
On Thu, 10 Mar 2005, Joe Pruett <joey@clean.q7.com> wrote:
>
> the rsync in centos 4 (a recompile of rhel4) is version 2.6.3. and under
> certain circumstances it will segfault when run in daemon mode. i have
> tracked it down to the nss code in libc. so this could be a general libc
> bug, but it is possible that rsync is doing things that don't help matters
> any.
>
> the problem manifests when you have chroot mode on (which seems to be the
> default), have files on the server owned by someone other than root, and
> request an rsync with -o enabled. this causes the server process to try
> to call getpwuid for the file uid and things segfault.
...

A similar problem was reported back in February:

  mail-archive.com/rsync@lists.samba.org/msg12557.html

that manifested itself after an upgrade to Fedora Core 2. It had worked fine previously (on FC1, presumably).

That user (David Blunkett) provided an strace log that showed this:

[pid 27916] open("/usr/lib/libnss_compat.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 27916] stat64("/usr/lib", 0xfef74ac8) = -1 ENOENT (No such file or directory)
[pid 27916] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

where it's trying to load a dynamically linked library and ends up crapping out at the end.

So this is a problem with some run-time dynamically linked library failing to load. But why the delayed load attempt? Aren't these supposed to be resolved when the program is initially loaded at run-time?

This has not been a problem in the past, AFAIK. Perhaps some newer library is trying to be too clever for its own good.

As a workaround, maybe rsync could do a call to getpwuid() and getgrgid() before doing the chroot to make sure it has the required library loaded?

-- 
John Van Essen  Univ of Minn. Alumnus  <vanes002@umn.edu>
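
The workaround suggested above could be sketched roughly as follows. This is only an illustration, not rsync's actual code: the function names preload_nss(), enter_chroot() and uid_to_name() are hypothetical. It also rests on an untested assumption, namely that one throwaway getpwuid()/getgrgid() call before chroot() is enough to make libc load the libnss modules it will need later.

```c
#define _DEFAULT_SOURCE         /* for chroot() on glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: do one passwd and one group lookup while the real
 * root filesystem is still visible, so that libc can find and load its NSS
 * modules. The results are deliberately discarded. */
void preload_nss(void)
{
    (void)getpwuid(getuid());
    (void)getgrgid(getgid());
}

/* Hypothetical helper: warm up NSS, then enter the chroot.
 * Returns 0 on success, -1 on failure (chroot() needs root privileges). */
int enter_chroot(const char *dir)
{
    assert(dir != NULL);

    preload_nss();
    if (chroot(dir) != 0) {
        perror("chroot");
        return -1;
    }
    if (chdir("/") != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}

/* Lookup as it would be done later, possibly inside the chroot.
 * Returns NULL when the uid has no passwd entry. */
const char *uid_to_name(uid_t uid)
{
    struct passwd *p = getpwuid(uid);
    return p ? p->pw_name : NULL;
}
```

A daemon would call enter_chroot("/some/module/path") once after forking and use uid_to_name() afterwards. Even if the preload keeps libc from crashing, lookups inside the chroot would still read whatever /etc/passwd and /etc/group exist there, so names may simply come back as NULL.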