OK, I searched through the archives and didn't find anything so I must be doing something stupid. I just compiled 2.6.3 and am trying to get it to run on linux 2.6.8.1 kernel with a more or less fedora core2 environment. I was trying between two systems but I have narrowed it down to what I believe is a pretty simple case. Here is my setup/test: Last login: Mon Oct 4 19:30:49 2004 [root@Node30 root]# cat /etc/xinetd.d/rsync # default: off # description: The rsync server is a good addition to an ftp server, as it \ # allows crc checksumming etc. service rsync { disable = no socket_type = stream wait = no user = root server = /usr/bin/rsync server_args = --daemon log_on_failure += USERID } [root@Node30 root]# cat /etc/rsyncd.conf use chroot = true [rsync] comment = backup path = /base read only = no list = yes uid = root gid = root [root@Node30 root]# mkdir /base/test1 [root@Node30 root]# touch /base/test1/testfile.txt [root@Node30 root]# mkdir /base/bkup [root@Node30 root]# rsync -av localhost::rsync/test1 /base/bkup/ receiving file list ... done test1/ test1/testfile.txt sent 93 bytes received 132 bytes 450.00 bytes/sec total size is 0 speedup is 0.00 [root@Node30 root]# chgrp -R nobody /base/test1 [root@Node30 root]# rsync -av localhost::rsync/test1 /base/bkup/ receiving file list ... rsync: connection unexpectedly closed (4 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(359) [root@Node30 root]# chgrp -R root /base/test1 [root@Node30 root]# rsync -av localhost::rsync/test1 /base/bkup/ receiving file list ... done sent 73 bytes received 92 bytes 330.00 bytes/sec total size is 0 speedup is 0.00 [root@Node30 root]# Can anyone point me in the right direction? My end goal is to sync files between two machines Thanks, Joe Oh, Please copy me on any replies as I am not on the mailing list.
On Mon, Oct 04, 2004 at 09:06:01PM -0500, Joe Eiler wrote:> I just compiled 2.6.3 and am trying to get it to run on linux 2.6.8.1 > kernel with a more or less fedora core2 environment.I've tested 2.6.3 under a debian sarge setup using a linux 2.6.8.1 kernel. The only changes I made in your test case were to use inetd instead of xinetd, and to use "nogroup" instead of "nobody" for the group (which is a known difference between debian and fedora). I couldn't get rsync to fail. So, you should first check for messages from rsyncd in your syslog to see if rsync is exiting with an error or dying. You migh also try running the rsync daemon without using xinetd (just run rsync --daemon on its own with the xinetd setup turned off). If rsync is crashing, you should enable core dumps (either "rsync --daemon" from a shell where you have typed "ulimit -c unlimted", or put that command into a script that runs "rsync --daemon" and run that script from xinetd). ..wayne..
Thanks for the response Wayne. Here are the results of my tests based on your comments, I added some extra carriage returns to hopefully make it easier to read. The logs are at the bottom. A 'ps -ef' shows rsync --daemon is still running. Joe [root@Node30 root]# [root@Node30 root]# /etc/init.d/xinetd status xinetd is stopped [root@Node30 root]# [root@Node30 root]# ulimit -c unlimted -bash: ulimit: unlimted: invalid number [root@Node30 root]# [root@Node30 root]# ulimit -c unlimited [root@Node30 root]# [root@Node30 root]# rsync --daemon [root@Node30 root]# [root@Node30 root]# ls -l /base/test1/testfile.txt -rw-r--r-- 1 root nobody 0 Oct 4 19:59 /base/test1/testfile.txt [root@Node30 root]# [root@Node30 root]# ls -l /base/bkup total 0 drwxr-xr-x 2 root root 80 Oct 4 19:59 test1 [root@Node30 root]# [root@Node30 root]# rsync -av localhost::rsync/test1 /base/bkup/ receiving file list ... rsync: connection unexpectedly closed (4 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(359) [root@Node30 root]# [root@Node30 root]# chgrp -R root /base/test1 [root@Node30 root]# [root@Node30 root]# rsync -av localhost::rsync/test1 /base/bkup/ receiving file list ... done sent 73 bytes received 92 bytes 330.00 bytes/sec total size is 0 speedup is 0.00 [root@Node30 root]# [root@Node30 root]# chgrp -R nobody /base/test1 [root@Node30 root]# [root@Node30 root]# rsync -avvv localhost::rsync/test1 /base/bkup/ opening tcp connection to localhost port 873 receiving file list ... rsync: connection unexpectedly closed (4 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(359) _exit_cleanup(code=12, file=io.c, line=359): about to call exit(12) [root@Node30 root]# [root@Node30 root]# rsync -avvvvv localhost::rsync/test1 /base/bkup/ opening tcp connection to localhost port 873 (Client) Protocol versions: remote=28, negotiated=28 receiving file list ... rsync: connection unexpectedly closed (4 bytes received so far) [receiver] _exit_cleanup(code=12, file=io.c, line=359): entered rsync error: error in rsync protocol data stream (code 12) at io.c(359) _exit_cleanup(code=12, file=io.c, line=359): about to call exit(12) [root@Node30 root]# [root@Node30 root]# chgrp -R root /base/test1 [root@Node30 root]# [root@Node30 root]# rsync -avvvvv localhost::rsync/test1 /base/bkup/ opening tcp connection to localhost port 873 (Client) Protocol versions: remote=28, negotiated=28 receiving file list ... recv_file_name(test1) recv_file_name(test1/testfile.txt) received 2 names done gid 0() maps to 0 [receiver] i=0 <NULL> <NULL> test1 mode=040755 len=80 uid=0 gid=0 [receiver] i=1 <NULL> test1 testfile.txt mode=0100644 len=0 uid=0 gid=0 recv_file_list done get_local_name count=2 /base/bkup/ recv_files(2) starting generator starting pid=23374 count=2 delta transmission enabled recv_generator(test1,0) recv_generator(test1/testfile.txt,1) test1/testfile.txt is uptodate generate_files phase=1 recv_files phase=1 generate_files phase=2 recv_generator(test1,0) generate_files finished recv_files finished sent 77 bytes received 92 bytes 338.00 bytes/sec total size is 0 speedup is 0.00 _exit_cleanup(code=0, file=main.c, line=1146): entered _exit_cleanup(code=0, file=main.c, line=1146): about to call exit(0) [root@Node30 root]# [root@Node30 root]# And here are the logs after doing these tests [root@Node30 root]# tail -15 /var/log/messages Oct 4 20:00:59 localhost rsyncd[7840]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 4 20:03:29 localhost rsyncd[8297]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 4 20:03:29 localhost rsyncd[8297]: wrote 92 bytes read 73 bytes total size 0 Oct 4 20:04:37 localhost rsyncd[8529]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 04:02:00 localhost logrotate: ALERT exited abnormally with [1] Oct 5 06:06:12 localhost rsyncd[22781]: rsyncd version 2.6.3 starting, listening on port 873 Oct 5 06:06:12 localhost rsyncd[22781]: unable to bind any inbound sockets on port 873 Oct 5 06:06:12 localhost rsyncd[22781]: rsync error: error in socket IO (code 10) at socket.c(448) Oct 5 06:07:26 localhost rsyncd[23000]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 06:07:59 localhost rsyncd[23140]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 06:07:59 localhost rsyncd[23140]: wrote 92 bytes read 73 bytes total size 0 Oct 5 06:08:30 localhost rsyncd[23264]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 06:08:41 localhost rsyncd[23281]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 06:09:14 localhost rsyncd[23375]: rsync on rsync/test1 from localhost.localdomain (127.0.0.1) Oct 5 06:09:14 localhost rsyncd[23375]: wrote 92 bytes read 77 bytes total size 0 [root@Node30 root]# [root@Node30 root]# tail /var/log/secure Oct 4 17:36:34 localhost xinetd[904]: START: rsync pid=13146 from=127.0.0.1 Oct 4 19:02:41 localhost sshd[29054]: Accepted password for root from 172.30.10.200 port 35235 ssh2 Oct 4 19:03:33 localhost xinetd[904]: START: rsync pid=29283 from=127.0.0.1 Oct 4 19:19:41 localhost xinetd[904]: START: rsync pid=32300 from=127.0.0.1 Oct 4 19:20:16 localhost xinetd[32433]: START: rsync pid=32483 from=127.0.0.1 Oct 4 19:21:50 localhost xinetd[32757]: START: rsync pid=32761 from=127.0.0.1 Oct 4 19:22:08 localhost xinetd[32757]: START: rsync pid=367 from=127.0.0.1 Oct 4 19:22:40 localhost xinetd[32757]: START: rsync pid=464 from=127.0.0.1 Oct 4 19:22:48 localhost xinetd[32757]: START: rsync pid=481 from=127.0.0.1 Oct 4 19:56:54 localhost sshd[7047]: Accepted password for root from 172.30.10.200 port 35310 ssh2 [root@Node30 root]# -----Original Message----- From: Wayne Davison [mailto:wayned@samba.org] Sent: Tuesday, October 05, 2004 12:37 AM To: Joe Eiler Cc: 'rsync@lists.samba.org' Subject: Re: group change causing failure On Mon, Oct 04, 2004 at 09:06:01PM -0500, Joe Eiler wrote:> I just compiled 2.6.3 and am trying to get it to run on linux 2.6.8.1 > kernel with a more or less fedora core2 environment.I've tested 2.6.3 under a debian sarge setup using a linux 2.6.8.1 kernel. The only changes I made in your test case were to use inetd instead of xinetd, and to use "nogroup" instead of "nobody" for the group (which is a known difference between debian and fedora). I couldn't get rsync to fail. So, you should first check for messages from rsyncd in your syslog to see if rsync is exiting with an error or dying. You migh also try running the rsync daemon without using xinetd (just run rsync --daemon on its own with the xinetd setup turned off). If rsync is crashing, you should enable core dumps (either "rsync --daemon" from a shell where you have typed "ulimit -c unlimted", or put that command into a script that runs "rsync --daemon" and run that script from xinetd). ..wayne..
OK, I did some more poking around and it is actually the read command that is failing(well, returning EOF) I put a couple print statements in and everything is identical until the failure. Can anyone give me a hint of how to go about debugging this further? Snip from Failing case (group as nobody) (Client) Protocol versions: remote=28, negotiated=28 io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] receiving file list ... flist.c:1284 f[3] io.c:775 f[3] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] io.c:502 fd[3] len[24] io.c:504 fd[3] ret[0] n[24] io.c:502 fd[5] len[4] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] io.c:502 fd[3] len[29] io.c:504 fd[3] ret[0] n[29] io.c:504 fd[5] ret[0] n[4] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[0] _exit_cleanup(code=12, file=io.c, line=359): entered _exit_cleanup(code=12, file=io.c, line=359): about to call exit(12) Snip from good case (group as root) (Client) Protocol versions: remote=28, negotiated=28 io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] receiving file list ... flist.c:1284 f[3] io.c:775 f[3] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] io.c:502 fd[3] len[24] io.c:504 fd[3] ret[0] n[24] io.c:502 fd[5] len[4] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] io.c:502 fd[3] len[29] io.c:504 fd[3] ret[0] n[29] io.c:504 fd[5] ret[0] n[4] io.c:502 fd[3] len[4] io.c:504 fd[3] ret[0] n[4] io.c:502 fd[3] len[56] io.c:504 fd[3] ret[0] n[56] io.c:777 f[3] c[^Y] io.c:775 f[3] io.c:777 f[3] c[^E] recv_file_name(test1) Joe -----Original Message----- From: Wayne Davison [mailto:wayned@samba.org] Sent: Tuesday, October 05, 2004 12:37 AM To: Joe Eiler Cc: 'rsync@lists.samba.org' Subject: Re: group change causing failure On Mon, Oct 04, 2004 at 09:06:01PM -0500, Joe Eiler wrote:> I just compiled 2.6.3 and am trying to get it to run on linux 2.6.8.1 > kernel with a more or less fedora core2 environment.I've tested 2.6.3 under a debian sarge setup using a linux 2.6.8.1 kernel. The only changes I made in your test case were to use inetd instead of xinetd, and to use "nogroup" instead of "nobody" for the group (which is a known difference between debian and fedora). I couldn't get rsync to fail. So, you should first check for messages from rsyncd in your syslog to see if rsync is exiting with an error or dying. You migh also try running the rsync daemon without using xinetd (just run rsync --daemon on its own with the xinetd setup turned off). If rsync is crashing, you should enable core dumps (either "rsync --daemon" from a shell where you have typed "ulimit -c unlimted", or put that command into a script that runs "rsync --daemon" and run that script from xinetd). ..wayne..
I believe that first one is because I already had a daemon running. Since I saw that I have been doing a ps -ef | grep rsync to make sure they are all dead before testing. I have been doing a "rsync -av localhost::" to see if a daemon is running, is this a valid test? I did do a "strace -f rsync --daemon --no-detach 2>/tmp/rsync-$$.out" but it is about 55K and it is SEGFAULTing and dumping core(but core wasn't where I expected it to be. The core file is about 680k, I am going to attempt to crack the core file open, would anyone else like to take a look(looking at Wayne with very sad eyes) joe -----Original Message----- From: Wayne Davison [mailto:wayned@samba.org] Sent: Tuesday, October 05, 2004 11:40 AM To: Joe Eiler Cc: rsync@lists.samba.org Subject: Re: group change causing failure On Tue, Oct 05, 2004 at 07:29:03AM -0500, Joe Eiler wrote:> Oct 5 06:06:12 localhost rsyncd[22781]: rsyncd version 2.6.3 starting,listening on port 873> Oct 5 06:06:12 localhost rsyncd[22781]: unable to bind any inboundsockets on port 873> Oct 5 06:06:12 localhost rsyncd[22781]: rsync error: error in socket IO(code 10) at socket.c(448) These mean that the "rsync --daemon" command did not start, so something else must be accepting daemon connections on your system. You'll need to diagnose that if you want to get a core-dump from the daemon (which must be run from a shell where ulimit allows core-dumps). Your output commands in the client won't tell you anything because it is already reporting that the server goes away, so the important thing is what is happening to the server. If you want to put some print commands into server code, you can run "rsync --daemon --nodetach" to see them (just be sure to output them on stderr). ..wayne..
First, Thanks for all your help Wayne. This doesn't appear to be an rsync problem at all. OK here is what I am seeing. Here is the backtrace (gdb) bt #0 0x4000734c in open_path () from /lib/ld-linux.so.2 #1 0x40004eed in _dl_map_object () from /lib/ld-linux.so.2 #2 0x4011b114 in dl_open_worker () from /lib/tls/libc.so.6 #3 0x4000bb46 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0x4011af37 in _dl_open () from /lib/tls/libc.so.6 #5 0x4011cf93 in do_dlopen () from /lib/tls/libc.so.6 #6 0x4000bb46 in _dl_catch_error () from /lib/ld-linux.so.2 #7 0x4011ce02 in __libc_dlopen_mode () from /lib/tls/libc.so.6 #8 0x400faa55 in __nss_lookup_function () from /lib/tls/libc.so.6 #9 0x400fb3a3 in __nss_next () from /lib/tls/libc.so.6 #10 0x400b341f in getgrgid_r@@GLIBC_2.1.2 () from /lib/tls/libc.so.6 #11 0x400b2c81 in getgrgid () from /lib/tls/libc.so.6 #12 0x0805d449 in add_gid (gid=99) at uidlist.c:75 #13 0x08058ab7 in send_file_entry (file=0x4019afd0, f=5, base_flags=1) at flist.c:463 #14 0x080590df in send_file_name (f=5, flist=0x808ad80, fname=0xbfffa381 "test1", recursive=1, base_flags=1) at flist.c:976 #15 0x0805971e in send_file_list (f=5, argc=0, argv=0x80889b0) at flist.c:1214 #16 0x0805137c in start_server (f_in=5, f_out=5, argc=2, argv=0x80889a8) at main.c:440 #17 0x08062635 in start_daemon (f_in=5, f_out=5) at clientserver.c:479 #18 0x0805de03 in start_accept_loop (port=873, fn=0x8061b10 <start_daemon>) at socket.c:512 #19 0x08062811 in daemon_main () at clientserver.c:625 #20 0x08051c0a in main (argc=0, argv=0x0) at main.c:1124 (gdb) The difference between the root group and any other group is in uidlist.c /* Add a gid to the list of gids. Only called on sending side. */ void add_gid(gid_t gid) { struct idlist *list; if (gid == 0) /* don't map root */ return; for (list = gidlist; list; list = list->next) { if (list->id == (int)gid) return; } add_to_list(&gidlist, (int)gid, gid_to_name(gid), 0); } so it looks like gid_to_name(well actually getgrgid) is segfaulting. The gid(99) is right. so I guess it is back to the drawing board. Thanks again, Joe -----Original Message----- From: Wayne Davison [mailto:wayned@samba.org] Sent: Tuesday, October 05, 2004 2:26 PM To: Joe Eiler Cc: rsync@lists.samba.org Subject: Re: group change causing failure On Tue, Oct 05, 2004 at 01:02:57PM -0500, Joe Eiler wrote:> I have been doing a "rsync -av localhost::" to see if a daemon is > running, is this a valid test?Yes, that should return a connection-refused error if rsync isn't running.> The core file is about 680k, I am going to attempt to crack the core > file open, would anyone else like to take a look (looking at Wayne > with very sad eyes)To ensure that the core file is useful, make sure that your rsync is compiled with debug symbols (i.e. run "gdb rsync" and ensure that "l main" shows you the source). The default Makefile includes debug symbols in the rsync it generates, so as long as they didn't get stripped out, you should be OK. Then, you should be able to run "gdb rsync core" and get a backtrace ("bt") of where it died and be able to look around at variables. If you want me to look at it, I'll need both the (debug-enabled) executable and the core file -- either put them on a web site somewhere and email me where they are located, or gzip the files and attach them to an email to me (not the mailing list, though!). ..wayne..
Just wanted to post a quick follow up. The problem was a bug in glibc (introduced be a vendor patch) so the solution was to upgrade glibc to a newer release. Wayne, thanks for all your help and all the great work you guys do. Joe -----Original Message----- From: Wayne Davison [mailto:wayned@samba.org] Sent: Tuesday, October 05, 2004 11:40 AM To: Joe Eiler Cc: rsync@lists.samba.org Subject: Re: group change causing failure On Tue, Oct 05, 2004 at 07:29:03AM -0500, Joe Eiler wrote:> Oct 5 06:06:12 localhost rsyncd[22781]: rsyncd version 2.6.3 starting,listening on port 873> Oct 5 06:06:12 localhost rsyncd[22781]: unable to bind any inboundsockets on port 873> Oct 5 06:06:12 localhost rsyncd[22781]: rsync error: error in socket IO(code 10) at socket.c(448) These mean that the "rsync --daemon" command did not start, so something else must be accepting daemon connections on your system. You'll need to diagnose that if you want to get a core-dump from the daemon (which must be run from a shell where ulimit allows core-dumps). Your output commands in the client won't tell you anything because it is already reporting that the server goes away, so the important thing is what is happening to the server. If you want to put some print commands into server code, you can run "rsync --daemon --nodetach" to see them (just be sure to output them on stderr). ..wayne..