samba-bugs@samba.org
2005-May-11 19:02 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

wayned@samba.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|hangs indefinitely (while   |hangs indefinitely at start
                   |listing files)              |of phase 1

------- Additional Comments From wayned@samba.org 2005-05-11 11:41 -------
Thanks for the extra info. The system-call traces show that the receiver is waiting to read data from the sender, the sender is waiting to read data from the generator, and the generator is waiting for data from the receiver. Combined with your initial report, this would place the code in the generator at the spot where it had just finished sending all checksum data to the sender, sent a -1 (for the end of phase 0), and is now waiting around for redo numbers or the next end-of-phase message from the receiver.

So, the question is: where did some data get lost? The call to get_redo_num() should have flushed all the buffered output data that the generator had, so it would be good to verify that the last buffer of data (the 3700 bytes in that run) contained the 4 bytes of 0xFF at the end. If so, what is the sender waiting for? Did it not receive the data? Alternatively, if the 3700 bytes of data did not have the -1, why didn't all the generator's data flush?

A comment in your initial report said that you started out using distro-provided rsync versions -- that makes me wonder which versions of rsync were exhibiting the hang.

Is there any chance that the openvpn tunnel is failing to flush? Or losing data?

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
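The check suggested in the comment above (does the generator's last buffer end with the four 0xFF bytes that encode -1?) could be sketched roughly as follows. This is only an illustration, not part of rsync: the helper name `ends_with_phase_marker` and the sample buffer are made up, and it assumes the raw bytes of the final write have already been extracted from the trace, with any multiplexing headers stripped.

```python
import struct

# A 32-bit -1 is four 0xFF bytes in either byte order.
END_OF_PHASE = struct.pack("<i", -1)


def ends_with_phase_marker(buf: bytes) -> bool:
    """Return True if the buffer ends with the 4-byte encoding of -1.

    Hypothetical helper for inspecting a captured write buffer.
    """
    return len(buf) >= 4 and buf[-4:] == END_OF_PHASE


if __name__ == "__main__":
    # Made-up stand-in for a 3700-byte final buffer: filler plus the marker.
    sample = bytes(3696) + END_OF_PHASE
    print(len(sample), ends_with_phase_marker(sample))
```

If the marker is present in the generator's last write, the next thing to look at would be whether the same bytes show up in the sender's reads on the other side of the tunnel.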
samba-bugs@samba.org
2005-May-15 00:21 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

------- Additional Comments From oopla@users.sf.net 2005-05-14 17:14 -------
The rsync link started with rsync as shipped in Slackware 9.0, then I kept up with distro releases and later with cvs, compiling locally wherever needed (one end runs Debian 2.2 - Potato).

Yes, I first looked at openvpn and upgraded it to the latest stable (2.0), but nothing changed. The puzzling thing was that other services are running fine over the link - http, imap, smtp. At first I didn't question the kernel on the endpoint machines... To 'patch' the link while keeping rsync, I relayed the traffic through my home notebook:

  ==[A]-----------(I'net:openvpn:UDP broken for rsync)-----------B=
      \                                                          /
       --(I'net:vtund:TCP rsync)--[C]--(I'net:vtund:TCP rsync)--

and that was just fine. So the endpoints and openvpn seemed ok.

But I had a suspicion... The kernel on the endpoints was 2.4.27 - I seem to recall there was a bug (fixed in 2.4.28) in the datagram generator (missing counter, something)... which perhaps over long periods screws up something in the kernel. Whatever the cause, the fact is that the relay VPN with vtund over UDP did not work, while over TCP it worked fine.

Then eventually I upgraded the kernel on B and rebooted, but rsync still hung over openvpn:UDP. Finally I upgraded/rebooted A as well, and now rsync over that direct openvpn:UDP link works flawlessly, as do the other services.

So, my problem seems solved ... so far :]. I'm not 100% sure the kernel / that UDP bug was to blame, as other services did run ok anyway, but I would not know where else to look. Indeed, the endpoints with 2.4.31-pre1 + openvpn:UDP have been up for only ~70h by now, while the problem started after nearly 7 months of continuous operation. (rsync sequences are fired once every 15 minutes.)

So it might be too early to say a definitive word, but if you did not spot anything suspicious in the rsync code - or conversely, if from the above you can exclude for sure a glitch in rsync - I'd rather change the status to LATER or REMIND (whatever that means, since they're not defined in Bug's Life - I'm just guessing here).
samba-bugs@samba.org
2005-May-16 16:41 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

wayned@samba.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |WORKSFORME

------- Additional Comments From wayned@samba.org 2005-05-16 09:33 -------
Since this problem appears to have been caused by a bug in the OS, I'm closing this bug report.
samba-bugs@samba.org
2005-May-16 18:20 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

------- Additional Comments From schwardt@sun.ac.za 2005-05-16 11:19 -------
Hi,

I have also been bitten by this deadlock bug. Some information on the systems in question:

client side (receiving files):
  rsync 2.6.4
  kernel 2.4.25
  up-to-date Debian sarge

server side (sending files):
  rsync 2.6.3
  kernel 2.4.23
  Debian sarge (slightly older)

I am rsync'ing over an SSH tunnel from within a Python script. The bug was completely repeatable, appearing each time I ran the script in question. The rsync client command in the script is:

  rsync -avuz --timeout=300 --rsh="ssh -p 5555 -l root" localhost::aa/*.pt1 .

When I add "-vvv", the process hangs at "generate_files phase=1".

Some funny stuff I tried:

- I created the SSH tunnel from the command line, and typed the above command (except I used a different target directory). It worked.
- I reran the Python script (which sets up an identical SSH tunnel and rsyncs based on os.system()). It didn't work.
- I could alternate between the above two scenarios ad infinitum.
- I edited the script and replaced "*.pt1" with "2005-05-16.pt1" (a file in the same glob that didn't yet exist on the client side). It worked.
- When I changed it back to "*.pt1", it didn't work again.
- In the script I had another rsync line just below the one given above, with the only difference that it copied "*.wm1" files instead of "*.pt1". I swapped the two lines, and it worked (it copied the *.wm1 files).
- When I switched the two lines back, the bug went away (Damn!). Now the "*.pt1" files were also copied.

I haven't yet been able to make the bug reappear, which is kinda frustrating :-)

Hope this helps in some way.

Ludwig

P.S. I've seen similar reports on the mailing list. Just google for 'rsync hangs "generate_files phase=1"'.
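For anyone trying to reproduce a hang like this from a script, a minimal sketch of the invocation described above is given here, using `subprocess` instead of `os.system()` so that an outer wall-clock limit can be applied. The function names and the `wall_clock_limit` value are hypothetical; only the rsync options, port, and module path are taken from the report. Note that with an argument list no shell is involved, so "*.pt1" is passed to rsync literally rather than being offered to the local shell for expansion first, which may or may not matter for the behaviour seen.

```python
import subprocess
from typing import List, Optional


def build_rsync_argv(pattern: str, dest: str = ".") -> List[str]:
    """Build the rsync command from the report as an argument list."""
    return [
        "rsync", "-avuz", "--timeout=300",
        "--rsh=ssh -p 5555 -l root",   # rsync splits this string itself
        "localhost::aa/" + pattern,
        dest,
    ]


def run_rsync(pattern: str, dest: str = ".",
              wall_clock_limit: float = 600.0) -> Optional[int]:
    """Run rsync; return its exit code, or None if the outer limit expired.

    wall_clock_limit is an assumed value, independent of rsync's --timeout.
    """
    try:
        proc = subprocess.run(build_rsync_argv(pattern, dest),
                              timeout=wall_clock_limit)
    except subprocess.TimeoutExpired:
        return None  # the child is killed by subprocess.run on timeout
    return proc.returncode
```

Calling `run_rsync("*.pt1")` and then `run_rsync("*.wm1")` would mirror the two lines in the script; a `None` result marks a run that had to be killed.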
samba-bugs@samba.org
2005-May-16 19:21 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

------- Additional Comments From wayned@samba.org 2005-05-16 12:11 -------
The command line you give contains "localhost", which conflicts with the description of two hosts being involved.

I can't help unless the hang is reproducible, so you may wish to try tweaking some of the destination files and see if you can get it to happen again.
samba-bugs@samba.org
2005-May-16 22:01 UTC
[Bug 2628] hangs indefinitely at start of phase 1
https://bugzilla.samba.org/show_bug.cgi?id=2628

------- Additional Comments From oopla@users.sf.net 2005-05-16 14:53 -------
You're using 2.6.3 on one side, with -vvv. Perhaps you've been bitten by the very --verbose bug mentioned under BUGFIX in the 2.6.4 release notes. Therefore, I'd rather upgrade to at least 2.6.4 and see what happens.

BTW, it's not that strange that you don't see the bug anymore after you somehow succeeded in rsyncing once - you've moved the balls, so the game is a bit different now...