We are running rsync 2.4.6 on HP-UX 11 and using it to push a document-root from a staging area to several servers running rsync in daemon mode. The rsync client syncs successfully to modules on the same server and to most other machines, but hangs when trying to synchronise across a firewall (from SECURE zone to DMZ). (The firewall port IS OPEN!!)

During a normal release, where there are roughly fewer than 200 changed files, the synchronisation across the firewall runs fine. However, it hangs when we try to sync the entire document-root (or a large subset of it). It appears to wait for several minutes with no new messages in the logs and no packets being exchanged. Eventually it may carry on and then hang on another file.

After spending 2 days trying to compile 2.5.5 and then finding it didn't work at all (my memory tells me that it was an error at line 150 of io.c), we are not keen on moving up to 2.5.5 just yet! I've looked through the mailing list archives and found that Wayne Davison's rsync-nohang.patch was suggested to fix similar problems, but this only seems to be available for the 2.5.X releases.

We are interested in finding out whether the wayne-nohang patches can be applied to 2.4.6. How widely has this patch been implemented, and has anyone found any problems with it? Do people think it is likely to solve our problems? What is the basic idea behind how it solves the problem?

Also, has anyone found problems rsyncing through a firewall that do not occur for the same files when there is no firewall?

Unfortunately, as the problems only show up on our production servers, we are very limited in what diagnostics we are allowed to run, and we haven't managed to reproduce the same symptoms on our (supposedly identical) test machines.
I know there are a lot of questions here, but I have spent a good deal of time trying to work out the issue and have hit a bit of a brick wall, so any comments and suggestions would be greatly appreciated.

Regards
Mark Hyde
On Mon, Dec 09, 2002 at 01:49:40PM +0000, rsyncuser wrote:
> We are interested in finding out whether the wayne-nohang patches can
> be applied to 2.4.6.

My older patches for 2.4.6 had got moved aside after they got incorporated into the main distribution. However, I just put them back in their original spot so they can be accessed again. The most important patch was the simplest:

    http://www.clari.net/~wayne/rsync-nohang1.patch

This patch ensures that data coming from the generator to the sender does not overflow and block during the final phase of the transfer on the sending side (but not necessarily at the final file, due to the buffering on the outgoing connection). The current code waited around for the remote process to end without reading the incoming data stream, which was a very bad idea if the -v option was turned on.

The second patch fixed a much rarer bug -- one that should only get tickled if a good number of the files fail to transfer correctly on the first try and need to be resent:

    http://www.clari.net/~wayne/rsync-nohang2.patch

An older version of this patch was included in the Red Hat sources for a while, so it was pretty widely tested:

    http://www.clari.net/~wayne/old/rsync-nohang.patch

(Note that this patch contains the "nohang1" patch as well.)

The reasoning behind this patch is that there is a data channel from the receiver to the generator that tells it what files to retry. This data channel is left totally unread until all files are handled in pass 1. This means that it can block if enough files need to be resent. My patch keeps this data channel clear by reading it whenever data appears and setting flags on what files to resend during the retry phase.

I'm thinking about writing a new patch for the latest rsync that causes these need-to-retry files to be immediately resent by the generator to the sender instead of buffering them (with proper signaling to ensure that retry files get their alternate block-sizes set).
Perhaps this solution would finally allow this bug to be put to rest (since it's not yet fixed in the main code).

..wayne..
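
The blocking behaviour described above (an unread channel that fills up once enough retry requests are queued) can be illustrated with a small single-process model. This is only a sketch under stated assumptions, not rsync's code: the function name `run_pass`, the one-line-per-file message format, and the idea that every file needs a retry are all made up for the illustration. The write end is made non-blocking so that the point where a real writer would hang shows up as a "stalled" result instead of freezing the script.

```python
import os
import select


def run_pass(n_files, drain):
    """Model pass 1: one side queues a retry request per file on a pipe.

    If drain is False, the other side never reads until the pass is over,
    so the pipe eventually fills. If drain is True, pending requests are
    read whenever they appear and recorded as flags (here, a set).
    Hypothetical model only; rsync's real channel and messages differ.
    """
    r, w = os.pipe()
    os.set_blocking(w, False)  # report the stall instead of hanging
    redo = set()               # files flagged for the retry phase
    buf = b""
    try:
        for i in range(n_files):
            try:
                # Small writes to a pipe are all-or-nothing, so this
                # either queues the whole request or fails when full.
                os.write(w, b"%d\n" % i)
            except BlockingIOError:
                return ("stalled", i)  # a blocking writer would hang here
            if drain:
                # Keep the channel clear: read whatever is pending now.
                while select.select([r], [], [], 0)[0]:
                    buf += os.read(r, 65536)
                    *lines, buf = buf.split(b"\n")
                    redo.update(int(x) for x in lines)
        return ("done", len(redo))
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(run_pass(200000, drain=False))
    print(run_pass(200000, drain=True))
```

With a few hundred requests both variants finish, which is consistent with small transfers working and large ones hanging; the exact number at which the undrained variant stalls depends on the operating system's pipe buffer size.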