Hi (especially Wayne),

ftp.gwdg.de is rsyncing most of the data from about 500 other rsync
servers.

Especially during the general "high traffic" phases, like the release
of a new Knoppix ISO, a new SUSE distribution, or a new KDE release,
I see timeouts with other servers which are at maximum traffic at that
time.

There is a general pattern:

1. rsync builds the database of the remote files
2. rsync builds the database of the local files
3. rsync reports that it is starting the necessary actions
4. the connection has timed out

At ftp.gwdg.de, I have set in /etc/rsyncd.conf

    timeout = 15000

just to allow for possible bottlenecks on "the other side" in phases
of high traffic without risking lots of unnoticed dead processes, but
it seems impossible for me to share this aspect with all the other
server maintainers...

So, my question (indeed more, a wish): wouldn't it be possible to start
building both databases in parallel, or shortly after each other?

Cheers -e
-- 
Eberhard Moenkeberg (emoenke@gwdg.de, em@kki.org)

On Tue, Sep 07, 2004 at 10:48:31PM +0200, Eberhard Moenkeberg wrote:
> So, my question (indeed more, a wish): wouldn't it be possible to start
> building both databases in parallel, or shortly after each other?

That would be possible, yes, but it might be tricky to make it
efficient (since you would need either to poll the socket data while
scanning the directories or to use a separate thread).

This double scan only occurs when the --delete option is used, so one
change you can try is to switch over to using --delete-after. This
ensures that the transfer completes before the receiving side starts
its directory-deletion scanning, making the timeout less disruptive.
If this works for you and if you have a hard time getting your users
to change their habits, you could even install a custom rsync that
would interpret --delete as --delete-after.

..wayne..

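The "interpret --delete as --delete-after" idea above can be approximated
without patching rsync itself. What follows is only a minimal sketch of a
command-line rewriter, not the custom build described in the message. The
function name rewrite_delete_args is hypothetical, and the sketch assumes
that only the exact spelling --delete needs rewriting and that anything
after a bare "--" is a path rather than an option.

```python
from typing import List


def rewrite_delete_args(argv: List[str]) -> List[str]:
    """Return a copy of argv with a bare --delete replaced by --delete-after.

    Hypothetical helper for a wrapper script placed in front of rsync.
    Arguments after a literal "--" are treated as paths and left alone.
    """
    rewritten = []
    options_ended = False
    for arg in argv:
        if arg == "--":
            # Everything after this marker is positional, not an option.
            options_ended = True
        if not options_ended and arg == "--delete":
            rewritten.append("--delete-after")
        else:
            rewritten.append(arg)
    return rewritten


# Example: the option is rewritten, everything else passes through unchanged.
print(rewrite_delete_args(["rsync", "-a", "--delete", "src/", "dest/"]))
```

A wrapper script would pass the rewritten list on to the real rsync
binary; where that binary lives is site-specific and deliberately left
out here.
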
On Tue, 7 Sep 2004 20:49:21 +0000 (UTC), Eberhard Moenkeberg <emoenke@gwdg.de> wrote:
> So, my question (indeed more, a wish): wouldn't it be possible to start
> building both databases in parallel, or shortly after each other?

I would also vote for such a feature, building the file list databases
in parallel.

We are using rsync within dirvish to back up large filesystems
(> 1000000 files), and the process of building these file lists alone
takes a very long time.

Greetings
-- 
Robert Sander
Senior Manager Information Systems
Epigenomics AG
Kleine Praesidentenstr. 1
10178 Berlin, Germany
phone: +49-30-24345-330
fax: +49-30-24345-555
http://www.epigenomics.com
robert.sander@epigenomics.com

What we anticipate seldom occurs; what we least expect generally happens.
  -- Benjamin Disraeli

On Wed, Sep 08, 2004 at 07:15:23AM +0000, Robert Sander wrote:
> I would also vote for such a feature, building the filelist databases
> in parallel.

The whole scan-first-then-send idiom needs to be replaced with an
incremental algorithm (which was the subject of the rZync protocol
test code I wrote a while back). When I get around to working on a
replacement rsync protocol once again this will finally be taken care
of.

..wayne..

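To illustrate the difference between scan-first-then-send and an
incremental approach, here is a rough sketch. It is not the rZync code
and not rsync's wire protocol; iter_files, send_incrementally, the send
callback, and batch_size are all hypothetical names chosen for the
example. The point is only that file names are handed to the sender
while the directory walk is still running, so the connection does not
sit idle for the whole scan.

```python
import os
from typing import Callable, Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Yield file paths relative to root, one at a time, as they are found."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)


def send_incrementally(root: str,
                       send: Callable[[List[str]], None],
                       batch_size: int = 100) -> int:
    """Hand file names to send() in batches while the scan is in progress.

    Returns the total number of file names passed to send().
    """
    batch: List[str] = []
    total = 0
    for path in iter_files(root):
        batch.append(path)
        if len(batch) >= batch_size:
            send(batch)          # traffic flows before the scan has finished
            total += len(batch)
            batch = []
    if batch:
        send(batch)              # flush the final partial batch
        total += len(batch)
    return total
```

With a scan-first design, the equivalent of send() would be called once,
after the full list is built, which is the idle window in which the
timeouts described earlier in the thread occur.
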
On Sep 08, Wayne Davison wrote:
| When I get around to working on a
| replacement rsync protocol once again this will finally be taken care
| of.

I also remember reading that Martin Pool was talking about a
replacement a while back. Realistically, when do you see such an
effort happening?