Hi,

I use rsync to perform backup to disk on a SunFire 880 with Solaris 8. For performance reasons, we launch 5 rsyncs simultaneously on 5 different filesystems, plus about 150-200 "cp -p" commands on as many database files. We have been using the same scripts for about 2 months without problems. The backup is performed on the same server (from filesystem to filesystem on the same server).

Last weekend, we replaced the 4x750MHz CPUs by 8x1200MHz CPUs and upgraded from 8 to 16 MB of RAM. Since then, we have had 2 errors out of 3 backup runs. The error is always on the same filesystem, which is not the largest one but the one that has the most files and directories (400 000 files as opposed to 600 for the others). The error message we get is:

-------------------
io timeout after 600 seconds - exiting
rsync error: timeout in data send/receive (code 30) at io.c(143)
rsync: writefd_unbuffered failed to write 69 bytes: phase "unknown": Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(836)
--------------------

The command used is:

OPTS="--delete --timeout=600 --exclude dbf/ --rsh=rsh"
/usr/bin/rsync -vRa $OPTS ORIGIN_DIRECT DESTINATION_DIRECT

The documentation says this kind of error might be related to one of:

- Disk full: this is not the case.
- Remote rsync not found: it is on the same server, so it is found.
- Remote-shell setup isn't working right or isn't "clean": I tried the suggested tests and there is no problem. Moreover, it worked before...

I also saw that the rsync process might have been starved for CPU or memory; in our case I do not think that is it.

Can you help me with this?

Thanks in advance
On Thu, 2004-06-17 10:11:19 -0400, Anh Truong <transfert.curateur@courriel.gouv.qc.ca> wrote in message <"436 04/06/17*/S=curateur/G=transfert/ORG=courriel/ADMD=gouv.qc/C=ca/"@MHS>:

> Last weekend, we replaced the 4X750MHz by 8X1200MHz CPU's and upgraded
> from 8 to 16 MB of RAM. Since then, we had 2 errors out of 3 backup runs.

That's quite little RAM for 8 CPUs. SCNR,

JBG

-- 
Jan-Benedict Glaw    jbglaw@lug-owl.de    +49-172-7608481
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg
 fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak!
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
On Thu, 2004-06-17 10:11:19 -0400, Anh Truong <transfert.curateur@courriel.gouv.qc.ca> wrote in message <"436 04/06/17*/S=curateur/G=transfert/ORG=courriel/ADMD=gouv.qc/C=ca/"@MHS>:

> is always on the same filesystem, which is not the largest one but the one that has
> the more files and directories (400 000 files as opposed to 600 for others). The
> error message we have is:
>
> -------------------
> io timeout after 600 seconds - exiting
> rsync error: timeout in data send/receive (code 30) at io.c(143)
> rsync: writefd_unbuffered failed to write 69 bytes: phase "unknown": Broken pipe
> rsync error: error in rsync protocol data stream (code 12) at io.c(836)
> --------------------
> The command used is:
>
> OPTS="--delete --timeout=600 --exclude dbf/ --rsh=rsh"
>
> /usr/bin/rsync -vRa $OPTS ORIGIN_DIRECT DESTINATION_DIRECT

rsync has to build *full* file lists (for source and for destination). If this takes longer than your timeout, the connection may break. If you put enough pressure on the system, gathering information for some 400000 files (multiply by two for source+destination) could take longer than 10 min (= 600 sec).

> Can you help me on this???

I think it's just a performance problem to gather all needed information in 10 min. Try setting a higher timeout (like 7200 sec = 2 h) and have a test drive.

MfG, JBG

PS: With 16MB, you'd probably always run into timeouts because of constant swapping :)

-- 
Jan-Benedict Glaw    jbglaw@lug-owl.de    +49-172-7608481
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg
 fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak!
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
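A minimal sketch of that suggestion, keeping the poster's original options but raising the timeout to 2 hours. `ORIGIN_DIRECT` and `DESTINATION_DIRECT` are the same placeholders used in the original post, and the default paths here are hypothetical:

```shell
#!/bin/sh
# Sketch: same invocation as the failing script, but with --timeout
# raised to 7200 s (2 h) so a slow file-list build no longer aborts.
# The RSYNC path and both directories are placeholder assumptions.
RSYNC=${RSYNC:-/usr/bin/rsync}
ORIGIN_DIRECT=${ORIGIN_DIRECT:-/fs/origin}
DESTINATION_DIRECT=${DESTINATION_DIRECT:-/backup}

OPTS="--delete --timeout=7200 --exclude dbf/ --rsh=rsh"

# Only run when the source directory actually exists.
if [ -d "$ORIGIN_DIRECT" ]; then
    "$RSYNC" -vRa $OPTS "$ORIGIN_DIRECT" "$DESTINATION_DIRECT"
fi
```

If 7200 s turns out to be overkill, the value can be trimmed back once a few successful runs show how long the list build really takes.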
Sorry, about the RAM: I meant 16 GB instead of 16 MB.
Classic. I used to see that. In mine, I finally had to give up and wrote another tool... not rsync's fault. I would get timeouts during file-list builds.

As I recall, there's an internally-defined "SELECT_TIMEOUT" that, at least back then, remained at 60 seconds regardless of the command-line timeout. Now that you've boosted the speed, your big runs are finishing their file-list builds and taking off on hard I/O usage, slowing the big run's file-list build enough to exceed the SELECT_TIMEOUT. Once the list is built, it's more robust.

Try running the other rsyncs niced. They'll still burn like crazy, but the increased CPU demand of the big-list run during its list build may hold them down enough to let it finish building. Also, start the big-list run first, so its CPU will be busy; if it's the one sharing, it'll be slowed. 10 seconds should suffice, if I'm right. If that doesn't do it, give it a bigger head-start, long enough to start the transfer. The easiest way to determine that time is to run it once to completion, then run it again on unchanged data, and note the time for that operation. That's how much head-start it needs to be in transfer before the big dogs choke it.

Wayne'll probably correct my errors. That SELECT_TIMEOUT thing has probably changed by now; it's been a couple of years since I read and mentally traced the whole tree. But nicing is still a good bet, as is the head-start. Your timeout is already pretty substantial.

Tim Conway
Unix System Administration Contractor - IBM Global Services
desk: 303-273-4776
conway@us.ibm.com

On 06/17/2004 08:11 AM, Anh Truong <transfert.curateur@courriel.gouv.qc.ca> wrote (Subject: Problem in using rsync):

> Hi
>
> I use rsync to perform backup on disk on a SunFire 880 with Solaris 8. For
> performance issues, we launch simultaneously 5 rsyncs on 5 different filesystems
> and about 150-200 "cp -p" commands on as many database files.
> [original message quoted in full above]
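Tim's nicing and head-start advice can be sketched roughly as below. The filesystem names, backup target, and the nice increment are all hypothetical placeholders, not taken from the original scripts:

```shell
#!/bin/sh
# Sketch: start the 400,000-file run first, give it a head-start,
# then launch the remaining rsyncs at reduced priority so the big
# run can finish building its file list before they pile on.
# RSYNC, /fs/* and /backup are placeholder assumptions.
RSYNC=${RSYNC:-/usr/bin/rsync}
OPTS="--delete --timeout=600 --exclude dbf/ --rsh=rsh"
HEADSTART=${HEADSTART:-10}   # seconds; raise if the timeout recurs

# 1. Launch the big-file-count filesystem first, with a head-start.
if [ -d /fs/big ]; then
    "$RSYNC" -vRa $OPTS /fs/big /backup &
    sleep "$HEADSTART"
fi

# 2. Launch the remaining filesystems niced (lower CPU priority).
for fs in /fs/a /fs/b /fs/c /fs/d; do
    [ -d "$fs" ] && nice -n 10 "$RSYNC" -vRa $OPTS "$fs" /backup &
done

wait   # block until every background rsync has finished
```

If the 10-second head-start is not enough, timing an rsync run over already-synced data (as Tim describes) gives a better value for HEADSTART.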
On Thu, Jun 17, 2004 at 10:11:19AM -0400, Anh Truong wrote:

> Hi
>
> I use rsync to perform backup on disk on a SunFire 880 with Solaris 8. For
> performance issues, we launch simultaneously 5 rsyncs on 5 different filesystems
> and about 150-200 "cp -p" commands on as many database files. We have been
> using the same scripts for about 2 months, without problems. The backup is
> performed on the same server (from filesystem to filesystem on the same server).

Try the rsyncs serially. If you have no problems then I'd suspect memory pressure is pushing things to swap and causing the timeout.

-chris
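Chris's diagnostic test could be sketched like this, with the loop body running each filesystem to completion before the next starts. The filesystem names and backup target are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: run the five rsyncs one after another instead of in
# parallel, to see whether concurrent memory pressure is behind
# the timeout. RSYNC, /fs/* and /backup are placeholder assumptions.
RSYNC=${RSYNC:-/usr/bin/rsync}
OPTS="--delete --timeout=600 --exclude dbf/ --rsh=rsh"

failed=0
for fs in /fs/a /fs/b /fs/c /fs/d /fs/big; do
    [ -d "$fs" ] || continue
    # No trailing '&': each run completes before the next starts.
    "$RSYNC" -vRa $OPTS "$fs" /backup || failed=1
done
# failed=0 means every serial run succeeded.
```

If the serial runs all pass but the parallel ones keep timing out, that points at resource contention rather than anything filesystem-specific.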