I manage 250+ redhat linux boxes. The boxes are all set up the same way. On a daily basis, we sync the app directory, which is about 30 GB, out to all hosts. The daily delta is actually less than 1 GB, but since I can't be sure whether any individual box was tampered with during the day, I always do a full sync. On a monthly basis, we run with "--delete" to clean out the stale files on the hosts.

The command I use daily is: "/usr/bin/rsync -a -e ssh", with a ksh for loop over the 250+ host names.
The version is: "rsync version 2.5.7 protocol version 26"

Since rsync must do a checksum on the local and remote box for all files, the whole sync process takes over 2 hours even if nothing was changed.

My questions are:

1) I know I have an old version. Are there performance improvements in the later versions? I am not the SA, and the process to request a new install is lengthy.

2) Is there a "parallel rsync" program? Looping 250 times to invoke rsync causes it to checksum the local files 250 times, which is a waste of resources. Can "parallel rsync" be considered for a future version?

3) Are there better ways to achieve what I need to do with rsync or another tool?

Thank you,
Clayton

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This message is intended only for the personal and confidential use of the designated recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any review, dissemination, distribution or copying of this message is strictly prohibited. This communication is for information purposes only and should not be regarded as an offer to sell or as a solicitation of an offer to buy any financial product, an official confirmation of any transaction, or as an official statement of Lehman Brothers. Email transmission cannot be guaranteed to be secure or error-free. Therefore, we do not represent that this information is complete or accurate and it should not be relied upon as such. All information is subject to change without notice.

--------
IRS Circular 230 Disclosure:
Please be advised that any discussion of U.S. tax matters contained within this communication (including any attachments) is not intended or written to be used and cannot be used for the purpose of (i) avoiding U.S. tax related penalties or (ii) promoting, marketing or recommending to another party any transaction or matter addressed herein.

On Nov 15, 2007 9:08 PM, Tang, Clayton (Yiqi) <yiqi.tang@lehman.com> wrote:
> I manage 250+ redhat linux boxes. [...]

Hello Tang,

First, for such an operation you should RTFM about rsync "batch mode" [1].

Second, if I were you I would look for other solutions, perhaps shared NFS storage or a replicated FS based on DRBD. Using rsync sounds to me like a quick hack from when you had 2 servers and 0 time to market.

I would love to hear other suggestions people on this list have for your issue.

[1] http://samba.anu.edu.au/ftp/rsync/rsync.html

--
Cheers,
Maxim Veksler

"Free as in Freedom" - Do u GNU ?
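
For readers unfamiliar with the batch mode mentioned above, here is a rough sketch of how it might be applied to this setup. It assumes a newer rsync than the 2.5.7 in the original post (batch-mode behavior may differ there) and the `--write-batch=FILE` / `--read-batch=-` forms shown in the rsync man page. The paths `/app/`, `/var/tmp/app-ref/`, `/var/tmp/app.batch` and the host names are hypothetical. The script only prints the commands it would run. Batch mode also expects every destination tree to be identical to the reference copy, which may not hold if a box was changed during the day, so an occasional full sync would probably still be needed.

```shell
#!/bin/sh
# Sketch only: every path and host name below is a hypothetical placeholder.
SRC=/app/                   # master copy of the app directory
REF=/var/tmp/app-ref/       # local reference copy, assumed to mirror the hosts
DEST=/app/                  # app directory on each remote host
BATCH=/var/tmp/app.batch    # file that records the day's changes

# Print (do not run) the commands for one batch-mode distribution.
batch_plan() {
    # Step 1: update the reference copy once and record the changes.
    echo "rsync -a --write-batch=$BATCH $SRC $REF"
    # Step 2: replay the recorded changes on each host, reading from stdin.
    for host in "$@"; do
        echo "ssh $host rsync -a --read-batch=- $DEST < $BATCH"
    done
}

batch_plan host001 host002
```

The point of the design is that the file comparison happens once, when the batch is written, rather than once per host.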

1) Yes! 2.6.x especially helps with memory.

2) Not that I've seen, but I'd be really interested!

3) We've had great luck with (Open)AFS, though it's not for everyone, not even in our environment =) (having to load a kernel module is the #1 complaint). rsync allows us to accommodate those who don't wish to use AFS.

Here's what we're doing:

Roughly 10% (~1k hosts) of our install base uses rsync as an alternative to AFS (our system configuration and application store). About 250M is checked hourly, though as often as every 15 minutes for more time-sensitive systems.

We've tossed around the idea of using batch mode, but it unfortunately doesn't fit our model: it's basically a huge buffet of data from which the hosts pick and choose which trees to keep in sync.

What we've found is that client-initiated pulls scale much better than pushes from a central server. We have each host sleep for a random amount of time, using the hostname as a seed (so it's the same from run to run), before initiating the rsync. This causes multiple rsyncs to run on the server at once, but it can handle dozens of connections at a time without issue, especially after the switch to 2.6 versions of rsync.

We also have multiple servers that the client can rsync from, and that is handled similarly to the timing: a host randomly picks a server from a list, using the hostname as the seed. The servers are monitored for load and new ones are added as needed. Our client to server ratio is close to 50:1.

-Ducky

Tang, Clayton (Yiqi) wrote:
> I manage 250+ redhat linux boxes. [...]
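
The client-initiated pull described in Ducky's reply could look roughly like the sketch below. It is not Ducky's actual script: instead of seeding a random number generator with the hostname, it takes a CRC of the hostname with the POSIX `cksum` utility, which gives the same "stable from run to run" property. The server names, the 900-second window, and the `/apps/` paths are hypothetical, and the final line only prints the command a client would run.

```shell
#!/bin/sh
# Deterministic per-host number: CRC of the given string (POSIX cksum).
host_hash() {
    printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# Delay in seconds within [0, WINDOW), identical on every run for a host.
pull_delay() {   # usage: pull_delay HOST WINDOW_SECONDS
    echo $(( $(host_hash "$1") % $2 ))
}

# Pick one server from the list, also keyed on the host name.
pick_server() {  # usage: pick_server HOST SERVER...
    _h=$1; shift
    # Salt the hash so the server choice is not tied to the delay.
    _idx=$(( $(host_hash "pick:$_h") % $# ))
    shift "$_idx"
    echo "$1"
}

me=$(uname -n)
delay=$(pull_delay "$me" 900)
server=$(pick_server "$me" rsync1.example.com rsync2.example.com rsync3.example.com)

# Print what this client would do (hypothetical paths, pull over ssh).
echo "sleep $delay && rsync -a -e ssh $server:/apps/ /apps/"
```

Run from cron on each client, something like this spreads connections across the window and across the servers without any central coordination.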

I'm no rsync guru by any means, but two things spring to mind.

Use the -t option to stop all the spurious checksumming.

Split your script into multiple scripts, each with a share of the host names, and run them in parallel. Multiple rsyncs can run on the one box concurrently.

Craig....

-----Original Message-----
From: rsync-bounces+craig=sbisolutions.com.au@lists.samba.org [mailto:rsync-bounces+craig=sbisolutions.com.au@lists.samba.org] On Behalf Of Tang, Clayton (Yiqi)
Sent: Friday, 16 November 2007 6:09 AM
To: rsync@lists.samba.org
Subject: How to make rsync faster?

I manage 250+ redhat linux boxes. [...]

--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
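
Craig's suggestion of splitting the host list and running the shares in parallel can be sketched in a single script, as below. The host names, the `/app/` paths, and the `JOBS` and `DRY_RUN` variables are hypothetical; by default the script only prints the rsync commands. Note that `-a` is shorthand for `-rlptgoD`, so the `-a` in the original command already includes `-t`.

```shell
#!/bin/sh
JOBS=${JOBS:-4}          # number of parallel workers (hypothetical default)
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = really run rsync

# Sync one host, or print the command when DRY_RUN=1.
sync_host() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "rsync -a -e ssh /app/ $1:/app/"
    else
        /usr/bin/rsync -a -e ssh /app/ "$1":/app/
    fi
}

# Worker K of N handles every Nth host, so each host is synced exactly once.
worker() {   # usage: worker K N HOST...
    k=$1 n=$2; shift 2
    i=0
    for h in "$@"; do
        if [ $(( i % n )) -eq "$k" ]; then
            sync_host "$h"
        fi
        i=$(( i + 1 ))
    done
}

# Start N workers in the background and wait for all of them to finish.
parallel_sync() {   # usage: parallel_sync N HOST...
    n=$1; shift
    k=0
    while [ "$k" -lt "$n" ]; do
        worker "$k" "$n" "$@" &
        k=$(( k + 1 ))
    done
    wait
}

parallel_sync "$JOBS" host001 host002 host003 host004 host005
```

Each worker still walks its share of hosts serially, so the wall-clock time should drop roughly in proportion to `JOBS` until the source box's disk, CPU, or network becomes the limit.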