Anyone got any actual comparisons between unison and rsync specifically related to the performance of synchronization of large data sets over slow links?

I have a huge tree to start replicating on Friday, and I know that if I sync the root paths it will take ages. With no overall indication of progress this won't be optimal, as it's likely to fail for whatever reason before it can finish. Initially I just thought I would break it down into several smaller jobs, but that becomes a burden to maintain...

We use bacula internally, but sending the diffs would be cumbersome as the individual files would be rather large...

Thanks!
jlc
On Wed, Jan 13, 2010 at 6:54 PM, Joseph L. Casale <jcasale at activenetwerx.com> wrote:
> Anyone got any actual comparisons between unison and rsync specifically related
> to the performance of synchronization of large data sets over slow links?
>
> I have a huge tree to start replication of Friday and know that if I sync the root
> paths it will take ages and with the lack of any overall state of progress this won't
> be optimal as its likely to fail for whatever reason before it can finish. Initially
> I just thought I would break it down to several smaller jobs but that becomes a burden
> to maintain...
>
> We use bacula internally but sending the diffs would be cumbersome as the individual
> files would be rather large...
>
> Thanks!
> jlc

Use rsync. It's used far more than unison, so it has been tested better. Unison has always been slow for me.

One thing you might want to look at is performing the initial copy, or some chunks of it, using tar over a netcat link, then running rsync after that. Since rsync usually runs over SSH, it can be 33% slower than a raw data transfer connection. Using netcat won't get you encryption though, so make sure you're on a local/trusted link.
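To make the tar-over-netcat idea concrete: the sender writes one continuous tar stream into a plain TCP connection and the receiver unpacks it as it arrives, with no SSH in between. Below is a minimal Python sketch of the same thing using only the standard library; the function names `send_tree`/`receive_tree`, the host and the port are hypothetical. Like netcat, it is neither encrypted nor authenticated, and unlike rsync it copies everything unconditionally, so it only makes sense for the initial bulk copy.

```python
import socket
import tarfile


def send_tree(src_dir: str, host: str, port: int) -> None:
    """Stream src_dir as an uncompressed tar archive over a plain TCP socket.

    Like `tar | nc`, this is unencrypted and unauthenticated.
    """
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("wb") as stream:
            # "w|" writes a non-seekable stream with no compression
            with tarfile.open(fileobj=stream, mode="w|") as tar:
                # arcname="." stores paths relative to src_dir
                tar.add(src_dir, arcname=".")


def receive_tree(dest_dir: str, listener: socket.socket) -> None:
    """Accept one connection on an already-listening socket and unpack it."""
    conn, _addr = listener.accept()
    with conn, conn.makefile("rb") as stream:
        # "r|" reads a non-seekable stream member by member as it arrives
        with tarfile.open(fileobj=stream, mode="r|") as tar:
            if hasattr(tarfile, "data_filter"):
                # newer Pythons: refuse absolute paths, "..", device files
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
```

Once the bulk copy has landed, a normal rsync run over the same tree should only need to fix up whatever changed in the meantime.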
On 1/13/2010 5:54 PM, Joseph L. Casale wrote:
> Anyone got any actual comparisons between unison and rsync specifically related
> to the performance of synchronization of large data sets over slow links?
>
> I have a huge tree to start replication of Friday and know that if I sync the root
> paths it will take ages and with the lack of any overall state of progress this won't
> be optimal as its likely to fail for whatever reason before it can finish. Initially
> I just thought I would break it down to several smaller jobs but that becomes a burden
> to maintain...
>
> We use bacula internally but sending the diffs would be cumbersome as the individual
> files would be rather large...

I didn't think unison was maintained any more, and I wouldn't expect anything to beat rsync with the -z (compression) option on a slow link. I'd just use the -P option (short for --partial --progress) and restart it when/if it fails. It wouldn't hurt to do subsets first, since they will be quickly skipped when you repeat from the root. If you have a huge number of files, it might be worth finding a way to update rsync to a 3.x version, which does not need to transfer the entire file list before it starts copying.

-- 
Les Mikesell
lesmikesell at gmail.com
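The "use -P and restart it when it fails" approach is easy to automate. Below is a minimal Python sketch, assuming rsync is on the PATH; the function names, retry count and delay are hypothetical choices, not anything from this thread. `-a` (archive), `-z` (compress) and `-P` (same as `--partial --progress`) are standard rsync options; `--partial` keeps a partly transferred file so the next run can use it as a starting point instead of resending that file from scratch.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def rsync_command(src: str, dest: str, compress: bool = True) -> List[str]:
    """Build an rsync command: archive mode, keep partial files, show progress."""
    cmd = ["rsync", "-a", "-P"]
    if compress:
        cmd.append("-z")  # worthwhile on a slow link, wasted CPU on a fast LAN
    return cmd + [src, dest]


def sync_with_retry(
    cmd: Sequence[str],
    max_attempts: int = 10,
    delay_s: float = 60.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
) -> bool:
    """Re-run cmd until it exits 0 or max_attempts is reached.

    Each retry re-walks the tree; files that already match are skipped
    quickly, and --partial (implied by -P) keeps interrupted files around.
    """
    for attempt in range(1, max_attempts + 1):
        rc = run(cmd)
        if rc == 0:
            return True
        print(f"rsync attempt {attempt} exited with {rc}; retrying")
        if attempt < max_attempts:
            time.sleep(delay_s)
    return False
```

For example, `sync_with_retry(rsync_command("/data/", "backup:/data/"))` with hypothetical paths. Running it first for a few large subdirectories and then for the root follows the "subsets first" advice, since the later root pass skips what is already there.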
Joseph L. Casale wrote:
> Nate, care to share those packages:)

Sure, I can probably post them somewhere tomorrow. Nothing fancy.

nate
Joseph L. Casale wrote:
>> Am I missing something or does it only matter where you have a very high
>> bandwidth connection with some latency?
>
> I would imagine, but I have a server that takes rsync/ssh connections from multiple
> windows boxes everyday for differential updates to copies of databases and the
> load on that machine is really high.

Processes in iowait are counted in the load average, so a high load does not necessarily mean the CPU is busy. Your real problem may be that rsync copies the unchanged portions of the (probably huge) original file into a temporary file while merging in the changes, then renames it to the original name when complete. Do top/sar show the CPU pegged?

-- 
Les Mikesell
lesmikesell at gmail.com
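To separate "CPU pegged" from "waiting on disk", the aggregate `cpu` line of Linux's /proc/stat can be sampled twice and the difference split into busy, iowait and idle time; this is roughly what top's `wa` and sar's `%iowait` columns report. A minimal Python sketch, Linux-only; the function names and the one-second interval are hypothetical.

```python
import time
from typing import Dict, List


def parse_cpu_line(line: str) -> List[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat.

    Order: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest/guest_nice are already included in user/nice, so only the first
    eight fields are kept.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def cpu_breakdown(before: List[int], after: List[int]) -> Dict[str, float]:
    """Percentage of time busy, in iowait, and idle between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta) or 1  # avoid division by zero on identical samples
    idle, iowait = delta[3], delta[4]
    busy = total - idle - iowait
    return {
        "busy": 100.0 * busy / total,
        "iowait": 100.0 * iowait / total,
        "idle": 100.0 * idle / total,
    }


def sample(interval_s: float = 1.0) -> Dict[str, float]:
    """Read /proc/stat twice, interval_s apart (Linux only)."""
    def read() -> List[int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval_s)
    return cpu_breakdown(first, read())
```

A high load average together with a large `iowait` share and a small `busy` share would point at the disk side (for example, rsync rewriting big files) rather than at SSH encryption.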
nate wrote:
> Sure I can post them somewhere tomorrow probably, nothing
> fancy..

Put them here: http://rpms.linuxpowered.net/hpn-ssh/

All the usual disclaimers apply. Now that I think about it, I've had these running on a few dozen systems at different data centers, doing file transfers 24/7, for the past year.

nate
On Thu, Jan 14, 2010, Joseph L. Casale wrote:
>>Another feature of rsync modules that can be useful is that each module can
>>specify a user and group thus one can rsync user directories between
>>systems where the user names are the same but uid and gid may differ.
>
>I have been looking at this all morning. Is there any way to auth with keys
>or something unique so I can script this securely? Iiuc, the only auth is done
>through these rsync user/pass pairs unless you do it with hosts etc.

Using rsync in daemon mode with modules requires no authentication if you are comfortable with restricting access to each module by IP address or CIDR block.

The rsync man page also says:

    Some modules on the remote daemon may require authentication. If so,
    you will receive a password prompt when you connect. You can avoid the
    password prompt by setting the environment variable RSYNC_PASSWORD to
    the password you want to use or using the --password-file option. This
    may be useful when scripting rsync.

Bill
-- 
INTERNET:   bill at celestial.com  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice:          (206) 236-1676  Mercer Island, WA 98040-0820
Fax:            (206) 232-9186  Skype: jwccsllc (206) 855-5792

Many companies that have made themselves dependent on [the equipment of a
certain major manufacturer] (and in doing so have sold their soul to the
devil) will collapse under the sheer weight of the unmastered complexity of
their data processing systems.
    -- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5
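For scripting against a password-protected module, the man page's `--password-file` route might be wrapped as below. This is a minimal Python sketch: the user, host, module name and paths are hypothetical placeholders, while the `user@host::module` daemon syntax and `--password-file` are standard rsync client options. The file is created with owner-only permissions because rsync refuses a password file that is world-readable.

```python
import os
from typing import List


def write_password_file(path: str, password: str) -> None:
    """Create the file with owner-only permissions before writing the secret."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(password + "\n")  # rsync reads only the first line
    os.chmod(path, 0o600)  # also tighten a pre-existing file with looser modes


def daemon_pull_command(user: str, host: str, module: str,
                        dest: str, password_file: str) -> List[str]:
    """rsync client command that pulls a whole daemon module non-interactively."""
    return [
        "rsync", "-a",
        f"--password-file={password_file}",
        f"{user}@{host}::{module}/",
        dest,
    ]
```

The resulting list can be handed to `subprocess.run`. The man page's other option, the RSYNC_PASSWORD environment variable, could instead be supplied through `subprocess.run(cmd, env=...)`, which avoids keeping the password in a file at all.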