We have a particular file system that we're trying to keep in sync between two FreeBSD/ZFS servers using rsync.

The file system has many millions of files and about 4TB of data in total. Rsync takes HOURS to run, even when there are no files to transfer. Just the comparison itself takes hours.

Is there any way to speed up the transfer? The command line I'm using is:

    /usr/local/bin/rsync --stats --rsh=/usr/bin/rsh --recursive --delete --one-file-system --xattrs --links --hard-links --times --perms --owner --group --exclude=.zfs /foo/bar root@remote:/foo/bar

Thanks!

--
Tim Gustafson
tjg at soe.ucsc.edu
831-459-5354
Baskin Engineering, Room 313A
If you are using ZFS, then why not use zfs send and zfs receive? Rsync has to stat every file on both ends, which can take a long time with millions of files. The ZFS tools don't have to do any of that: an incremental zfs send works from snapshots and only transmits the blocks that changed between them.

On 11/30/12 12:33, Tim Gustafson wrote:
> Is there any way to speed up the transfer?

--
Kevin Korb, Systems Administrator
FutureQuest, Inc., Orlando, Florida
Phone: (407) 252-6853
Internet: Kevin at FutureQuest.net (work), kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
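A minimal sketch of the incremental send/receive approach, written in Python so each step is explicit. The dataset name tank/foo/bar, the snapshot names sync-1 and sync-2, and the helper functions are hypothetical placeholders (the thread only gives the mountpoint /foo/bar and the host remote). It reuses rsh as the transport because that is what the original rsync command uses, and it assumes the previous snapshot already exists on both servers, so the very first run would need a full zfs send without -i.

```python
import subprocess

RSH = "/usr/bin/rsh"  # same transport as the original rsync command


def replication_commands(dataset, prev_snap, new_snap, remote):
    """Return the argv lists for one incremental replication step.

    dataset, prev_snap, new_snap and remote are hypothetical placeholders.
    """
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i: send only what changed between the two snapshots
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F: roll the target back to its most recent snapshot before receiving,
    # discarding any changes made on the destination since then
    receive = [RSH, remote, "zfs", "receive", "-F", dataset]
    return snapshot, send, receive


def replicate(dataset, prev_snap, new_snap, remote):
    """Snapshot the source and pipe an incremental stream to the remote host."""
    snapshot, send, receive = replication_commands(dataset, prev_snap, new_snap, remote)
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # so zfs send gets SIGPIPE if the receiver dies early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0:
        raise subprocess.CalledProcessError(send_rc, send)
    if recv_rc != 0:
        raise subprocess.CalledProcessError(recv_rc, receive)
```

A run would look like replicate("tank/foo/bar", "sync-1", "sync-2", "remote"). After it succeeds, sync-2 becomes the base for the next incremental and older snapshots can be destroyed on both sides. Because the stream is built from changed blocks rather than a directory walk, the cost should scale mostly with the amount of changed data, not the number of files.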
zfs send/receive should work perfectly for this. If the destination has changes you don't mind losing (e.g. just permission changes), you can have the receive roll back to the last snapshot received from the source before replaying changes (zfs receive -F). The result is two identical copies of everything.

If there are other changes on the destination that you also want to keep, you could write a script combining zfs diff with rsync to send only the files known to have changed.

Regards
Steve

----- Original Message -----
From: "Tim Gustafson" <tjg at soe.ucsc.edu>
To: <rsync at lists.samba.org>
Sent: Friday, November 30, 2012 5:33 PM
Subject: Speeding Up Rsync for Large File Sets

> Is there any way to speed up the transfer?
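Steve's zfs diff plus rsync idea could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a tested tool: it assumes zfs diff is run with -F and -H (file-type column, tab-separated output) between two snapshots, that paths contain no tabs or characters that zfs diff prints in escaped form, and that the rsync source is the mountpoint (/foo/bar here). parse_zfs_diff and send_changes are hypothetical helper names. Changed entries go to rsync through --files-from without --recursive, so a modified directory does not drag in its whole subtree; renamed directories get a second, recursive pass because zfs diff reports a renamed directory as a single entry.

```python
import subprocess

# Flags mirror the original rsync command, minus --recursive and --delete.
RSYNC_BASE = ["rsync", "--rsh=/usr/bin/rsh", "--one-file-system", "--xattrs",
              "--links", "--hard-links", "--times", "--perms", "--owner", "--group"]


def parse_zfs_diff(lines, mountpoint):
    """Classify the output of `zfs diff -FH old-snap new-snap`.

    Assumes the -F column layout (change, file type, path[, new path]) and
    paths without the escape sequences zfs diff uses for unusual bytes.
    Returns three sets of paths relative to mountpoint:
      copy    - created/modified entries and rename targets
      recurse - renamed directories (their contents are not listed separately)
      removed - deleted entries and rename sources
    """
    root = mountpoint.rstrip("/")

    def rel(path):
        path = path.rstrip("/")
        if path == root:
            return "."
        if path.startswith(root + "/"):
            return path[len(root) + 1:]
        return None  # not under this mountpoint

    copy, recurse, removed = set(), set(), set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed or unexpected line
        change, ftype, path = fields[0], fields[1], rel(fields[2])
        if change in ("+", "M"):
            copy.add(path)
        elif change == "-":
            removed.add(path)
        elif change == "R" and len(fields) >= 4:
            new_path = rel(fields[3])
            removed.add(path)
            copy.add(new_path)
            if ftype == "/":
                recurse.add(new_path)
    for paths in (copy, recurse, removed):
        paths.discard(None)
    return copy, recurse, removed


def send_changes(copy, recurse, src, dest):
    """Feed the changed paths to rsync via --files-from on stdin."""
    for paths, extra in ((copy, []), (recurse, ["--recursive"])):
        if paths:
            listing = "".join(p + "\n" for p in sorted(paths))
            subprocess.run(RSYNC_BASE + extra + ["--files-from=-", src, dest],
                           input=listing, text=True, check=True)
```

For example, the lines of zfs diff -FH tank/foo/bar@sync-1 tank/foo/bar@sync-2 (hypothetical snapshot names) would go to parse_zfs_diff(lines, "/foo/bar"), and the result to send_changes(copy, recurse, "/foo/bar", "root@remote:/foo/bar"). Deleting the removed paths on the destination is left out of the sketch; rsync 3.1.0 and later offer --delete-missing-args for that case.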
> Have you checked that you're not running out of memory?

I have not seen any errors or warnings to that effect.

> You probably want --delete-during instead of --delete.

Will that speed up the file comparison stage, or is it just a good idea?

> If you're checking for hard links, rsync has to track links for the entire
> session. That can bog things down for zillions of files. If you know you
> won't have links outside particular boundaries, you might benefit from
> chunking up your rsync invocations to match those boundaries.
>
> You might benefit from doing that regardless, since multiple rsyncs in
> parallel will do a better job of saturating your I/O bandwidth in this
> particular case.

We're already doing one rsync for each file system. We have about 2,000 file systems that we're rsyncing between these servers, and there are just two in particular that are troublesome. In fact, the two file systems in question were formerly one file system, and we split it in two to accomplish exactly what you describe. Maybe more splitting is in order, but because of the nature of the data, we're always going to have at least one tree of folders with a zillion files in it.
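The chunking suggestion quoted above could be scripted along these lines: one rsync per top-level subdirectory of a file system, run a few at a time. This is a sketch under assumptions: the helper names and the default of 4 workers are hypothetical, and the flags are copied from the original command. Splitting has trade-offs that follow directly from the quoted advice: --hard-links only preserves links within a single chunk, files sitting directly in the top-level directory are not covered, and a top-level subdirectory deleted on the source is not removed from the destination.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Flags copied from the original command; --hard-links now only sees one chunk.
RSYNC_FLAGS = ["--rsh=/usr/bin/rsh", "--recursive", "--delete", "--one-file-system",
               "--xattrs", "--links", "--hard-links", "--times", "--perms",
               "--owner", "--group", "--exclude=.zfs"]


def list_subdirs(src_root):
    """Top-level directories of the tree, skipping the .zfs snapshot dir."""
    return sorted(entry.name for entry in os.scandir(src_root)
                  if entry.is_dir(follow_symlinks=False) and entry.name != ".zfs")


def chunk_commands(src_root, dest_root, subdirs):
    """One rsync invocation per subdirectory; trailing slashes copy contents."""
    return [["rsync", *RSYNC_FLAGS, f"{src_root}/{d}/", f"{dest_root}/{d}/"]
            for d in subdirs]


def run_parallel(commands, workers=4):
    """Run the rsyncs concurrently and return their exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

A run over the troublesome tree would be run_parallel(chunk_commands("/foo/bar", "root@remote:/foo/bar", list_subdirs("/foo/bar"))); any nonzero exit code marks a chunk that needs another look. Threads are enough here because each worker just waits on an external rsync process.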