I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync, but is there a better tool or a better way of doing this?

For example, I plan on doing

  rsync -azv /largefs /targetfs

/targetfs is an NFS-mounted filesystem.

Any thoughts? TIA
John R Pierce
2008-Jun-21 15:45 UTC
[CentOS] recommendations for copying large filesystems
Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.
>
> Any thoughts?

rsync would probably work better if you ran it in client/server mode rather than over NFS, especially if you have to restart it.
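For illustration only -- the hostname and user below are placeholders, and this assumes you have ssh access to the target box -- the remote-shell form of rsync would look something like:

  rsync -az --partial /largefs/ user@targethost:/targetfs/

or, if you run an rsync daemon on the target with a module (hypothetically) named "targetfs":

  rsync -az --partial /largefs/ rsync://targethost/targetfs/

Either way the remote end does its own checksumming instead of every byte being read back over the NFS mount, and --partial keeps partially transferred files so a restart does not have to begin them from scratch.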
Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.

The only problem you are likely to have is that rsync reads the entire directory contents into RAM before starting, then walks the list fixing the differences. If you have a huge number of files and a small amount of RAM, it may slow down due to swapping. 'cp -a' can be faster if the target doesn't already have any matching files. Also, the -v to display the names can take longer than the file transfer on small files.

Running rsync over ssh instead of NFS has a tradeoff: the remote end does part of the work, but you lose some speed to ssh encryption.

If the filesystem is live, you might make an initial run copying the larger directories with rsync or cp, then do whatever you can to stop the files from changing and make another pass with 'rsync -av --delete', which should go fairly quickly and fix any remaining differences.

-- 
Les Mikesell
  lesmikesell at gmail.com
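A rough sketch of that two-pass approach (using the paths from the original question, and assuming /targetfs is already mounted over NFS):

  # first pass while the filesystem is still live
  cp -a /largefs/. /targetfs/

  # ...quiesce writers, then fix up whatever changed in the meantime
  rsync -a --delete /largefs/ /targetfs/

The trailing slashes matter to rsync: with them it synchronizes the contents of the two directories rather than creating /targetfs/largefs.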
Rainer Duffner
2008-Jun-21 17:19 UTC
[CentOS] recommendations for copying large filesystems
On 21.06.2008, at 15:33, Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.

What network link is there between these hosts?
Are these 1 or 2 million small files, or bigger ones?
Does the data change a lot?
Is it a SAN or JBOD?

cheers,
Rainer

-- 
Rainer Duffner
CISSP, LPI, MCSE
rainer at ultra-secure.de
On Sat, 2008-06-21 at 09:33 -0400, Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?

At gigabit speeds, you're looking at over a week of transfer time: 1 gigabit/sec is about 125MB/sec, so 100TB works out to roughly 800,000 seconds, or about 9.25 days, not counting protocol overhead. You could speed this up with link bonding, which from previous threads sounds like something you're working on already.

If it's a one-off transfer and you can afford downtime while you're fiddling with hardware, you may consider directly attaching both sets of storage to the same machine and doing a local copy.
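Just to spell out the arithmetic (this assumes the full 100TB and an ideal, fully saturated gigabit link):

  # 1 Gbit/s = 125,000,000 bytes/s; 100 TB = 100 * 10^12 bytes
  echo 'scale=2; 100*10^12 / (125*10^6) / 86400' | bc
  # prints 9.25 -- days of continuous transfer, before protocol overhead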
Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via
> network. What is the best option to do this? I am planning to use
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.
>
> Any thoughts?

You are going to pay a large performance penalty for the simplicity of using rsync in its local form. Between the substantial overheads of rsync itself and NFS, you are not going to come anywhere near your maximum possible speed, and you will probably need a lot of memory if you have a lot of files (rsync uses a lot of memory to track all the files).

When I'm serious about moving large amounts of data at the highest speed, I use tar tunneled through ssh. The rough invocation to pull from a remote machine looks like this:

  ssh -2 -c arcfour -T -x sourcemachine.com 'tar --directory=/data -Scpf - .' | tar --directory=/local-data-dir -Spxf -

That should pull the contents of the source machine's /data directory into an already existing local /local-data-dir. On reasonably fast machines (better than 3 GHz CPUs) it tends to approach the limit of either your hard drives' speed or your network capacity.

If you don't like the ssh tunnel, you can strip it down to just the two tars (one to throw and one to catch) and copy over NFS. It will still be faster than what you are proposing. Or you can use cpio.

Rsync is best at synchronizing two already nearly identical trees. It is not so good as a bulk copier.

-- 
Benjamin Franz
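A minimal sketch of that stripped-down, two-tar variant (no ssh), assuming /targetfs is the NFS mount from the original question and the command runs on the source host:

  # one tar throws, the other catches; -S handles sparse files, -p preserves permissions
  (cd /largefs && tar -Scpf - .) | (cd /targetfs && tar -Spxf -)

Same idea as the ssh version, just with the pipe staying on one machine and NFS carrying the stream instead of the ssh tunnel.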