Hello,

I'm quite a n00b on rsync stuff, but I went to the website, read the FAQ/how-to, Googled and more. I set up my own rsync server and clients: everything works fine :-D

I'm preparing a plan for production use in my company: we need to mirror around 100GB of data through a special VPN internet line, 2MB symmetric. The first time, the data will be transferred on a medium such as a hard disk. After that, each night, we will try to update the clients from the master server. That should be around 500MB to 3GB, not so much in comparison with the original size of the data.

I discovered that rsync uses a lot of CPU and RAM to run checksums on files that have to be synchronised. I need an opinion about my situation.

So: each night, from 0:00am to 7:00am at the latest, the server will have to check the 100GB of files, see which files have been modified, and then upload them to the clients. Each file is around 4MB to 40MB on average.

I would like to know your opinion about this situation:

- Should I set up a strong dual-CPU computer dedicated to calculating this whole stuff?
- How much memory should I install?
- Is any bandwidth used during the checksum computation? Mine is quite limited.
- I know the client computer will have to check files too; disk I/O will be the most used resource. I think this computer will have an NFS mount from a "datacenter" computer with a GB LAN card. I wonder if it will be enough...

I'm quite scared of the amount of data to check before synchronising the clients, and of how long it will take. In short, what do YOU think? Any advice?

Thanks,

Johan
johan.boye@latecoere.fr wrote:

> Hello,
>
> So: each night, from 0:00am to maximum 7:00am, the server will have to
> check the 100Go of files and see what files have been modified, then,
> upload them to the clients. Each file is around 4MB to 40MB in average.

Are the clients what you call the "mirror"? Are there several of them?

> I would like to know your opinion about this situation:
> - Should I setup a strong dual CPU computer dedicated to calculate this
> whole stuff?

That depends.

> - What about the memory I should install?
> - Is there any bandwidth used during the checksums computation? Mine is
> quite limited.

Is that "2 mega BYTE per second" or "2 mega BIT per second"?

> - I know the client computer will have to check files too; Disk I/O
> will be the most used. I think this computer will have NFS mount from a
> "datacenter" computer with a GB LAN card, I wonder it will be enough...

Scanning 100GB of data in 7 hours doesn't require that much disk bandwidth.

> I'm quite scared of the amount of data to check before synchronise
> clients, and how long it will take. To finish shortly, what do YOU
> think? Any advices?

Here are a few performance characteristics of rsync I think you should be aware of:

- By default, rsync only checks files that differ between receiver and sender in timestamp or size. If most files in your archive did not change at all, you can discard them altogether from your bandwidth calculations.
- The receiver only does a linear scan of the file, followed by generating a second file (which MAY require random access to the first file, if blocks in the file changed order). Its CPU requirements are negligible. This is bad for the case where you have one mirror source sending out data to many mirrors, as all the CPU load falls on the single server.
- If your bandwidth is 2 mega BIT per second, you are a bit marginal for transferring 5GB of data in 7 hours. This has nothing to do with rsync, though. A simple calculation can show you the same result. Getting full bandwidth for the entire 7 hours will allow you to transfer about 6 GB of data.

> Thanks,
>
> Johan
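For anyone who wants to redo the arithmetic, here is a minimal sketch of the two estimates made in the reply: how much data fits through the link during the nightly window, and how fast the disks must be read to scan the whole archive in that time. It assumes the line is 2 megabit per second (the thread leaves this open) and uses decimal units (1 GB = 10^9 bytes). The function names are made up for illustration.

```python
def transferable_bytes(link_bits_per_second: float, hours: float) -> float:
    """Upper bound on bytes moved if the link is saturated for the whole window."""
    bytes_per_second = link_bits_per_second / 8
    return bytes_per_second * hours * 3600


def required_scan_rate(total_bytes: float, hours: float) -> float:
    """Average read rate (bytes/s) needed to scan total_bytes within the window."""
    return total_bytes / (hours * 3600)


WINDOW_HOURS = 7            # 0:00am to 7:00am
LINK_BPS = 2_000_000        # ASSUMPTION: "2MB" means 2 megabit per second
ARCHIVE_BYTES = 100e9       # about 100GB of mirrored data

capacity = transferable_bytes(LINK_BPS, WINDOW_HOURS)
scan_rate = required_scan_rate(ARCHIVE_BYTES, WINDOW_HOURS)

print(f"Link capacity over the window: {capacity / 1e9:.1f} GB")
print(f"Disk read rate to scan the archive: {scan_rate / 1e6:.1f} MB/s")
```

Under these assumptions the link tops out at roughly 6.3 GB per night, with no allowance for protocol overhead or other traffic, and a full scan needs only about 4 MB/s of sustained reads. If the line is really 2 megabyte per second, set LINK_BPS to 16_000_000 and the capacity is eight times larger.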
> Subject: Re: Question about rsync and BIG mirror

Thanks for all your answers and advice. My problem seems to be on the side of the 2MB line, once the whole 190GB of data has been synchronised. I will keep in touch and give some feedback.

Thanks to all