On 1/5/19 1:57 PM, Danishka Navin wrote:
> Hi Team,
>
> I have a requirement of backing up 3 files (each is around 50M) to a central
> server.
> There are 750-800 client systems in different locations (not in the same
> network) and they are supposed to back up over the internet.
>
> I am planning to use rsync over ssh.
Do be careful with this -- if the files are very different, sftp or scp
will be more efficient. Odds are that on a 50MB file you won't see a big
difference, but rsync's delta algorithm will sometimes really bite you
when files change a lot or are totally different. Ask me how I know. :D
(off-list) A small difference x 800 might start to add up... and
depending on the OS, it's one less package to worry about.
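If you do go with scp for a clean full copy each run, the per-client
command is a one-liner (the user, host, and paths below are just
placeholders, and the remote directory has to exist already):

  scp -q /var/backup/file1 /var/backup/file2 /var/backup/file3 \
      backupuser@backup.example.com:/srv/backups/$(hostname)/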
> I need to know what resources and what configuration are required to
> make all these clients back up the content as soon as possible.
> I can provide at least 4 servers instead of a single central server so we
> can split the number of connections per server.
>
> Regards,
Are the clients reaching out to the backup server, or the backup server
reaching out to the clients? (pushing the backups vs. pulling them).
The first bottleneck you will run into if you try to start 800 SSH
sessions at the same moment on a modern computer is probably the CPU.
There's a big CPU load when an SSH connection first starts (the key
exchange and authentication), and I can't imagine any modern computer
that would enjoy 800 (or even 100) of those hits at the /exact/ same
time. This has to be avoided. Once the connection is established there
is still some computational cost, but nothing compared to the initial
connection.
If pulling the backups to the backup server, your task is really quite
simple -- set your script up to start and background the transfers,
starting one per second until you hit whatever limit you set for the
number of concurrent rsyncs/scps (experiment), and don't start another
until you drop below that limit. Finally, pkill all rsync sessions after
a period of time so dead and failed connections don't pile up.
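Roughly like this (the client list, paths, concurrency limit of 20, and
one-hour grace period are all numbers I made up -- experiment):

  #!/bin/bash
  # pull the backups from every client in clients.txt, at most $MAX
  # transfers at once, never starting more than one per second
  MAX=20
  while read -r client; do
      while [ "$(pgrep -cx rsync)" -ge "$MAX" ]; do
          sleep 1                       # at the limit, wait for a slot
      done
      mkdir -p "/srv/backups/${client}"
      rsync -az "backupuser@${client}:/var/backup/" \
            "/srv/backups/${client}/" &
      sleep 1                           # stagger the starts
  done < clients.txt
  sleep 3600                            # grace period for stragglers
  pkill -x rsync                        # then clear out anything hung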
If pushing from the clients to the server, you need to stagger the
starts. Depending on the nature of your project, you could do a
sleep $(($RANDOM % 1000))
at the start of your backup script, or use the IP address to compute a
sleep time, or cut a number out of the system name, or something along
those lines. You don't want to manage 800 different scripts, so create
the delay either by computation or by randomization.
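Here's a rough sketch of the hostname idea (cksum is just one way to
turn the name into a stable number; the 1000-second spread is arbitrary):

  # derive a repeatable 0-999 second delay from this machine's hostname,
  # so each client waits a different amount but always the same amount
  DELAY=$(( $(hostname | cksum | cut -d' ' -f1) % 1000 ))
  sleep "$DELAY"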
Do that, and you probably won't have to worry about any other resource,
and you can do it all on one computer.
Nick.