Hi there
We have a world-wide WAN that we are running rsync over. It all
works - but the throughput of any one rsync connection is limited by the
latency of our inter-continental links. e.g. a site may have a 4Mb/s link,
but a single rsync job can only get 1.5Mb/s of that bandwidth.
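(Presumably that's the usual TCP bandwidth-delay-product limit:
throughput is capped at window size / round-trip time. With a common
64KB window and a ~300ms round trip, 64KB / 0.3s comes out to roughly
1.7Mb/s per connection - right in the ballpark of what we see.)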
Running 3-5 rsync jobs in parallel gets around that limit and obviously
allows us to saturate a particular pipe (and yes - we want to :-), but
that requires hand-crafting schedules of batches of rsync jobs to make
it happen. And we've got heaps of them - and we want to open this up to
our users to define themselves - so such hand-crafting will be going
the way of the Dodo ;-)
What would be better is if one rsync job could be "chopped up" into a
bunch of "mini" rsync jobs, with a separate rsync run for each.
e.g. an rsync job mirroring 10GB of data in 1,000,000 files would be
best split into 5 separate jobs - each mirroring 200,000 files. That
way we get to saturate the pipes and get the jobs done quicker.
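For what it's worth, newer rsyncs (2.6.0 and later, if I remember
right) have a --files-from option that would make such mini-jobs
practical - each job gets its own file list and only transfers its
share. A hypothetical mini-job (paths made up) would look like:

  rsync -a --files-from=chunk1.txt /data/ remotehost:/data/

where chunk1.txt holds one source-relative path per line.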
So I'm about to see if I can figure out some way to do this in perl -
but I was wondering if anyone else has already done this? Perhaps by
doing an "rsync -nv" first, sorting the output, then splitting it into
"X" separate jobs? Even a rough split could make a difference.
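Here's a rough perl sketch of that approach (untested - the host, paths
and chunk-file names are made up, and it assumes an rsync new enough to
have --files-from, i.e. 2.6.0+): dry-run to get the file list, deal the
files round-robin into N chunk files, then fork one rsync per chunk.

#!/usr/bin/perl
# Split one big rsync job into N parallel "mini" jobs.
use strict;
use warnings;

my $jobs = 5;
my $src  = '/data/';                 # trailing slash matters to rsync
my $dest = 'remotehost:/data/';

# Dry run (-n) lists what would be transferred without copying anything.
open(my $dry, '-|', 'rsync', '-anv', $src, $dest)
    or die "can't run rsync: $!";
my @files;
while (my $line = <$dry>) {
    chomp $line;
    next if $line eq '' or $line =~ m{/$};                    # blanks, directories
    next if $line =~ /^(sending|building|sent |total size)/;  # rsync chatter
    push @files, $line;
}
close($dry);

# Deal the files round-robin into one chunk file per job.
my @fh;
for my $i (0 .. $jobs - 1) {
    open($fh[$i], '>', "chunk$i.txt") or die "chunk$i.txt: $!";
}
for my $i (0 .. $#files) {
    print { $fh[$i % $jobs] } "$files[$i]\n";
}
close($_) for @fh;

# Fork one rsync per chunk; --files-from limits each job to its share.
my @pids;
for my $i (0 .. $jobs - 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exec('rsync', '-a', "--files-from=chunk$i.txt", $src, $dest);
        die "exec failed: $!";
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;    # wait for all the mini-jobs to finish

Round-robin by file count is crude - a few huge files landing in one
chunk will make that job run long. Since "rsync -nv" doesn't print
sizes, balancing the chunks by bytes would mean stat()ing each file
first and dealing them out largest-first.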
--
Cheers
Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1