Hi,

In my situation I'm using rsync to back up a server with (currently)
about 570,000 files. These are all little files, and maybe 0.1% of them
change or new ones are added in any 15-minute period.

I've split the main tree up so rsync can run on sub-subdirectories of
the main tree, and it does each of these sub-subdirectories
sequentially. I would have liked to run some of these in parallel, but
that seems to increase I/O on the main server too much.

Today I tried the following:

For all sub-subdirectories:
a) Fork a "du -s subsubdirectory" on the destination subsubdirectory
b) Run rsync on the subsubdirectory
c) Repeat until done

It seems to have improved the time it takes by about 25-30%. It looks
like the du can run ahead of the rsync, so that while rsync is building
its file list, the du is warming up the file cache on the destination.
Then when rsync looks to see what it needs to do on the destination, it
can do so more efficiently.

Looks like a keeper so far. Any other suggestions? (I was thinking of a
previous suggestion of setting /proc/sys/vm/vfs_cache_pressure to a low
value.)

Thanks,
Mike
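A minimal sketch of the loop described above, in plain sh. The paths,
the two-level glob, and the rsync options are illustrative assumptions,
not Mike's actual setup, and the destination is assumed to be locally
mounted (a remote destination would need the du run over ssh):

    #!/bin/sh
    # Fork "du -s" on the destination copy of each sub-subdirectory so
    # its metadata is in cache while rsync builds its file list.
    SRC=/data      # hypothetical source tree, split two levels deep
    DEST=/backup   # hypothetical destination tree, locally mounted

    for dir in "$SRC"/*/*/ ; do
        sub=${dir#"$SRC/"}                     # e.g. "a/b/"
        du -s "$DEST/$sub" >/dev/null 2>&1 &   # fork: warm dest cache
        rsync -a "$SRC/$sub" "$DEST/$sub"      # sync the same subtree
        wait                                   # reap du before looping
    done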
Darryl Dixon - Winterhouse Consulting
2009-Oct-16 03:13 UTC
Nice little performance improvement
> Hi,
>
> In my situation I'm using rsync to back up a server with (currently)
> about 570,000 files. These are all little files, and maybe 0.1% of
> them change or new ones are added in any 15-minute period.

Hi Mike,

We have three filesystems that between them have approx 22 million
files, and around 10-20,000 new or changed files every business day.
In order to expeditiously move these new files offsite, we use a
modified version of pyinotify to log all added/altered files across the
entire filesystem(s) and then every five minutes feed the list to rsync
with the --files-from option. This works very effectively and quickly.

regards,
Darryl Dixon
Winterhouse Consulting Ltd
http://www.winterhouseconsulting.com
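A minimal sketch of how the rsync half of such a pipeline can look,
assuming a (hypothetical) watcher that appends source-relative paths to
/var/log/changed-files; the paths, host name, and error handling below
are illustrative assumptions, not Darryl's actual configuration:

    #!/bin/sh
    # Run every five minutes (e.g. from cron): rotate the change log
    # written by the inotify watcher and hand it to rsync.
    LOG=/var/log/changed-files   # hypothetical log, one path per line,
                                 # relative to /data
    BATCH=/tmp/changed.$$

    mv "$LOG" "$BATCH" || exit 0   # nothing logged since the last run
    : > "$LOG"                     # fresh log (assumes the watcher
                                   # reopens the file for each write)
    sort -u "$BATCH" -o "$BATCH"   # a file touched twice syncs once
    if rsync -a --files-from="$BATCH" /data/ backup:/data/; then
        rm -f "$BATCH"
    else
        cat "$BATCH" >> "$LOG"     # transfer failed: requeue the batch
    fi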
On Thu, 2009-10-15 at 19:07 -0700, Mike Connell wrote:
> Today I tried the following:
>
> For all sub-subdirectories:
> a) Fork a "du -s subsubdirectory" on the destination subsubdirectory
> b) Run rsync on the subsubdirectory
> c) Repeat until done
>
> It seems to have improved the time it takes by about 25-30%. It looks
> like the du can run ahead of the rsync, so that while rsync is
> building its file list, the du is warming up the file cache on the
> destination. Then when rsync looks to see what it needs to do on the
> destination, it can do so more efficiently.

Interesting. If you're not using incremental recursion (the default in
rsync >= 3.0.0), I can see that the "du" would help by forcing the
destination I/O to overlap the file-list building in time. But with
incremental recursion, the "du" shouldn't be necessary, because rsync
actually overlaps the checking of destination files with the file-list
building on the source.

--
Matt
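One way to test Matt's explanation: incremental recursion needs rsync
3.0.0 or later on both ends, and the 3.x manual documents
--no-inc-recursive (alias --no-i-r) for forcing the old whole-tree
behaviour, so the two modes can be timed against each other. The host
and paths here are placeholders:

    # Check what each end is running (both must be >= 3.0.0):
    rsync --version | head -n1
    ssh backuphost rsync --version | head -n1

    # Time one sub-subdirectory with incremental recursion disabled,
    # then with the default, to see whether the du trick still helps:
    time rsync -a --no-inc-recursive /data/a/b/ backuphost:/backup/a/b/
    time rsync -a                    /data/a/b/ backuphost:/backup/a/b/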
Hi,

> In order to expeditiously move these new files offsite, we use a
> modified version of pyinotify to log all added/altered files across
> the entire filesystem(s) and then every five minutes feed the list to
> rsync with the --files-from option. This works very effectively and
> quickly.

Interesting... How do you tell rsync to delete files that were deleted
from the source, or is that not part of your use case?

Thanks,
Mike