On Thu, Jul 19, 2012 at 01:51:43PM -0400, Cary Lewis wrote:
> I want to use rsync with a cloud based rsync provider to do off-site
> backing up of a large (1TB) dataset which consists of 32 million+ files
> spread out in 300 directories. So the number of files in any one directory
> can be quite large (upwards of 2 million).
You realize that stat() is a costly operation,
especially if the inodes are cache-cold, and even more so if something else
is stressing the IO and VM subsystems on the box.
On a moderately loaded box, recursively stat()ing 3 million files
occasionally took 90 minutes or more. Doing the same once the inodes
are cache-hot takes 30 to 90 *seconds* on the same box under the same
overall stress.
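If you want to see this on your own box, a rough Python sketch like the
following (the name time_stat_walk is made up for illustration) lstat()s
every entry under a directory tree and reports how long that took. Run it
once right after dropping caches (on Linux, e.g. as root:
sync; echo 3 > /proc/sys/vm/drop_caches) and once more immediately after,
to compare cold and hot numbers. It only models the stat() part of what
rsync does, not rsync itself.

```python
import os
import stat
import time


def time_stat_walk(root):
    """Recursively lstat() every entry under root; return (count, seconds).

    Hypothetical helper for illustration, not part of rsync.
    """
    count = 0
    start = time.monotonic()
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                # scandir() can often tell the file type from the directory
                # entry alone; an explicit lstat() forces a look at the inode.
                st = os.lstat(entry.path)
                count += 1
                if stat.S_ISDIR(st.st_mode):
                    stack.append(entry.path)
    return count, time.monotonic() - start
```

Usage would be something like print(time_stat_walk("/path/to/data")),
where the path is just a placeholder for your dataset.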
Holding 3 million dentries and inodes cache-hot requires (on that box,
anyway) ~5 GB of slab memory (of 128 GB available...).
So if you want to regularly recursively stat 32 million files (and that's
what rsync needs to do), you had better add more RAM, much more RAM,
to your box.
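Back-of-the-envelope, assuming the per-file cache cost scales linearly from
the numbers above (an assumption; actual dentry/inode sizes depend on the
kernel and filesystem): 5 GB / 3 million is roughly 1.7 KB per file, so
32 million files would want something like 53 GB of slab to stay cache-hot.
As a tiny Python helper (estimate_slab_bytes is a hypothetical name):

```python
def estimate_slab_bytes(n_files, ref_files=3_000_000, ref_bytes=5 * 10**9):
    """Linearly scale the observed ~5 GB for 3 million cache-hot
    dentries + inodes to n_files. Assumes a constant per-file cost."""
    per_file = ref_bytes / ref_files  # about 1667 bytes per file
    return n_files * per_file


need = estimate_slab_bytes(32_000_000)
print(f"~{need / 10**9:.0f} GB")  # prints "~53 GB"
```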
Also, you mention Cygwin.
IIRC, by default, that will still treat file names as case*in*sensitive,
so you get really bad (maybe O(N^2)?) behaviour
when walking large directories.
There was some setting, which I do not remember right now,
to tell rsync and/or Cygwin to treat file names as case-sensitive,
which can seriously improve behaviour with large directories.
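To illustrate why (a toy model, not Cygwin's or rsync's actual code): if
every case-insensitive name check has to scan the names already seen, N
names cost on the order of N^2 comparisons, whereas indexing on the
case-folded name keeps it linear. Both hypothetical functions below find
case-only name collisions and return the same result:

```python
def find_case_collisions_naive(names):
    """For each name, scan all earlier names case-insensitively: O(N^2)."""
    collisions = []
    for i, name in enumerate(names):
        folded = name.casefold()
        for other in names[:i]:
            if other.casefold() == folded:
                collisions.append((other, name))
                break  # report only the first earlier match
    return collisions


def find_case_collisions_indexed(names):
    """Same result using a dict keyed by the case-folded name: O(N)."""
    seen = {}
    collisions = []
    for name in names:
        folded = name.casefold()
        if folded in seen:
            collisions.append((seen[folded], name))
        else:
            seen[folded] = name  # remember the first spelling seen
    return collisions
```

With 2 million entries in one directory, the naive variant is on the order
of 10^12 comparisons, which is why the difference matters.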
> Rsync doesn't seem to cope with this well - even doing local copies in a
> directory with several thousands of files takes a long time to initiate any
> transferring.
I'm speculating here,
but I thought file list generation is still done per sub-directory, so
rsync would need to scan the current subdir fully before starting to work
on the resulting partial file list.
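A toy model of what I mean (hypothetical code, not rsync's implementation):
a generator that produces one file list per directory. A consumer can start
working on the first list before deeper directories are scanned, but each
list only appears once its whole directory has been read, so a single
2-million-entry directory still has to be scanned completely before any of
its files can be handled.

```python
import os
from collections import deque


def per_directory_file_lists(root):
    """Yield (directory, sorted entry names), one directory at a time (toy model)."""
    pending = deque([root])
    while pending:
        d = pending.popleft()
        # The whole directory is read and sorted before anything is yielded,
        # so nothing in d can be processed until this scan finishes.
        with os.scandir(d) as it:
            entries = sorted(it, key=lambda e: e.name)
        yield d, [e.name for e in entries]
        # Subdirectories are queued (and later scanned) only after the
        # consumer has received the current directory's list.
        pending.extend(e.path for e in entries if e.is_dir(follow_symlinks=False))
```

A caller would loop "for d, names in per_directory_file_lists(root):" and
could start transferring names right away, one directory at a time.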
> I thought that with version 3, rsync was supposed to start transferring
> before fully testing all of the files in a directory?
>
> I am using version 3.0.9 under Cygwin.
>
> Is there a command line switch I am supposed to use to force rsync to start
> transferring more quickly?
>
> Any insight / suggestions would be most appreciated.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com