thr3ads.net - rsync - cut-off time for rsync ? [Jul 2015]

Dirk van Deun

2015-Jul-01 08:06 UTC

cut-off time for rsync ?

> If your goal is to reduce storage, and scanning inodes doesn't matter,
> use --link-dest for targets. However, that'll keep a backup for every
> time that you run it, by link-desting yesterday's copy.

The goal was not to reduce storage, it was to reduce work. A full
rsync takes more than the whole night, and the destination server is
almost unusable for anything else while it is doing its rsyncs. I
am sorry if this was unclear. I just want to give rsync a hint that
comparing files and directories that are older than one week on
the source side is a waste of time and effort: the rsync is done
every day, so they can safely be assumed to be in sync already.

Dirk van Deun
-- 
Ceterum censeo Redmond delendum

Simon Hobson

2015-Jul-01 13:05 UTC

> The goal was not to reduce storage, it was to reduce work.  A full
> rsync takes more than the whole night, and the destination server is
> almost unusable for anything else when it is doing its rsyncs.  I
> am sorry if this was unclear.  I just want to give rsync a hint that
> comparing files and directories that are older than one week on
> the source side is a waste of time and effort, as the rsync is done
> every day, so they can safely be assumed to be in sync already.

I thought something rang a bell ...

From the man page:

>        -I, --ignore-times
>               Normally rsync will skip any files that are already the
>               same size and have the same modification time-stamp.
>               This option turns off this "quick check" behavior,
>               causing all files to be updated.

As I read this, the default is to look at the file size and timestamp, and if
they match, do nothing, because the files are assumed to be identical. So unless
you have specified this option, files that have already been copied should be
skipped, and the check should be quite cheap in CPU terms, at least compared
with the "cost" of generating a file checksum etc.

AFAIK there is no option to ignore files entirely based on their timestamp, at
least not within rsync itself.
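A minimal Python sketch of the size-plus-mtime "quick check" described above, for illustration only. It is a simplified model and not rsync's actual code. Even when the check says "identical", both files still have to be stat()ed, so the per-file cost is metadata access rather than CPU. The function name quick_check_matches is hypothetical. Comparing mtimes in whole seconds is also an assumption: rsync's own rules, such as --modify-window, are not modelled.

```python
import os


def quick_check_matches(src_path: str, dst_path: str) -> bool:
    """Simplified model of rsync's default "quick check".

    Returns True when the destination has the same size and modification
    time as the source, i.e. the file would be skipped. No file contents are
    read, but both files must still be stat()ed.
    """
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return False  # missing on the destination, so it must be transferred
    src = os.stat(src_path)
    # Assumption: compare mtimes in whole seconds; rsync's own rules
    # (e.g. --modify-window) are not modelled here.
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)
```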

Ken Chase

2015-Jul-01 14:34 UTC

What is taking the time: scanning inodes on the destination, or recopying the
entire backup because of source read speed, target write speed or a slow
interconnect between them?

Do you keep a full new backup every day, or are you just overwriting the
target directory?

/kc


-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.

Mark

2015-Jul-02 08:57 UTC

You could use find to build a filter to use with rsync, then update the
filter only every few days if it takes too long to create.

I have used a script, invoked when the sync starts, that builds a filter on
the source server to exclude anything over 5 days old, but it only has to
parse around 2000 files per run.
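A minimal Python sketch of the find-style selection Mark describes, assuming the source tree is reachable as a local directory. Instead of building an exclude filter, it builds the inverse: a list of files modified within the last N days. The paths are written one per line, relative to the source root, which is the format rsync's --files-from option reads. The function names recent_files and write_files_from are hypothetical.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def recent_files(root: str, max_age_days: float, now: Optional[float] = None) -> List[str]:
    """Return non-directory entries under root modified within max_age_days.

    Paths are relative to root and sorted, ready for rsync --files-from.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: don't follow symlinks
            except FileNotFoundError:
                continue  # entry vanished while scanning the live system
            if st.st_mtime >= cutoff:
                result.append(os.path.relpath(full, root))
    return sorted(result)


def write_files_from(paths: List[str], list_path: str) -> None:
    """Write one path per line (names containing newlines would need --from0)."""
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
```

The list could then be used with something like rsync -a --files-from=LIST SRC/ DEST/, where LIST, SRC and DEST are placeholders. Note that --files-from changes some defaults (for example, -a no longer implies -r). A list of recent files also does not, by itself, propagate deletions to the backup.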

Mark.


Dirk van Deun

2015-Jul-02 09:43 UTC

> What is taking time, scanning inodes on the destination, or recopying the
> entire backup because of either source read speed, target write speed or a
> slow interconnect between them?

It takes hours to traverse all these directories with loads of small
files on the backup server. That is the limiting factor: not even the
copying, just checking the timestamp and size of the old copies.

The source server is the actual live system, which has fast disks,
so I can afford to move the burden to the source side, using the find
utility to select homes that have been touched recently and using
rsync only on these.

But it would be nice if a clever invocation of rsync could remove the
extra burden entirely.
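A minimal Python sketch of Dirk's approach of selecting only the homes that have been touched recently. It assumes every home is an immediate subdirectory of one root (e.g. /home) and uses the one-week cutoff from his first message as the default. Scanning of a home stops at the first recent entry. Directory mtimes are checked as well, because a directory's mtime changes when entries in it are created, removed or renamed. The function names touched_since and recently_touched_homes are hypothetical.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def touched_since(path: str, cutoff: float) -> bool:
    """Return True at the first file or directory under path with mtime >= cutoff."""
    for dirpath, _dirnames, filenames in os.walk(path):
        # Check the directory itself first: its mtime changes when entries are
        # created, removed or renamed, which catches deletions as well.
        candidates = [dirpath] + [os.path.join(dirpath, n) for n in filenames]
        for p in candidates:
            try:
                if os.lstat(p).st_mtime >= cutoff:
                    return True  # early exit: the rest of this home is not scanned
            except FileNotFoundError:
                continue  # entry vanished while scanning the live system
    return False


def recently_touched_homes(root: str, max_age_days: float = 7,
                           now: Optional[float] = None) -> List[str]:
    """Names of the immediate subdirectories of root touched within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    return [e.name for e in entries
            if e.is_dir(follow_symlinks=False) and touched_since(e.path, cutoff)]
```

Each returned home could then be synced with its own rsync run, so the backup server would only traverse the trees that actually changed.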

Dirk van Deun

Ken Chase

2015-Jul-02 13:47 UTC

On Wed, Jul 01, 2015 at 02:05:50PM +0100, Simon Hobson said:

  >As I read this, the default is to look at the file size/timestamp and if
  >they match then do nothing as they are assumed to be identical. So unless
  >you have specified this, then files which have already been copied should be
  >ignored - the check should be quite low in CPU, at least compared to the
  >"cost" of generating a file checksum etc.

This betrays the fact that many rsync users don't abuse rsync for backups the
way us idiots do! :) You have NO IDEA how long it takes to scan 100M files on a
7200 rpm disk. It becomes the dominant issue; CPU isn't the issue at all.
(Additionally, I would think that metadata scanning could max out only 2 cores
anyway: 1 for rsync's userland and another for the kernel scanning inodes in
the filesystem.)

This is why throwing away all that metadata seems silly. Keeping detailed logs
and parsing them before the copy would be good, but that requires an external
selection script to run before rsync starts and hand rsync a list of files to
copy directly. That's unfortunate, because rsync's scan method is quite
advanced, but it doesn't avoid this pitfall.
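A minimal Python sketch of the log-driven selection Ken suggests. It assumes a hypothetical change log with one "<unix-time> <path>" record per line. Both that format and the function name paths_changed_since are illustrative and do not come from any real tool. The function returns the deduplicated paths logged since the last sync. These could be written one per line and handed to rsync with --files-from, so rsync never has to scan the unchanged trees.

```python
from typing import Iterable, List


def paths_changed_since(log_lines: Iterable[str], last_sync: float) -> List[str]:
    """Deduplicated, sorted paths logged after last_sync.

    Assumes a hypothetical log format of one "<unix-time> <path>" record per
    line; paths may contain spaces, so only the first space is split on.
    """
    changed = set()
    for line in log_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        stamp, sep, path = line.partition(" ")
        if not sep:
            continue  # malformed record: no path field
        try:
            when = float(stamp)
        except ValueError:
            continue  # malformed record: bad timestamp
        if when > last_sync:
            changed.add(path)  # a set keeps each path once, however often logged
    return sorted(changed)
```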

Additionally, I don't know whether Linux (or FreeBSD, or any Unix) can be told
to cache metadata more aggressively than data. There is not much point in
caching data on a backup server, but caching metadata would be great. I also
don't know how much RAM the metadata for one inode takes on typical OSes.

/kc
