thr3ads.net - rsync - (no subject) [Oct 2007]


Frank Thomas

2007-Oct-05 01:22 UTC

(no subject)

Good day,

I've got a question regarding the usage of rsync that I just cannot
figure out. I've done a fair hunt for the answer, but I'm stumped.

Here is the situation.

I have two PCs running Linux and use rsync to perform a backup from
server1 to server2. For example:

  rsync -avzr -e 'ssh -i/root/.ssh/id_rsa' --delete /home/samba/admin/software www.some-server.com:/home/RemoteSystems/company/home/samba/admin

Let's say I have a directory called Directory1 within the scope of the
sync.

rsync is run and Directory1 is synced from server1 to server2. A file
named File1 is synced as well, because it is in the directory being
synced.

  server1              server2
    Directory1           Directory1
      File1                File1

Now, let's say a user renames Directory1 on server1 to DirectoryNew.
On the next run, rsync performs the following actions:

1. rsync sees that Directory1 is not on server1 but is on server2, so
   it flags the directory and its contents for deletion on server2.

2. rsync sees that DirectoryNew is on server1 but not on server2, so it
   flags the directory and its contents for copying to server2.

3. rsync performs these actions to make the two directories the same.

This is the simplest way to perform an rsync, but it would be nice if
rsync were intelligent enough to recognize a name change that is not an
inode change on the source. The actions performed would then be:

1. rsync recognizes that Directory1 is no longer on server1, but its
   inode still is. rsync reads the new directory name and flags the name
   change from Directory1 to DirectoryNew on server1.

2. rsync reads server2, sees that Directory1 exists, and flags a pending
   name change on server2 from Directory1 to DirectoryNew.

3. The name is changed on server2. No files or directories are deleted
   and re-transferred from source to destination, as the structure under
   the directory has not changed.

 

Why go through all this work? Personnel have renamed a directory holding
several gigabytes of data without notifying me, and at night rsync tries
to perform the directory-and-file dance and fails simply because the
volume is so great. It would be nice if rsync could either (1) recognize
a large discrepancy between the source and destination before anything
happens, by reporting how many bytes would potentially be transferred
(this doesn't work with the dry-run option), or (2) do the fancy dance by
recognizing a name change instead of deleting the directory.

 

Thanks.

Frank Thomas


Paul Slootman

2007-Oct-05 08:58 UTC


(no subject)

On Thu 04 Oct 2007, Frank Thomas wrote:
> 1. rsync recognizes that Directory1 is not on server1, but its inode
> still is. Rsync reads the new directory name and flags the name change
> from Directory1 to DirectoryNew on server1.

The problem here is that rsync is stateless, i.e. it can't recognize
that the inode is still there, because it has no idea the inode was ever
there. To accommodate that, a major redesign would probably be needed.


Paul Slootman

Charles Marcus

2007-Oct-05 17:41 UTC


(no subject)

Frank Thomas, on 10/4/2007 3:57 PM, said the following:
> it would be nice to have rsync to be intelligent enough to recognize
> a name change but not an inode change on the source.

Seems to me the best way to accomplish this is to make sure that the
parent directory is not one that someone can rename, i.e., when I rsync
our home directories, there is no danger of anyone ever renaming the
'home' directory...

So, just put the top-level directories that *can* be renamed by the
users into a parent directory that they can *not* rename, and use that
as the root for your rsync...

Or maybe I'm missing something?

-- 

Best regards,

Charles

Wayne Davison

2007-Oct-05 19:04 UTC


[Detecting renames]

On Thu, Oct 04, 2007 at 01:57:22PM -0600, Frank Thomas wrote:
> This action is the simplest method of performing an rsync, but it
> would be nice to have rsync to be intelligent enough to recognize a
> name change but not an inode change on the source.

For the next "feature release" of rsync after 3.0.0, I'm imagining
adding support for a database API that would allow extra information
about files to be maintained and used (completely optional, of course).
In the scenario that you described, this new rsync would be run using a
DB cache and the --checksum option, which would look up files on the
sending side by their inode + size + mtime + ctime (making the checksum
lookup efficient). The receiving side would look up incoming files based
on their checksum + size + mtime + ctime in order to find a local file
that it could use for hard-linking, copying, and/or renaming. In the
meantime, the only rsync solution is the detect-renamed patch that Matt
mentioned.
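To make the two lookups concrete, here is a hypothetical in-memory sketch. It is not rsync code and not the database API described above; the class and function names are illustrative, and SHA-256 stands in for whatever checksum rsync would use. The sender caches a checksum per (inode, size, mtime, ctime) so an unchanged file is not re-read; the receiver indexes its local files by (checksum, size, mtime) so an incoming file under a new name can be matched to a local copy. Leaving ctime out of the receiver key is an assumption of this sketch, since a receiver cannot set a file's ctime to match the sender's.

```python
import hashlib
import os
import tempfile
from typing import Dict, Optional, Tuple

SendKey = Tuple[int, int, int, int]  # (inode, size, mtime_ns, ctime_ns)
RecvKey = Tuple[str, int, int]       # (sha256 hex digest, size, mtime_ns)


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


class SenderCache:
    """Sender side: checksum cache keyed by inode + size + mtime + ctime."""

    def __init__(self) -> None:
        self.db: Dict[SendKey, str] = {}
        self.hashed = 0  # how many times a file actually had to be read

    def checksum(self, path: str) -> str:
        st = os.stat(path)
        key = (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)
        if key not in self.db:
            self.db[key] = file_digest(path)
            self.hashed += 1
        return self.db[key]


def build_receiver_index(root: str) -> Dict[RecvKey, str]:
    """Receiver side: map (checksum, size, mtime) -> an existing local path."""
    index: Dict[RecvKey, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[(file_digest(full), st.st_size, st.st_mtime_ns)] = full
    return index


def find_local_copy(index: Dict[RecvKey, str], digest: str, size: int,
                    mtime_ns: int) -> Optional[str]:
    """A local file usable for hard-linking, copying or renaming, if any."""
    return index.get((digest, size, mtime_ns))


def demo() -> Tuple[int, bool]:
    data = b"several gigabytes, in spirit"
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        sent = os.path.join(src, "DirectoryNew", "File1")
        os.makedirs(os.path.dirname(sent))
        with open(sent, "wb") as f:
            f.write(data)
        cache = SenderCache()
        digest = cache.checksum(sent)
        cache.checksum(sent)  # unchanged inode/size/times: served from the cache
        st = os.stat(sent)
        # The receiver still holds the same file under the old directory name.
        old = os.path.join(dst, "Directory1", "File1")
        os.makedirs(os.path.dirname(old))
        with open(old, "wb") as f:
            f.write(data)
        os.utime(old, ns=(st.st_atime_ns, st.st_mtime_ns))  # as rsync -t would
        index = build_receiver_index(dst)
        return cache.hashed, find_local_copy(index, digest, st.st_size, st.st_mtime_ns) == old
```

`demo()` returns `(1, True)`: the second checksum request is answered from the cache, and the file still sitting under the old Directory1 name on the receiver is found as a candidate for hard-linking or renaming instead of a fresh transfer.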

..wayne..

N.J. van der Horn (Nico)

2007-Oct-05 22:06 UTC

head link

Renaming a directory results in an expensive retransmission

We have been using rsync for several years, and for the past couple of
months we have also used it to back up remote servers, some with more
than 200GB capacity.

Windows users in particular sometimes have the (bad) habit of renaming a
directory with huge amounts of data below it.

We see the same nasty results you describe:

* rsync "thinks" that the old directory name has disappeared and deletes
  the directory on the target machine, throwing away the expensive
  transmission
* the new directory name triggers a fresh, full (re)transmission,
  sometimes taking days... while the "real work" would be done in
  minutes...
* the servers we back up have between 20GB and 200GB capacity
* all rsyncs run in parallel; the average sync time is 1.5 hours for 900GB
* when a "user" behaves as described, it takes days to a week to resync

I think it is a tricky problem to deal with. It is tempting to keep a
checksummed file/directory list on both sides with information like:

* a fingerprint/signature/checksum to identify each file or directory
* inode number
* timestamp
* file size
If a file appears to have been deleted because its name/path changed, it
could possibly be identified by its fingerprint and used to sync
cleverly ;-) I am thinking of this as an extension of --fuzzy, giving it
more functionality (hint).

For some time I have been experimenting with a solution to this problem:
a sort of "preprocessor" that tries to identify files in the way
described above, creating hardlinks (ln) so that rsync thinks the files
are already in the new location. I traverse the directory trees on both
sides (remote and local), producing a file with the information
described above, but it is still a work in progress...
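A hypothetical sketch of such a preprocessor on the receiving side: for each path in the source listing that does not exist on the destination yet, it looks for a destination file with the same (size, SHA-256) and hard-links it into the new location. `link_renamed`, `dest_index` and the listing format (relative path -> (size, digest)) are illustrative, not the actual tool described above. Because rsync -a preserves modification times, the linked file normally already matches the source on size and mtime, so rsync's default quick check skips it, and --delete then only removes the extra name at the old location.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, str]  # (size, SHA-256 hex digest)


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def dest_index(root: str) -> Dict[Fingerprint, str]:
    """Map (size, digest) -> one existing relative path on the destination."""
    index: Dict[Fingerprint, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            fp = (os.path.getsize(full), sha256_of(full))
            index.setdefault(fp, os.path.relpath(full, root))
    return index


def link_renamed(src_listing: Dict[str, Fingerprint], dest_root: str) -> List[str]:
    """Hard-link existing destination data to paths only the source has yet."""
    index = dest_index(dest_root)
    created: List[str] = []
    for rel, fp in sorted(src_listing.items()):
        target = os.path.join(dest_root, rel)
        if os.path.lexists(target) or fp not in index:
            continue  # already in place, or genuinely new data for rsync to send
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(os.path.join(dest_root, index[fp]), target)  # same filesystem only
        created.append(rel)
    return created


def demo() -> Tuple[List[str], bool]:
    data = b"payload"
    with tempfile.TemporaryDirectory() as dest:
        old = os.path.join(dest, "Directory1", "File1")
        os.makedirs(os.path.dirname(old))
        with open(old, "wb") as f:
            f.write(data)
        rel = os.path.join("DirectoryNew", "File1")
        src_listing = {rel: (len(data), hashlib.sha256(data).hexdigest())}
        created = link_renamed(src_listing, dest)
        return created, os.path.samefile(old, os.path.join(dest, rel))
```

On a POSIX system `demo()` returns `(['DirectoryNew/File1'], True)`: the new path is a second name for the same inode, so no data has to cross the network. Hard links cannot span filesystems, which is why the link stays inside the destination tree.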

For me, the cost of keeping a database in this scenario would be truly
justified.

With the hardlinks in place, rsync deleting the files in the old
location is no longer a problem for me.

But... I am just a user with needs, also looking for a solution to this
problem, and hoping it can be solved by the clever developers ;-)

Maybe there is already a solution available, and we are chasing shadows?


Thanks, Nico


-- 
Handled by: N.J. van der Horn (Nico)
---
ICT Support Vanderhorn IT-works, www.vanderhorn.nl,
Voorstraat 55, 3135 HW Vlaardingen, The Netherlands,
Tel +31 10 2486060, Fax +31 10 2486061
