On Tue 08 May 2007, Atte Peltomaki wrote:
> I'm seeing a weird problem with rsync 2.6.9 protocol version 29 on
> Debian Sarge. When copying a file from one location to another between
> two Debian boxes, if destination includes a file with same size and
> name, rsync fails to see that they are not exactly the same file.
>
> The situation originates from copying a file to a place which is
> periodically rsynced onwards, and the rsync operation takes place before
> the original file transfer to the rsync source is complete.
>
> Example:
>
> Source:
>
> md5sum:
> 2ac4a4ad88da17f49d26c9e578ce5432 somefile.exe
> sha1sum:
> eaabb30b716e993be000b89208e2d9f63e78f052 somefile.exe
> ls -l:
> -rwxrw---- 1 user group 109819105 Apr 2 10:48 somefile.exe*
>
> Destination:
>
> md5sum:
> 72c116866a75f859a19a150216768e52 somefile.exe
> sha1sum:
> 33b7d91fc6bd2a5bff292258c7d6eeb7db0aec8a somefile.exe
> ls -l:
> -rwxrw---- 1 user group 109819105 2007-04-02 10:48 somefile.exe*
The files apparently both have the same size, timestamp, and other
attributes. Only the contents differ.
> rsync is executed from the source with following flags:
>
> -alvv --delete --exclude-from=file
>
> where 'file' includes three lines:
> /upload
> /upload/*
> upload/
>
> rsync itself says about somefile.exe:
>
> somefile.exe is uptodate
>
> As you can see, md5sums and sha1sums reveal that the file is not the
> same, even though timestamps and sizes match.
You did not ask rsync to checksum the files...
> What is the exact algorithm rsync uses to determine whether a file is up
> to date or not?
Hmmm.... I thought this would have been in the manpage, but it's not
spelled out apparently.
By default, rsync compares only size and mtime to determine whether a
file needs to be transferred. (I think differences in other attributes,
such as owner and permissions, are simply updated on the destination and
don't by themselves trigger a sync of the contents, although I confess
I'm not absolutely sure.)
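
To make that default check concrete, here is a rough Python sketch of the
size-and-mtime comparison. It is only an illustration, not rsync's actual
code: the function name quick_check_matches is made up, and comparing
mtime in whole seconds is an assumption of the sketch.

```python
import os


def quick_check_matches(src_path, dst_path):
    """Return True if both files have the same size and the same mtime.

    Hypothetical helper sketching the default "is it up to date?" test;
    the file contents are never read.
    """
    src = os.stat(src_path)
    dst = os.stat(dst_path)
    # Assumption of this sketch: mtimes are compared in whole seconds.
    return (src.st_size == dst.st_size
            and int(src.st_mtime) == int(dst.st_mtime))
```

For two files like the ones in your example (same size, same timestamp,
different contents) such a check reports a match, which is why rsync
says the file is up to date.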
You need to supply the --checksum option if you want to make sure that
the contents are indeed identical. This is normally not done, as reading
the file contents to checksum them would cause a massive I/O load, and
cases where size and timestamp are identical but the contents differ
don't usually happen...
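
If you want to check a suspect pair of files by hand, a content
comparison can be sketched in Python as below. This is a minimal sketch,
not what rsync does internally: SHA-256 is used purely for illustration,
and the names file_digest and same_contents are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an illustrative choice for this sketch only.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(src_path, dst_path):
    """Return True only if both files hash to the same digest."""
    return file_digest(src_path) == file_digest(dst_path)
```

Note that every byte of both files has to be read, which is where the
I/O cost of a checksum-based comparison comes from.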
Paul Slootman