samba-bugs@samba.org
2008-Jul-04  02:08 UTC
DO NOT REPLY [Bug 5583] New: Files always updated even if time is the only difference
https://bugzilla.samba.org/show_bug.cgi?id=5583
           Summary: Files always updated even if time is the only difference
           Product: rsync
           Version: 2.6.9
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: core
        AssignedTo: wayned@samba.org
        ReportedBy: l.gumbley@auckland.ac.nz
         QAContact: rsync-qa@samba.org
I am using rsync to update compact flash cards and would like to minimise the
cycles on them.  The cards contain root FSs for a number of identical robots
that differ only in UUIDs, mac addresses, hostnames etc.  A large number of
files are generated specially for the update (thus always have different
timestamps to the existing files on the card) but almost always correspond
exactly to the files existing on the CF card.  My rsync -i output is full of:
>f..T...... etc/hostname
And other similar files, where the only thing being changed is the transfer
time, which I don't care about.
I accept that the files have to be transferred as the stamp is different, but I
don't see the point of writing the file if none of the data has changed.
I spent a significant amount of time trying to find an option that would
prevent this with no luck, apologies if I have overlooked something.
Finally, this is somewhat similar to bug 3229 but a little different in that
it's not to do with the backup function.
-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-Jul-04  02:31 UTC
DO NOT REPLY [Bug 5583] Files always updated even if time is the only difference
https://bugzilla.samba.org/show_bug.cgi?id=5583
------- Comment #1 from matt@mattmccutchen.net  2008-07-03 21:31 CST -------
Try --checksum.
samba-bugs@samba.org
2008-Jul-04  05:18 UTC
DO NOT REPLY [Bug 5583] Files always updated even if time is the only difference
https://bugzilla.samba.org/show_bug.cgi?id=5583
------- Comment #2 from l.gumbley@auckland.ac.nz  2008-07-04 00:18 CST -------
Thanks for your comment, Matt, but --checksum takes in excess of a hundred
times longer.  I cancelled it as I couldn't be bothered waiting.  It calculates
the checksum of every file on the source system regardless of the size or
timestamp before continuing.
It might be a solution if the possibility existed to only calculate checksums
in the case of a timestamp difference, however as I say it seems this option
does not exist.
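The check asked for in comment #2 (checksum only when the timestamps differ) can be approximated outside rsync. The following is a minimal sketch, not an rsync option or API: `needs_write` and `file_digest` are hypothetical helper names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_write(src, dst):
    """Hypothetical helper: decide whether dst must be rewritten from src.

    Checksums are computed only when the sizes match but the mtimes differ.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True    # different size: the content must differ
    if int(s.st_mtime) == int(d.st_mtime):
        return False   # same size and mtime: treat as unchanged
    # Same size, different mtime: only now pay for a checksum.
    return file_digest(src) != file_digest(dst)
```

A wrapper script could call `needs_write` for each generated file and leave matching files out of the transfer, so the card is written only when the data actually differs.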
samba-bugs@samba.org
2008-Jul-04  08:14 UTC
DO NOT REPLY [Bug 5583] Don't write out an unchanged file if all the checksums matched
https://bugzilla.samba.org/show_bug.cgi?id=5583
wayned@samba.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
            Summary|Files always updated even if|Don't write out an unchanged
                   |time is the only difference |file if all the checksums
                   |                            |matched
------- Comment #3 from wayned@samba.org  2008-07-04 03:13 CST -------
I have thought about trying to optimize out such a rewrite, and it is possible,
but only by delaying the point at which the receiver begins its update.  This
could slow things down if the file is actually different, but would speed
things up if the files were really the same.  I can see two different places to
put this logic:
One would be to have the receiver delay starting a temp file until it notices
that the sender has told it about a changed part of the basis file.  At that
point, it would need to create a temporary file, open the basis file, and do a
basis copy from 0 to the current position, and then proceed normally with the
rest of the copy.  However, if no difference was found, the update would not
be needed, and would be discarded.  (One potential issue: the receiver would
need to have a way to get the full-file checksum from the generator so that it
could do a double-check against the sender's full-file checksum, since it will
not have computed one.)
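The receiver-side idea above can be sketched roughly as follows. This is an illustration only, not rsync's code: the token format (`("match", block_index)` or `("data", bytes)`), the fixed block size, and the function name `receive` are all assumptions, and the full-file checksum double-check mentioned above is left out.

```python
import os
import tempfile


def receive(basis_path, tokens, block_size):
    """Apply a simplified delta token stream to basis_path.

    tokens yields ("match", block_index) or ("data", bytes); this format is
    an assumption for illustration, not rsync's wire protocol.
    Returns the path of a new temp file, or None if the basis is unchanged.
    """
    basis_size = os.path.getsize(basis_path)
    out = None   # temp file, created only once a difference is seen
    pos = 0      # number of output bytes accounted for so far

    with open(basis_path, "rb") as basis:

        def start_temp():
            # Deferred work: create the temp file and copy basis[0:pos].
            f = tempfile.NamedTemporaryFile(delete=False)
            basis.seek(0)
            remaining = pos
            while remaining:
                chunk = basis.read(min(remaining, 1 << 16))
                if not chunk:
                    break
                f.write(chunk)
                remaining -= len(chunk)
            return f

        for kind, value in tokens:
            if kind == "match":
                offset = value * block_size
                length = min(block_size, basis_size - offset)
                if out is None and offset == pos:
                    pos += length      # still identical: nothing to write
                    continue
                if out is None:
                    out = start_temp()
                basis.seek(offset)
                out.write(basis.read(length))
                pos += length
            else:
                if out is None:
                    out = start_temp()
                out.write(value)
                pos += len(value)

        if out is None and pos != basis_size:
            out = start_temp()         # new file is a shorter prefix of basis

    if out is None:
        return None                    # no rewrite needed
    out.close()
    return out.name
```

When every token is an in-order match covering the whole basis file, `receive` returns None and nothing is written. The cost is the catch-up copy in `start_temp` when a difference shows up late in the file.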
Another option would be to put the short-circuit into the sender's logic so
that it doesn't tell the receiver to do anything until it first finds a file
difference.  The protocol would be extended to have a way to convey to the
receiver that the file doesn't need any updates (since the receiver probably
needs to do its post-transfer attribute updating, and may need to notify the
generator that the file is done).  We'd still need a solution to the full-file
checksum verification.
One other option that is available now is to use one of the checksum caching
patches from the patches directory (such as the one that caches file-info in a
DB and associates the last-known attributes with a checksum, allowing rsync to
more quickly notice when files are the same).
samba-bugs at samba.org
2009-Nov-13  10:49 UTC
DO NOT REPLY [Bug 5583] Don't write out an unchanged file if all the checksums matched
https://bugzilla.samba.org/show_bug.cgi?id=5583
henrik-rsync at prak.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |henrik-rsync at prak.org
------- Comment #4 from henrik-rsync at prak.org  2009-11-13 04:49 CST -------
Here's my "me too" comment on the issue (feel free to move it to a separate bug
depending on this one):
I have stumbled upon the same issue in connection with rsnapshot and rsync with
the "--detect-renamed" patch. 
Basically rsnapshot works like this: On the first run it creates a full copy of
a directory tree /src to /dst/0. The next time, it rotates /dst/(x) to
/dst/(x+1), creates a copy with just hard links from /dst/1 to /dst/0, and
then calls rsync to transfer the changes between /src and /dst/0, effectively
creating a differential backup at the granularity of files.
I applied the detect-renamed patch to avoid multiple copies of big files when
they are moved around in the directory tree. 
The patch works in so far as it finds the correct base files in /dst.
Then it uses the delta algorithm to make sure that no coincidental match of
filename, size and mtime results in a false positive.
Unfortunately usage of the delta algorithm creates a new copy of the file at
/dst even if the content is the same as the base file (instead of using a
hardlink to the base file).
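The outcome described here (a hard link when the content turns out to be identical, a fresh copy otherwise) might look like the sketch below. It is not part of rsync or the detect-renamed patch: `place_file` is a hypothetical helper, and it compares whole files with `filecmp` instead of running the delta algorithm.

```python
import filecmp
import os
import shutil


def place_file(src, base, dst):
    """Hypothetical helper: create dst from src, reusing base when identical.

    Returns "linked" if dst became a hard link to base, else "copied".
    """
    if os.path.exists(base) and filecmp.cmp(src, base, shallow=False):
        os.link(base, dst)     # same content: share the existing inode
        return "linked"
    shutil.copy2(src, dst)     # content differs: write a real copy
    return "copied"
```

The hard link only works when base and dst are on the same filesystem, which is the case for the /dst/(x) snapshot directories described above.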