Ray and Sandie Clark
2007-Aug-31 23:47 UTC
breakage? when using --ignore-times with --link-dest
Overview
--------

I am trying to use --ignore-times with --link-dest and find that all files are duplicated inappropriately (IMHO).

I think it is because --link-dest creates a hard link, which changes the link count and therefore the Change Time (ctime). This requires rsync to create a new inode and duplicate the data just to preserve the ctime. If --ignore-times is not specified, only mtime and the file size are checked, so rsync doesn't notice that other inode information changed (ctime and links).

1) I am hoping that someone can confirm my analysis of the situation.
2) I am proposing that perhaps this behavior is not appropriate and should be changed.

I would be interested in comments and suggestions of all kinds! Thank you.

Detail
------

I am using rsync with the --link-dest option to perform backups to a local USB disk. It works great without the --ignore-times option, but now I want to add that option to make it more bulletproof. My rsync command is below (sometimes run with, sometimes without --ignore-times). I got unexpected behavior, and am trying to figure it out.

WITHOUT --ignore-times, files which have not changed are not duplicated. Instead a hard link is put into place, as it should be with the --link-dest option.

WITH --ignore-times, *every file* is duplicated in the target directory, even if it has not changed. I have verified that the file data has not changed by doing an MD5SUM of each.

Another reason to duplicate the file would be to duplicate the metadata in the inode. I assume that this is what is causing the copy. The output of "stat" for the source file and the --link-dest file is included below. Other than device and inode number (which necessarily are different and I ASSUME are not checked), the only changes are the link count, Access Time and Inode Change Time.

With --ignore-times, does rsync decide to duplicate the file to maintain either Access Time, Change Time, or link count, if nothing else changes, including all other inode information?
This appears to be true from what I can see. Can someone confirm that?

If so, --ignore-times defeats the purpose of --link-dest, since the inode will necessarily be changed to create the hard link, and creating the hard link changes ctime.

I would argue that the fact that Change Time is updated when a hard link is made confounds FILE metadata with File SYSTEM metadata. (It might be better to have two Change Times, one for FILE metadata and one for FILE SYSTEM metadata, but we are not going to change that for sure!)

For rsync purposes the link count should be ignored when deciding if a file changed or not. A hard link will get handled in due course if the --links option is selected. Instead rsync should base its decision that a new file must be created on pure FILE metadata (other stuff in the inode such as permissions, owner, group, etc). CTime by itself is not a useful indicator of whether FILE metadata has changed, dictating creation of a new file.

Comments, rebuttals, etc.? Thank you.
--Ray

rsync \
    --delete \
    --devices \
    --bwlimit "${BWLIMIT}" \
    --group \
    --hard-links \
    --ignore-times \
    --links \
    --numeric-ids \
    --owner \
    --perms \
    --recursive \
    --sparse \
    --specials \
    --stats \
    --times \
    --verbose \
    --verbose \
    --verbose \
    --verbose \
    "--link-dest=${workingDir}/${previousRSyncPath}/${userName}" \
    "${snapMountName}/" \
    "${snapRSyncHome}/${userName}"

Source to be rsynced:

  File: `newUserExample.tgz'
  Size: 14631  Blocks: 32  IO Block: 4096  regular file
Device: fd01h/64769d  Inode: 11  Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 2010/sysadmin)  Gid: ( 2010/sysadmin)
Access: 2007-07-22 00:56:24.000000000 -0400
Modify: 2007-04-04 17:26:02.000000000 -0400
Change: 2007-04-10 19:25:08.000000000 -0400

File at --link-dest:

  File: `newUserExample.tgz'
  Size: 14631  Blocks: 32  IO Block: 4096  regular file
Device: 805h/2053d  Inode: 10797118  Links: 3
Access: (0664/-rw-rw-r--)  Uid: ( 2010/sysadmin)  Gid: ( 2010/sysadmin)
Access: 2007-08-28 20:52:11.000000000 -0400
Modify: 2007-04-04 17:26:02.000000000 -0400
Change: 2007-08-28 21:10:22.000000000 -0400
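The claim that creating a hard link alters the link count and ctime while leaving mtime alone can be checked directly. What follows is a minimal Python sketch, assuming a POSIX filesystem with hard-link support; the helper name `link_effect` and the scratch file names are invented for illustration and have nothing to do with rsync itself.

```python
import os
import tempfile
import time


def link_effect():
    """Hard-link a scratch file and report which inode fields changed.

    Hypothetical helper for illustration only; assumes a POSIX
    filesystem that supports hard links.
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.dat")
        linked = os.path.join(tmp, "linked.dat")
        with open(original, "wb") as f:
            f.write(b"unchanged data")

        before = os.stat(original)
        time.sleep(0.05)  # give the filesystem clock a chance to advance
        os.link(original, linked)  # second directory entry, same inode
        after = os.stat(original)

        return {
            "same_inode": os.stat(linked).st_ino == after.st_ino,
            "nlink_before": before.st_nlink,
            "nlink_after": after.st_nlink,
            "mtime_changed": before.st_mtime_ns != after.st_mtime_ns,
            "ctime_changed": before.st_ctime_ns != after.st_ctime_ns,
        }


if __name__ == "__main__":
    for field, value in link_effect().items():
        print(f"{field}: {value}")
```

The link count should go from 1 to 2 with mtime untouched, which is the same pattern as in the two stat listings above. Whether the ctime change is visible after such a short pause depends on the filesystem's timestamp granularity.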
On 8/30/07, Ray and Sandie Clark <rclark03@rochester.rr.com> wrote:
> I am trying to use --ignore-times with --link-dest and find that all
> files are duplicated inappropriately (IMHO).

This is by design. --link-dest does not hard link in a basis file unless rsync trusts that it has the same data as the source file, but --ignore-times tells rsync never to trust that a source and destination file have the same data.

> I am using rsync with the --link-dest option to perform backups to a
> local USB disk. It works great without the --ignore-times option, but
> now I want to add that option to make it more bulletproof.

I would recommend --checksum as a way of bulletproofing the system without nullifying --link-dest.

Matt
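The trust rule described in this reply can be written down as a toy decision function. This is only a sketch of the reasoning, not rsync's actual code: the `FileInfo` type, the `action` function and the sample values are all hypothetical, and the model assumes every preserved attribute (owner, permissions and so on) already matches between the source and the basis file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: int
    checksum: str  # stands in for a full-file checksum


def action(src, basis, ignore_times=False, checksum=False):
    """Return 'hard-link' or 'transfer' for one file (toy model only)."""
    if ignore_times:
        # --ignore-times: never trust the basis file, always transfer.
        return "transfer"
    if checksum:
        # --checksum: trust the basis only if size and checksum agree.
        trusted = src.size == basis.size and src.checksum == basis.checksum
    else:
        # Default quick check: size and mtime only.
        trusted = src.size == basis.size and src.mtime == basis.mtime
    return "hard-link" if trusted else "transfer"


if __name__ == "__main__":
    # Placeholder values; only the size is taken from the stat output.
    src = FileInfo(size=14631, mtime=1000, checksum="aaaa")
    basis = FileInfo(size=14631, mtime=1000, checksum="aaaa")
    print(action(src, basis))                     # quick check
    print(action(src, basis, ignore_times=True))  # --ignore-times
    print(action(src, basis, checksum=True))      # --checksum
```

In this model --checksum catches a file whose data changed without its size or mtime changing, yet still lets identical files be hard-linked, which is why it is suggested in place of --ignore-times.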
On Thu, Aug 30, 2007 at 03:01:13PM -0400, Ray and Sandie Clark wrote:
> I am trying to use --ignore-times with --link-dest and find that all
> files are duplicated inappropriately (IMHO).

Rsync only hard-links a file with its --link-dest version if it is not transferred. Using --ignore-times causes all files to be transferred, so a --link-dest hierarchy only supplies file data for a more efficient transfer (assuming the transfer is remote) when combined with -I.

> If --ignore-times is not specified, only mtime and the file size is
> checked so rsync doesn't notice that other inode information changed
> (ctime and links).

You want the --checksum option. If you want rsync to hard-link together files with differing attributes (such as differing mtime), that attribute must not be specified as preserved (e.g. --no-times) OR you could have rsync copy into a hierarchy of files instead of an empty hierarchy as an alternate means of treating the history of file attributes as unimportant.

One way to speed up the --checksum option is a preliminary patch named checksum-updating.diff in the patches dir. Its purpose is to maintain .rsyncsum files in the source based on extra file data, such as ctime and (coming soon) inode. When this is done right, it will allow a new kind of transfer check that is not as expensive as full-file checksumming and not as fragile as only monitoring size and mtime.

..wayne..
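The idea of caching checksums against extra file data can be illustrated with a short Python sketch. This is not the patch's implementation and says nothing about the real .rsyncsum file format: the in-memory dict, the function names and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os


def file_signature(path):
    """Cheap metadata used to decide whether a cached checksum is stale."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino)


def cached_checksum(path, cache):
    """Return a hex digest for path, rereading the file only when its
    size, mtime, ctime or inode differs from the cached signature.

    `cache` is a plain dict mapping path -> (signature, digest); it is a
    hypothetical stand-in for an on-disk checksum file.
    """
    sig = file_signature(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == sig:
        return entry[1]  # metadata unchanged: reuse the stored digest

    h = hashlib.sha256()  # hash choice is arbitrary for this sketch
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[path] = (sig, digest)
    return digest
```

A check built this way costs one stat() per unchanged file instead of a full read, while still noticing changes that leave size and mtime alone but move ctime.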