Benjamin Pflugmann
2011-Dec-27 12:25 UTC
Unexpected behavior with --hard-links and --ignore-existing
Hi,

this is a re-send, because I apparently needed to subscribe to the list first. The confirmation mail said: "If you are joining the list with a held message, no NOT resend the message without first canceling the held message!" I am sorry, but I am not sure whether my previous mail counts as a "held" message and, if so, what I need to do in order to cancel it (aside from that, "no NOT" above is a typo, isn't it?).

Back to my original request: I hope this is the right place for my concern; if not, kindly direct me to the right place. Thank you. I searched via Google and the bug tracker but didn't find anyone with a similar problem[1].

Summary: When (repeatedly) running rsync with --hard-links and --ignore-existing, new hard links are copied instead of linked.

Long story: I am trying to distribute a heavily hard-linked source directory to several machines. Due to the kind of files and services involved, I usually only want to distribute new files and avoid modifying or deleting existing ones. Therefore I use --ignore-existing, which does the job just fine for normal files. To save space and time, I want to add --hard-links, which also works as expected on its own. But combined, it seems that the existing files are not considered as candidates for linking.

Reproduction recipe:

----------------------------------------------------------------------
$ rsync --version
rsync  version 3.0.8  protocol version 30
Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

# setting up an example source/target
$ mkdir source
$ echo "foo" > source/one
$ rsync -av --hard-links --ignore-existing source/ target
sending incremental file list
created directory target
./
one

sent 109 bytes  received 34 bytes  286.00 bytes/sec
total size is 4  speedup is 0.03

# now create a new hard link
$ ln source/one source/two
$ rsync -av --hard-links --ignore-existing source/ target
sending incremental file list
./
two

sent 121 bytes  received 34 bytes  310.00 bytes/sec
total size is 8  speedup is 0.05

# It got copied instead of linked

# Now, if we skip --ignore-existing, a hard-link is created, as I
# would have expected for the former command already.
$ rsync -av --hard-links source/ target
sending incremental file list
two => one

sent 78 bytes  received 19 bytes  194.00 bytes/sec
total size is 8  speedup is 0.08
----------------------------------------------------------------------

Well, is this intended behaviour? The man page says:

    --ignore-existing
        This tells rsync to skip updating files that already exist on
        the destination (this does not ignore existing directories, or
        nothing would get done). See also --existing.

        This option is a transfer rule, not an exclude, so it doesn't
        affect the data that goes into the file-lists, and thus it
        doesn't affect deletions. It just limits the files that the
        receiver requests to be transferred.

        This option can be useful for those doing backups using the
        --link-dest option when they need to continue a backup run
        that got interrupted. Since a --link-dest run is copied into a
        new directory hierarchy (when it is used properly), using
        --ignore existing will ensure that the already-handled files
        don't get tweaked (which avoids a change in permissions on the
        hard-linked files). This does mean that this option is only
        looking at the existing files in the destination hierarchy
        itself.

To me, this doesn't suggest that the new file cannot be hard-linked to an existing file, but I must admit I do not understand all implications of transfer rules and excludes.

Otherwise, if this is working as intended, are there any suggestions on how to get the advantages of --hard-links (size- and speed-wise) while not modifying existing files on the target?

For now I run rsync without --ignore-existing after checking that rsync with --dry-run --existing --delete would do nothing. But that solution has its own problems.

Thank you in advance,

Benjamin Pflugmann.

[1] During my search I noticed some broken links, e.g. on
    ftp://ftp.samba.org/pub/unpacked/rsyncweb/bugzilla.html
    "NEWS file from the git repository" points to
    ftp://ftp.samba.org/ftp/unpacked/rsync/NEWS
    and "patches dir" to
    ftp://ftp.samba.org/ftp/rsync/dev/patches/
    which both give me a "550 Failed to change directory" error when I click the links in my browser. The same happens on
    ftp://ftp.samba.org/pub/unpacked/rsyncweb/issues.html
    for "TODO file", which points to
    ftp://ftp.samba.org/ftp/rsync/TODO
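
Whether a file in the recipe above ended up linked or copied can be confirmed by comparing inodes rather than by reading the rsync output. The following is a minimal sketch, not part of the original report: the helper name link_status and the scratch directory are made up for illustration, and the -ef test operator is assumed to be available (bash, dash and busybox sh provide it, although POSIX does not require it).

```shell
#!/bin/sh
# Hypothetical helper: print "linked" if both paths refer to the same
# inode on the same device (i.e. they are hard links to each other),
# otherwise print "copied".
link_status() {
    if [ "$1" -ef "$2" ]; then
        echo "linked"
    else
        echo "copied"
    fi
}

# Self-contained demonstration in a scratch directory.
demo=$(mktemp -d)
echo "foo" > "$demo/one"
ln "$demo/one" "$demo/two"      # genuine hard link
cp "$demo/one" "$demo/three"    # independent copy with the same content

link_status "$demo/one" "$demo/two"
link_status "$demo/one" "$demo/three"

rm -rf "$demo"
```

The demonstration prints "linked" for the hard link and "copied" for the independent copy. Going by the report, running link_status target/one target/two after the second rsync command would presumably print "copied", and "linked" only after the run without --ignore-existing.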
Carlos Carvalho
2012-Jan-03 14:25 UTC
Unexpected behavior with --hard-links and --ignore-existing
Benjamin Pflugmann (benjamin-rsync at pflugmann.de) wrote on 27 December 2011 13:25:

>Summary: When (repeatedly) running rsync with --hard-links and
>--ignore-existing, new hard links are copied instead of linked.

Seems natural to me. --ignore-existing does just that: it ignores files that already exist on the destination, so it has to copy the new ones.

To keep the existing versions, you might want to do incremental backups with --link-dest.
Wayne Davison
2012-Jan-04 04:07 UTC
Unexpected behavior with --hard-links and --ignore-existing
On Tue, Dec 27, 2011 at 4:25 AM, Benjamin Pflugmann <benjamin-rsync at pflugmann.de> wrote:

> Summary: When (repeatedly) running rsync with --hard-links and
> --ignore-existing, new hard links are copied instead of linked.

Yeah, it's a side-effect of how the skipping is encoded. All existing files are skipped before checking if they are up-to-date or not, so they are not considered as potential link choices.

To change this, the code would need to be changed to remove the current skip-existing check and add in duplicate versions of it into every code path right after we figure out that this thing isn't up-to-date (e.g. in the symlink code, in the file code, in the device code, in the special-file code). That would let a non-up-to-date file get marked as skipped, but an up-to-date file get left as valid, but just not treated as a normal transfer file (since it's supposed to be getting skipped).

This is not something I plan to tweak in the code at this time.

..wayne..
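
The difference between the two orderings described above can be illustrated with a toy model. This is only a sketch of the decision order as explained in the message, not rsync's actual code: the function names classify_current and classify_proposed and the labels they print are invented for illustration.

```shell
#!/bin/sh
# Toy model only: these functions are hypothetical and are not rsync code.
# Arguments: $1 = 1 if the destination file already exists, else 0
#            $2 = 1 if the existing file is up to date, else 0

# Order described as current: the existence check runs first, so an
# existing file is dropped before anyone asks whether it is up to date.
classify_current() {
    if [ "$1" = 1 ]; then
        echo "skipped"
    else
        echo "transfer"
    fi
}

# Order described as the possible change: the up-to-date check runs
# first, and only a non-up-to-date existing file is marked as skipped.
classify_proposed() {
    if [ "$1" = 0 ]; then
        echo "transfer"
    elif [ "$2" = 1 ]; then
        echo "link-candidate"
    else
        echo "skipped"
    fi
}

# target/one from the recipe: it exists and is up to date.
echo "current:  $(classify_current 1 1)"
echo "proposed: $(classify_proposed 1 1)"
```

In this model, target/one is classified as "skipped" under the current order, so it is never offered as a link target for the new file two, while the reordered check would leave it as "link-candidate" without transferring it.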