Kevin Korb
2022-Feb-03 22:38 UTC
Confused as to why rsync thinks time, owner and group of many files differ
Are you using the same source and target each time? I ask because the
only discrepancy I see is the link count, which shows that there are 11
more instances of that inode on the source than on the target. Maybe
instances in other snapshots are being updated/re-linked?

The only other thing to mention is that when you abort rsync (with -P or
--inplace) incomplete files are left. Rsync doesn't fix the owner+group
until it is done with a directory and it doesn't fix the timestamp until
it is done with a file. This would be why you shouldn't mix those options
with --update, since the truncated file will be newer than the source
file.

On 2/3/22 17:04, Andy Smith via rsync wrote:
> Hi,
>
> I am at the moment using rsync to move quite a big set of backups
> from one machine to another. The source filesystem is xfs; the
> target filesystem is btrfs.
>
> For various reasons I have been stopping the rsync part way through
> and re-starting. I have noticed that a large number of files are
> transferred over and over and I can't work out why.
>
> Example:
>
> sudo rsync -iPva \
>     --inplace \
>     --numeric-ids \
>     --delete \
>     /data/backup/rsnapshot/daily.0/cacti/ \
>     root@koff:/data/backup/rsnapshot/daily.0/cacti/
>
> ...
> <f..t.og... var/www/index.html
>           5,258 100%    5.78kB/s    0:00:00 (xfr#1276, to-chk=1/43437)
>
> If I run the rsync command again, thousands of lines of output will
> appear again, all showing itemized changes for 't' and sometimes
> 'p', 'o' and 'g'. Notably, var/www/index.html will keep appearing in
> the list.
>
> Let's have a look at that file.
>
> Source:
>
> $ stat /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
>   File: /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
>   Size: 5258       Blocks: 16         IO Block: 4096   regular file
> Device: fd05h/64773d    Inode: 354337      Links: 37
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2022-02-03 04:53:12.115719681 +0000
> Modify: 2006-07-14 16:42:37.000000000 +0000
> Change: 2022-01-01 17:31:28.553758359 +0000
>  Birth: -
>
> Destination:
>
> $ stat /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
>   File: /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
>   Size: 5258       Blocks: 16         IO Block: 4096   regular file
> Device: 26h/38d    Inode: 7534065     Links: 26
> Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2022-01-25 20:40:09.930960486 +0000
> Modify: 2006-07-14 16:42:37.000000000 +0000
> Change: 2022-02-03 21:45:44.194559899 +0000
>  Birth: 2022-01-25 20:40:09.930960486 +0000
>
> When rsync considers times as being different, it means mtime,
> right? Yet these files have identical mtimes. They also have
> identical uid, gid and permissions.
>
> I would not expect this and other files like it to keep being
> listed for change over and over again. I can tell by the summary
> that the actual contents of the files weren't sent, so at least it
> didn't try to send all the data again. But even if rsync did
> consider these files to have different mtime/uid/gid, should it not
> have written that and be happy at next run?
>
> rsync versions:
>
> Source:
>
> $ rsync --version
> rsync version 3.1.2 protocol version 31
> Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
> Web site: http://rsync.samba.org/
> Capabilities:
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, symtimes, prealloc
>
> Destination:
>
> $ rsync --version
> rsync version 3.2.3 protocol version 31
> Copyright (C) 1996-2020 by Andrew Tridgell, Wayne Davison, and others.
> Web site: https://rsync.samba.org/
> Capabilities:
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, hardlink-specials, symlinks, IPv6, atimes,
>     batchfiles, inplace, append, ACLs, xattrs, optional protect-args, iconv,
>     symtimes, prealloc, stop-at, no crtimes
> Optimizations:
>     SIMD, asm, openssl-crypto
> Checksum list:
>     xxh128 xxh3 xxh64 (xxhash) md5 md4 none
> Compress list:
>     zstd lz4 zlibx zlib none
>
> What am I missing?
>
> Thanks,
> Andy
>

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
    Kevin Korb                  Phone:    (407) 252-6853
    Systems Administrator       Internet:
    FutureQuest, Inc.           Kevin at FutureQuest.net  (work)
    Orlando, Florida            kmk at sanitarium.net (personal)
    Web page:                   https://sanitarium.net/
    PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
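For readers following the `<f..t.og...` line quoted above, here is a minimal sketch of how that itemized string breaks down. It assumes the 11-character `YXcstpoguax` layout described under `--itemize-changes` in the rsync man page; the function name `decode_itemized` and the wording of the labels are illustrative and not part of rsync. The ninth position was reserved in older releases and carries access/create time flags in newer ones, so it is labelled loosely here.

```python
# Position meanings follow the --itemize-changes text in the rsync man page.
# The helper name and label wording are hypothetical, chosen for this sketch.
UPDATE_TYPES = {
    "<": "sent to the remote host",
    ">": "received by the local host",
    "c": "local change or creation",
    "h": "hard link to another item",
    ".": "not being updated (attributes may still change)",
}
FILE_TYPES = {"f": "file", "d": "directory", "L": "symlink",
              "D": "device", "S": "special"}
# Attribute slots 3..11 of the string, in order.
ATTRS = ["checksum", "size", "mtime", "perms", "owner", "group",
         "atime/crtime slot", "acl", "xattr"]


def decode_itemized(code):
    """Return (update_type, file_type, changed_attributes) for an -i string."""
    if code.startswith("*"):
        # Message form such as "*deleting"; there are no attribute flags.
        return ("message", None, [])
    update = UPDATE_TYPES.get(code[0], "unknown")
    ftype = FILE_TYPES.get(code[1], "unknown")
    flags = code[2:]
    if flags and set(flags) == {"+"}:
        # All plus signs mean the item is newly created.
        return (update, ftype, ["(new item)"])
    # "." or a space means the attribute is unchanged; any letter means it
    # differs. Upper and lower case (for example t versus T) are not
    # distinguished in this sketch.
    changed = [name for flag, name in zip(flags, ATTRS) if flag not in ". "]
    return (update, ftype, changed)


print(decode_itemized("<f..t.og..."))
# ('sent to the remote host', 'file', ['mtime', 'owner', 'group'])
```

Read this way, the quoted line says rsync believes the file's mtime, owner and group differ, while checksum, size and permissions do not.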
Andy Smith
2022-Feb-03 23:48 UTC
Confused as to why rsync thinks time, owner and group of many files differ
Hi Kevin,

On Thu, Feb 03, 2022 at 05:38:41PM -0500, Kevin Korb via rsync wrote:
> Are you using the same source and target each time?

Yes.

> I ask because the only discrepancy I see is the link count which
> shows that there are 11 more instances of that inode on the source
> than the target. Maybe instances in other snapshots are being
> updated/re-linked?

I haven't yet let rsync run all the way through the whole source
filesystem so it probably hasn't yet sent over some of the hardlinks
that it knows about for this file. There's only ever one rsync going at
once, because this is a one-off thing I am doing by hand.

> The only other thing to mention is that when you abort rsync (with -P or
> --inplace) incomplete files are left. Rsync doesn't fix the owner+group
> until it is done with a directory and it doesn't fix the timestamp until it
> is done with a file. This would be why you shouldn't mix those options with
> --update since the truncated file will be newer than the source file.

Okay, but:

- it's thousands of files that are reported as having differing t/o/g,
  not just whichever one was being worked on when I hit ctrl-c. I'm
  only hitting ctrl-c because rsync sees thousands of changes that I
  can't explain.

- they don't have differing t/o/g when you look at them.

- their contents are identical anyway as confirmed by sha256sum and
  also as confirmed by the fact that rsync isn't sending the file
  contents over.

- if I use "-I --checksum" to skip mtime checking and force checksum,
  rsync doesn't try to sync these files (it does still for the ones it
  thinks o/g are different). This partial workaround isn't very useful
  anyway as --checksum takes forever. Point is, it definitely thinks
  there are changes of mtime, uid and/or gid.

So I am still really confused.

If I remove the --inplace I think the spurious t/o/g detection will
still happen, and also that rsync will create a temp file to rename
over each file, so blowing up the hardlinks that it has already sent
across.
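Since the link counts (37 on the source, 26 on the target) and the fate of already-transferred hardlinks both come up here, a small sketch like the following could list which paths under a tree share an inode, so the grouping can be checked on each machine before and after a run. It is an illustration only: `hardlink_groups` is a hypothetical helper, it sees only links that live under the directory it is given, and it says nothing about what rsync itself will do.

```python
import os
import stat
from collections import defaultdict


def hardlink_groups(root):
    """Map (device, inode) to the sorted paths under root that share it.

    Only regular files are considered, and only inodes reached through
    more than one path under root are returned.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths)
            for key, paths in groups.items() if len(paths) > 1}
```

Running it over the snapshot root on both sides and comparing the number of paths per group would show how many of the 37 links have arrived so far.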

This would be mere curiosity if it did this once and then was happy
that it had set the mtime/uid/gid, but it doesn't, it does it every
time, which is making things really slow.

I am trying to build a newer rsync for use on the sender to see if that
makes any difference but am also running into bizarre problems there,
which is perhaps for another thread. Illegal instruction somewhere
inside libcrypto. The same libcrypto that the packaged rsync is linked
against. Goes away if I use --cc=none, but happens for md4 or md5.
Really not my night!

I am tempted to blow away the btrfs filesystem and just do xfs to xfs,
to rule out weird issues there. It would be a shame though as I was
hoping to use btrfs's compression here.

Cheers,
Andy
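The claim that the files "don't have differing t/o/g when you look at them" can also be checked mechanically rather than by reading two `stat` outputs side by side. The sketch below is one way to do that, under stated assumptions: `metadata_fingerprint` and `differing_fields` are hypothetical helper names, the fingerprint would be taken separately on each host, and the demo dictionaries are filled in by hand from the `stat` output quoted earlier in the thread.

```python
import os
import stat
from datetime import datetime, timezone


def metadata_fingerprint(path):
    """Collect the lstat fields relevant to rsync's t/p/o/g flags."""
    st = os.lstat(path)
    return {
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,  # nanosecond-resolution mtime
        "nlink": st.st_nlink,
    }


def differing_fields(a, b, ignore=("nlink",)):
    """Return the sorted field names whose values differ between a and b."""
    return sorted(k for k in a if k not in ignore and a[k] != b.get(k))


# Demo values copied from the stat output quoted in the thread.
mtime_ns = int(datetime(2006, 7, 14, 16, 42, 37,
                        tzinfo=timezone.utc).timestamp()) * 10**9
source = {"mode": "-rw-r--r--", "uid": 33, "gid": 33, "size": 5258,
          "mtime_ns": mtime_ns, "nlink": 37}
target = {"mode": "-rw-r--r--", "uid": 33, "gid": 33, "size": 5258,
          "mtime_ns": mtime_ns, "nlink": 26}

print(differing_fields(source, target))             # []
print(differing_fields(source, target, ignore=()))  # ['nlink']
```

On the quoted values, only the link count differs, which matches what both posters observed by eye.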