Andy Smith
2022-Feb-03 22:04 UTC
Confused as to why rsync thinks time, owner and group of many files differ
Hi,

I am at the moment using rsync to move quite a big set of backups
from one machine to another. The source filesystem is xfs; the
target filesystem is btrfs.

For various reasons I have been stopping the rsync part way through
and re-starting. I have noticed that a large number of files are
transferred over and over and I can't work out why.

Example:

    sudo rsync -iPva \
        --inplace \
        --numeric-ids \
        --delete \
        /data/backup/rsnapshot/daily.0/cacti/ \
        root@koff:/data/backup/rsnapshot/daily.0/cacti/

    ...
    <f..t.og... var/www/index.html
              5,258 100%    5.78kB/s    0:00:00 (xfr#1276, to-chk=1/43437)

If I run the rsync command again, thousands of lines of output will
appear again, all showing itemized changes for 't' and sometimes
'p', 'o' and 'g'. Notably, var/www/index.html will keep appearing in
the list.

Let's have a look at that file.

Source:

    $ stat /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
      File: /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
      Size: 5258       Blocks: 16         IO Block: 4096   regular file
    Device: fd05h/64773d    Inode: 354337      Links: 37
    Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
    Access: 2022-02-03 04:53:12.115719681 +0000
    Modify: 2006-07-14 16:42:37.000000000 +0000
    Change: 2022-01-01 17:31:28.553758359 +0000
     Birth: -

Destination:

    $ stat /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
      File: /data/backup/rsnapshot/daily.0/cacti/var/www/index.html
      Size: 5258       Blocks: 16         IO Block: 4096   regular file
    Device: 26h/38d    Inode: 7534065     Links: 26
    Access: (0644/-rw-r--r--)  Uid: (   33/www-data)   Gid: (   33/www-data)
    Access: 2022-01-25 20:40:09.930960486 +0000
    Modify: 2006-07-14 16:42:37.000000000 +0000
    Change: 2022-02-03 21:45:44.194559899 +0000
     Birth: 2022-01-25 20:40:09.930960486 +0000

When rsync considers times as being different, it means mtime,
right? Yet these files have identical mtimes. They also have
identical uid, gid and permissions.

I would not expect this and other files like it to keep being
listed for change over and over again. I can tell by the summary
that the actual contents of the files weren't sent, so at least it
didn't try to send all the data again. But even if rsync did
consider these files to have different mtime/uid/gid, should it not
have written that and be happy at next run?

rsync versions:

Source:

    $ rsync --version
    rsync version 3.1.2 protocol version 31
    Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
    Web site: rsync.samba.org
    Capabilities:
        64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
        socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
        append, ACLs, xattrs, iconv, symtimes, prealloc

Destination:

    $ rsync --version
    rsync version 3.2.3 protocol version 31
    Copyright (C) 1996-2020 by Andrew Tridgell, Wayne Davison, and others.
    Web site: rsync.samba.org
    Capabilities:
        64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
        socketpairs, hardlinks, hardlink-specials, symlinks, IPv6, atimes,
        batchfiles, inplace, append, ACLs, xattrs, optional protect-args, iconv,
        symtimes, prealloc, stop-at, no crtimes
    Optimizations:
        SIMD, asm, openssl-crypto
    Checksum list:
        xxh128 xxh3 xxh64 (xxhash) md5 md4 none
    Compress list:
        zstd lz4 zlibx zlib none

What am I missing?

Thanks,
Andy
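The by-eye comparison of the two stat outputs above can also be scripted. What follows is only a rough sketch, not anything from the thread: it assumes GNU stat, the helper name meta_sig is made up for illustration, it runs on two throwaway files instead of the real backup tree, and it only looks at whole-second mtime, uid, gid and mode.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): print whole-second mtime, uid,
# gid and octal mode of a file. GNU stat is assumed.
meta_sig() {
    stat -c '%Y %u %g %a' "$1"
}

# Demo on two throwaway files rather than the real backup tree.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/src"
cp -p "$tmp/src" "$tmp/dst"     # -p copies mode, ownership and timestamps

if [ "$(meta_sig "$tmp/src")" = "$(meta_sig "$tmp/dst")" ]; then
    same_before=yes
else
    same_before=no
fi

# Change only the mtime of the copy; the signatures should now differ.
touch -d '2006-07-14 16:42:37 UTC' "$tmp/dst"
if [ "$(meta_sig "$tmp/src")" = "$(meta_sig "$tmp/dst")" ]; then
    same_after=yes
else
    same_after=no
fi

echo "before touch: $same_before, after touch: $same_after"
rm -rf "$tmp"
```

Against the real files, the same stat format string could be run once locally and once on the destination host, and the two output lines compared.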
Kevin Korb
2022-Feb-03 22:38 UTC
Confused as to why rsync thinks time, owner and group of many files differ
Are you using the same source and target each time? I ask because the
only discrepancy I see is the link count, which shows that there are 11
more instances of that inode on the source than on the target. Maybe
instances in other snapshots are being updated/re-linked?

The only other thing to mention is that when you abort rsync (with -P or
--inplace), incomplete files are left behind. Rsync doesn't fix the
owner+group until it is done with a directory, and it doesn't fix the
timestamp until it is done with a file. This is why you shouldn't mix
those options with --update: the truncated file will be newer than the
source file.
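To follow up on the link-count difference (37 on the source, 26 on the target), one option is to list every path that shares the inode on each side. The sketch below is only an illustration on a throwaway directory: the daily.0, daily.1 and daily.2 names stand in for rsnapshot snapshots, and it assumes a find that supports -samefile (GNU find does).

```shell
#!/bin/sh
# Throwaway tree standing in for an rsnapshot directory in which several
# snapshots share one inode through hard links.
tmp=$(mktemp -d)
mkdir -p "$tmp/daily.0" "$tmp/daily.1" "$tmp/daily.2"
printf 'page\n' > "$tmp/daily.0/index.html"
ln "$tmp/daily.0/index.html" "$tmp/daily.1/index.html"
ln "$tmp/daily.0/index.html" "$tmp/daily.2/index.html"

# Link count as the filesystem reports it (second field of "ls -l").
links=$(ls -l "$tmp/daily.0/index.html" | awk '{print $2}')

# Every path under the tree that refers to the same inode.
names=$(find "$tmp" -xdev -samefile "$tmp/daily.0/index.html" | wc -l | tr -d ' ')

echo "link count: $links, names found: $names"
rm -rf "$tmp"
```

On the real machines the find would be pointed at the top of the snapshot tree on each side. If it finds fewer names than the link count reports, the remaining links live outside the directory that was searched.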
Paul Slootman
2022-Feb-04 09:29 UTC
Confused as to why rsync thinks time, owner and group of many files differ
On Thu 03 Feb 2022, Andy Smith via rsync wrote:
> sudo rsync -iPva \
>     --inplace \
>     --numeric-ids \
>     --delete \
>     /data/backup/rsnapshot/daily.0/cacti/ \
>     root@koff:/data/backup/rsnapshot/daily.0/cacti/
>
> ...
> <f..t.og... var/www/index.html
>           5,258 100%    5.78kB/s    0:00:00 (xfr#1276, to-chk=1/43437)

Could you try the transfer like this?

    sudo rsync -ia \
        --debug=OWN,TIME \
        --inplace \
        --numeric-ids \
        --delete \
        /data/backup/rsnapshot/daily.0/cacti/var/www/index.html \
        root@koff:/data/backup/rsnapshot/daily.0/cacti/var/www/

That should give detailed information about ownership and modification
times, limiting the transfer to just that index.html file to limit the
amount of output.

Paul
Wayne Davison
2022-Feb-07 16:58 UTC
Confused as to why rsync thinks time, owner and group of many files differ
On Thu, Feb 3, 2022 at 2:23 PM Andy Smith via rsync <rsync at lists.samba.org> wrote:

> When rsync considers times as being different, it means mtime, right? Yet
> these files have identical mtimes. They also have identical uid, gid and
> permissions.

They do now, but it looks like you have lots of files hard-linked together
and presumably 2 of them aren't the same anymore on the source. Thus, the
file gets modified to the other version during the transfer and then
modified back elsewhere in the transfer. If you get rid of the --inplace
option, rsync will be able to separate them.

..wayne..
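The effect described above can be reproduced without rsync. This is a small sketch under stated assumptions, not rsync's actual code path: a shell ">" redirection stands in for an in-place update, and a write-to-temporary-then-rename stands in for what rsync does when --inplace is not given. The file names a, b, c and d are arbitrary.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Case 1: overwrite in place. The ">" redirection truncates and rewrites
# the existing inode, so every hard-linked name sees the change. The same
# holds for mtime, owner and group, which are stored in the inode.
printf 'version A\n' > "$tmp/a"
ln "$tmp/a" "$tmp/b"
printf 'version B\n' > "$tmp/a"
inplace_b=$(cat "$tmp/b")       # b follows a: they are still one inode

# Case 2: write a temporary file and rename it over the target. The
# target name now points at a new inode and the other name is untouched.
printf 'version A\n' > "$tmp/c"
ln "$tmp/c" "$tmp/d"
printf 'version B\n' > "$tmp/c.tmp"
mv -f "$tmp/c.tmp" "$tmp/c"
rename_d=$(cat "$tmp/d")        # d keeps the old content: link is broken

echo "in place: b=$inplace_b"
echo "rename:   d=$rename_d"
rm -rf "$tmp"
```

In the first case a change made through one name shows up under the other, which is how two names that differ on the source can keep flipping a shared destination inode back and forth on every run. In the second case the names end up separated, matching the suggestion to drop --inplace.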