Before anyone yells at me: yes, you can use rsync's --checksum to detect
(and fix) files that are incorrect despite having correct timestamps and
sizes. That would mean a previous rsync run had been corrupted, not the
current one. But it is important to note that this is only reported to
you if you also use --itemize-changes and know what to look for (a file
with a c but not an s or a t).

It is also worth noting that single-file compression tools (like gzip)
automatically set the original mtime when compressing or decompressing.
If you decompress and then recompress such a file, you can end up with a
file whose mtime and size match but whose checksum does not, because of
gzip's metadata, even though the uncompressed result is identical. I
would not consider this a case worth updating the remote copy for, but I
am sure someone will disagree.

On 03/23/2017 03:49 PM, Kevin Korb via rsync wrote:
> The -c option causes rsync to checksum EVERY file on both ends BEFORE
> rsync does anything else. It checksums files that are on only 1 end.
> It checksums files that are different sizes. It will not catch a
> hardware problem preventing rsync from writing a file correctly.
>
> On 03/23/2017 03:12 PM, steven banville via rsync wrote:
>>
>> Hi
>>
>> I am using "rsync" to send files from a source machine to a remote
>> machine as one typically does. I would like to clarify that the "-c"
>> option will cause the checksum on the receiving end to be created by
>> reading the already written file and NOT the data stream on the
>> receiving end. This would help in catching disk I/O errors if the
>> checksum is done on the file on disk.
>>
>> I understand if the size and (or date?) don't match, the checksum is not
>> needed on the receiving end.
>>
>> I may be missing something but it wasn't entirely clear to me that the
>> checksum is done based on the file on disk.
>>
>> Thanks,
>> -Steve

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
    Kevin Korb                  Phone:    (407) 252-6853
    Systems Administrator       Internet:
    FutureQuest, Inc.           Kevin at FutureQuest.net  (work)
    Orlando, Florida            kmk at sanitarium.net (personal)
    Web page:                   http://www.sanitarium.net/
    PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
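To make the "c but not an s or a t" advice concrete, here is a minimal sketch of a filter over --itemize-changes output. It assumes the default itemized layout described in the rsync man page (an 11-character `YXcstpoguax` flag string, a space, then the path), as produced by something like `rsync -a --checksum --itemize-changes src/ host:dst/`. The function name `suspicious_files` and the sample lines are hypothetical, made up for illustration, and not real rsync output from this thread.

```python
def suspicious_files(itemized_output: str) -> list[str]:
    """Return paths that were transferred because only the checksum differed.

    Assumes each line looks like '<flags> <path>' with flags in the
    'YXcstpoguax' layout: position 2 is 'c' (checksum), position 3 is
    's' (size), position 4 is 't' (mtime). A '.' means "unchanged".
    """
    hits = []
    for line in itemized_output.splitlines():
        flags, sep, name = line.partition(" ")
        # Skip blank lines and anything too short to be an itemized entry.
        if not sep or len(flags) < 5:
            continue
        transferred = flags[0] in "<>"      # file was sent or received
        regular_file = flags[1] == "f"      # ignore dirs, symlinks, devices
        checksum_only = (
            flags[2] == "c" and flags[3] == "." and flags[4] == "."
        )
        if transferred and regular_file and checksum_only:
            hits.append(name)
    return hits


# Hypothetical sample lines, written by hand for this sketch.
sample = "\n".join([
    ">f+++++++++ new-file.dat",           # new file: all '+'
    ">f.st...... edited.txt",             # ordinary change: size and time differ
    ">fc........ silently-different.bin", # content differs, size and time match
    ".d..t...... some-dir/",              # directory timestamp update only
])
print(suspicious_files(sample))
```

Any path this reports is only a candidate for corruption; as noted above, a recompressed gzip file can produce the same pattern.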
Hi

This is a very delayed response, but thanks very much for your answer; it is appreciated.

It seems that if you run rsync a second or subsequent time with "-c" (--checksum), say on data that has not changed, it would have to generate checksums from the files on disk at both ends, even if the sizes and timestamps are the same. Is this not the case? If it is, then it would seem we would be catching a disk write error.

In the past I have experienced issues with hardware writes failing (network or disk), and although rare, for some critical data it is something of concern; that is what prompted this question. I don't need this high level of fidelity for most data, just a small subset.

The use case is:
* Create raw data.
* Move to backup location very reliably.
* Delete original data set.

Thanks again.

Steven Banville
Cirina
201 Gateway Boulevard, Floor 1
South San Francisco, CA 94080-7019
http://cirina.com/

________________________________________
From: rsync [rsync-bounces at lists.samba.org] on behalf of Kevin Korb via rsync [rsync at lists.samba.org]
Sent: Thursday, March 23, 2017 1:10 PM
To: rsync at lists.samba.org
Subject: Re: rsync: "-c" option clarification

[...]
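The create / move / delete use case described in this message can be sketched locally as "copy, re-read both files, compare digests, and only then delete the original". This is a stand-in written with the Python standard library, not rsync and not anything proposed in the thread; the helper names `sha256_of` and `move_verified` and the file names are hypothetical. It also assumes that re-reading the copy is meaningful: if the operating system serves the read from its cache rather than from the disk, a bad write would go unnoticed.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file by reading it back from the filesystem in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src: str, dst: str) -> bool:
    """Copy src to dst; delete src only if both files hash identically."""
    shutil.copy2(src, dst)  # copies data and preserves mtime
    if sha256_of(src) != sha256_of(dst):
        return False        # keep the original; the copy cannot be trusted
    os.remove(src)
    return True


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "raw.dat")
    backup = os.path.join(workdir, "backup.dat")
    with open(original, "wb") as handle:
        handle.write(b"raw instrument data\n" * 1000)

    moved = move_verified(original, backup)
    original_gone = not os.path.exists(original)
    backup_present = os.path.exists(backup)
    print(moved, original_gone, backup_present)
```

The design choice here is simply to make deletion conditional on a comparison of data read back from both ends, which is the property the question is asking whether "-c" provides.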
inline...

On 05/19/2017 06:07 PM, steven banville wrote:
> It seems that if you do an rsync a second / subsequent time with the
> "-c" (--checksum), say for data that has not changed, it would have to
> generate checksums from the files on disk at both ends, even if the
> size and timestamps are the same, is this not the case ? If it is,
> then it would seem we would be catching a disk write error.

Yes, it checksums every file even if the timestamps match. It even checksums the files that only exist on one end! This does not necessarily detect disk errors unless you flush your cache between runs (otherwise the file may be re-read from RAM rather than from the disk). It also won't report that it caught corruption unless you use --itemize-changes and interpret that output yourself. Even then there can be false positives (gzip and similar will backdate a file when you compress/decompress even though the compressed version can be different).

> In the past I had experienced issues with hardware writes failing
> (network or disk), and although rare, for some critical data it is
> something of concern; that is what prompted this question. I don't
> need this high level of fidelity of most data, just a small subset.
>
> The use case is:
> * Create raw data
> * Move to backup location very reliably.
> * Delete original data set.

The only time I have seen this kind of problem was when bad RAM was being used as disk cache. The solution there is ECC RAM.
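The gzip false positive mentioned above can be illustrated with a small sketch: two gzip streams of identical size that differ byte for byte yet decompress to the same data. Here the header timestamp field is varied (via the `mtime` argument of Python's `gzip.compress`, available since Python 3.8) purely as a stand-in for whatever gzip metadata differs in practice; that choice is an assumption for illustration, not a claim about which field differed in anyone's real files.

```python
import gzip

payload = b"the same uncompressed data\n" * 100

# Same input, same compression level; only the header timestamp differs.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

same_size = len(first) == len(second)     # header field has a fixed width
same_bytes = first == second              # a checksum would see a difference
same_content = gzip.decompress(first) == gzip.decompress(second)

print(same_size, same_bytes, same_content)
```

If the two compressed files also carried the same mtime on disk, a size-and-time comparison would treat them as identical while a checksum comparison would not, even though nothing of value differs.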