What is the best tool to compare file hashes in two different drives/directories, such as after copying a large number of files from one drive to another? I used cp -au to copy the directories, not rsync, since the copy is between local disks.

I found a mention of hashdeep on the 'net, which means first running it against the first directory to generate a file with checksums, then running it a second time against the second directory using this checksum file. Hashdeep, however, is not in the CentOS repository and, according to the 'net, is possibly no longer maintained.

I also found md5deep, which seems similar.

Are there other tools for this automatic comparison? I am really looking for a list of files that exist in only one place or whose checksums do not match.
On Fri, 27 Oct 2017 17:27:22 -0400 H wrote:

> What is the best tool to compare file hashes in two different
> drives/directories such as after copying a large number of files from one
> drive to another? I used cp -au to copy directories, not rsync, since it is
> between local disks.

diff --brief -r dir1/ dir2/

might do what you need.

If you also want to see differences for files that exist in only one of the directories:

diff --brief -Nr dir1/ dir2/

--
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com
On 10/27/2017 05:35 PM, Frank Cox wrote:

> On Fri, 27 Oct 2017 17:27:22 -0400 H wrote:
>
>> What is the best tool to compare file hashes in two different
>> drives/directories such as after copying a large number of files from one
>> drive to another? I used cp -au to copy directories, not rsync, since it is
>> between local disks.
>
> diff --brief -r dir1/ dir2/
>
> might do what you need.
>
> If you also want to see differences for files that exist in only one of the directories:
>
> diff --brief -Nr dir1/ dir2/

But is diff not best suited for text files?
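For what it's worth, diff's --brief mode only reports *whether* files differ, so binary content is not a problem. A minimal sketch, using hypothetical dir1/dir2 under a throwaway temp directory:

```shell
# Sketch: diff --brief on binary and one-sided files (hypothetical dirs).
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1" "$tmp/dir2"
printf '\000\001\002' > "$tmp/dir1/blob.bin"   # binary file, copy diverged
printf '\000\001\377' > "$tmp/dir2/blob.bin"
echo "only here"      > "$tmp/dir1/extra.txt"  # exists on one side only
# -r recurses; -N treats a file missing on one side as empty, so
# one-sided files are reported as differing rather than skipped.
# diff exits non-zero when differences exist, hence the || true.
out=$(diff --brief -Nr "$tmp/dir1" "$tmp/dir2" || true)
printf '%s\n' "$out"
rm -rf "$tmp"
```

With plain text files diff would normally also print the changed lines; --brief suppresses that, which is what you want for a copy-verification pass.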
> Am 27.10.2017 um 23:27 schrieb H <agents at meddatainc.com>:
>
> What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
>
> [snip]
>
> Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?

source:

find . -type f -exec md5sum \{\} \; > checksum.list

destination:

md5sum -c checksum.list

--
LF
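One detail that makes this two-step recipe work: the list must be generated from *inside* the source directory, so the recorded paths are relative and resolve the same way under the destination. A small self-contained sketch with hypothetical src/dst directories:

```shell
# Sketch: generate checksums in the source, verify them in the copy.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub"
echo "hello" > "$tmp/src/sub/a.txt"
cp -a "$tmp/src" "$tmp/dst"            # stand-in for the real cp -au copy
# Record relative paths by running find from inside the source.
( cd "$tmp/src" && find . -type f -exec md5sum {} \; ) > "$tmp/checksum.list"
# -c re-reads every listed file and compares hashes; run it from the copy.
out=$( cd "$tmp/dst" && md5sum -c "$tmp/checksum.list" )
printf '%s\n' "$out"
rm -rf "$tmp"
```

Note that md5sum -c catches altered and missing files, but not extra files that exist only in the destination.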
On Sat, 28 Oct 2017 00:47:32 +0200 Leon Fauster wrote:

> source:
>
> find . -type f -exec md5sum \{\} \; > checksum.list
>
> destination:
>
> md5sum -c checksum.list

Wouldn't diff be faster, because it doesn't have to read to the end of every file and it isn't really calculating anything? Or am I looking at this the wrong way?

--
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com
Hi,

On Fri, Oct 27, 2017 at 05:27:22PM -0400, H wrote:

> What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.

[snip]

> Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?

rsync obviously offers the 'exist in only one place' feature but also offers checksum comparisons (in version 3 and higher, I understand)...

-c, --checksum
    This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file's size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly.

    The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender's file: files with either a changed size or a changed checksum are selected for transfer.

    Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.

    For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.

Rich.
On October 28, 2017 8:10:34 AM EDT, Rich <centos at foxengines.net> wrote:

> rsync obviously offers the 'exist in only one place' feature but also
> offers checksum comparisons (in version 3 and higher, I understand)...
>
> [snip]
>
> Rich.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos

Thank you, this time I used diff.
On October 27, 2017 6:47:32 PM EDT, Leon Fauster <leonfauster at googlemail.com> wrote:

> source:
>
> find . -type f -exec md5sum \{\} \; > checksum.list
>
> destination:
>
> md5sum -c checksum.list

Thank you, saving this for the future.
On October 27, 2017 5:35:59 PM EDT, Frank Cox <theatre at sasktel.net> wrote:

> diff --brief -r dir1/ dir2/
>
> might do what you need.
>
> [snip]

Great, used as suggested!
On 10/27/2017 05:27 PM, H wrote:

> What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.

I typically use 'rsync -av -c --dry-run ${dir1}/ ${dir2}/' (or some variation) for this. rsync works just as well on local disks as remote. This isn't as strong of a comparison as even an md5, but it's not a bad one and gives you a quick compare.

You can even use git for this: 'git diff --no-index ${dir1}/ ${dir2}/' and that would be a stronger comparison.