On Thursday, June 09, 2016 05:18:03 PM Alessandro Baggi wrote:
> Thank you for your reply and sorry for the late response.
>
> My need is only to get a copy of a large dataset and make sure that it is
> not broken after transfer. After transfer, this data will be stored on a
> local backup server where there is a Bacula installation.
>
> For file transfer, to save time and bandwidth I will use rsync, but I
> don't know how to check whether those files were corrupted.
>
> How can I perform this check?
> I can make an md5 for each file, but for a great number of files this can
> be a problem.
Is there any chance you could switch to running ZFS or maybe BTRFS? They are
ridiculously more efficient at sending reliable, incremental updates to a
filesystem over a low-bandwidth link, and the load of rsync "at scale" can be
enormous.
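
That said, if you do stay on rsync, it can do the integrity check for you: a
second pass with --checksum recomputes and compares checksums on both ends, so
a dry run that reports nothing left to transfer means every file matched.
Something like the following (hostnames and paths are placeholders):

    # normal transfer first (fast: compares size and mtime only)
    rsync -av /data/ backuphost:/backup/data/

    # verification pass: -c forces full checksums on both sides,
    # -n (dry run) compares without copying anything
    rsync -avcn /data/ backuphost:/backup/data/

Be aware that the checksum pass reads every byte on both machines, which is
exactly the kind of load that gets painful at scale - hence the ZFS
suggestion.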
In our case, with about half a billion files, doing rsync over a local gigabit
LAN took well over 24 hours - the discovery stage of rsync alone accounted for
nearly all of that time due to IOPS limitations.
Switching our primary backup method to ZFS and send/receive of incremental
snapshots cut the time to back up/replicate to under 30 minutes, with no
significant change in server load.
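
For reference, the workflow is roughly the sketch below; the pool and dataset
names (tank/data, backup/data, backuphost) are made up for the example:

    # one-time initial full replication to the backup host
    zfs snapshot tank/data@base
    zfs send tank/data@base | ssh backuphost zfs receive backup/data

    # each later run sends only the blocks changed since the last snapshot
    zfs snapshot tank/data@today
    zfs send -i tank/data@base tank/data@today | \
        ssh backuphost zfs receive backup/data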
And don't let the name "incremental snapshots" fool you - the end result is
identical to doing a full backup/copy, with all files verified as
binary-perfect as of the moment the snapshot was made.
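
ZFS also checksums every block end to end, and the send stream itself carries
checksums that zfs receive verifies, so a damaged transfer fails loudly rather
than landing silently corrupt. If you want an explicit integrity pass on the
backup side afterwards, a scrub re-reads and verifies the whole pool:

    zpool scrub backup
    zpool status backup    # shows scrub progress and any checksum errors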
Really, if you can do this and you care about your data, you want to do this,
even if you don't know it yet. The learning curve is significant but the
results are well worth it.