Hello,

I routinely mirror databases with many directories, each containing 20,000 files or more. Many of the older directories rarely change and are identical on the mirror(s), but rsync still sends the file list for every directory on every run. As a result, lists of hundreds of thousands (or millions) of files are sent on every rsync, even when only a few of the directories have actually changed. This is a serious problem on slower links.

To avoid this, a checksum could be computed over the file list of each directory on the source and compared with the one on the destination. If the checksums match, there should be no need to send that directory's file list (except to list its subdirectories).

If there is already a way to do this in rsync, I would appreciate being pointed in the right direction. If this has already been discussed on the list, my apologies.

Thanks,
Peter
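A minimal sketch of the per-directory checksum described above. It assumes that a directory's "file list" can be summarised by the sorted name, size and whole-second modification time of each regular file it directly contains. rsync's real file-list entries carry more attributes (permissions, ownership, links), so this only illustrates the idea and is not how rsync builds its list. The helper names `directory_list_checksum` and `tree_checksums` are hypothetical. Subdirectory names are returned separately because, as noted, they still have to be listed even when a directory's checksum matches.

```python
import hashlib
import os


def directory_list_checksum(path: str) -> tuple[str, list[str]]:
    """Digest this directory's file list; return it with the subdirectory names."""
    h = hashlib.sha256()
    subdirs: list[str] = []
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Subdirectories are still listed so the walk can descend into them.
            subdirs.append(entry.name)
        elif entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            # Assumed "file list" entry: name, size, whole-second mtime.
            # NUL separators cannot occur in POSIX file names.
            h.update(os.fsencode(entry.name) + b"\0")
            h.update(f"{st.st_size}\0{int(st.st_mtime)}\0".encode())
        # Symlinks and special files are ignored in this sketch.
    return h.hexdigest(), subdirs


def tree_checksums(root: str) -> dict[str, str]:
    """Map each directory, relative to root, to its file-list digest."""
    result: dict[str, str] = {}
    stack = ["."]
    while stack:
        rel = stack.pop()
        digest, subdirs = directory_list_checksum(os.path.join(root, rel))
        result[rel] = digest
        stack.extend(os.path.normpath(os.path.join(rel, d)) for d in subdirs)
    return result
```

Running `tree_checksums` on both the source and the mirror and comparing the two dictionaries picks out the directories whose file lists actually differ.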
Hello,
I have followed the discussion of speeding up rsync when there are lots
of files, and I have a proposal which I think would greatly speed rsync
when doing routine mirroring of large filesystems.
One of the speed-limiting issues with rsync is having to send huge file
lists when mirroring large file systems, even for incremental updates
where only a small part of the file system might have changed. My
proposal is to first send a checksum of the file list for each
directory. If it is found to be identical to the same checksum on the
remote side, then the list need not be sent for that directory! That
would greatly reduce the size of the file list when there are
directories containing many files that do not change from one rsync to
the next.
Here's an example:
  remote    local
  dir1      dir1 - file list checksum same as on remote -> don't send file list for dir1
  dir2      dir2 - file list checksum same as on remote -> don't send file list for dir2
  dir3      dir3 - file list checksum different from remote -> send file list for dir3
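The decision in this example can be sketched as follows. The sketch assumes, hypothetically, that the mirror has already sent one digest per directory and that each listing entry is a `(name, size, mtime)` tuple; `list_digest` and `plan_transfer` are illustrative names, not part of rsync. A directory missing on the mirror is treated the same as a changed one.

```python
import hashlib

# Hypothetical listing entry: (name, size, mtime).
Entry = tuple[str, int, int]


def list_digest(entries: list[Entry]) -> str:
    """Digest of one directory's file list, independent of entry order."""
    h = hashlib.sha256()
    for name, size, mtime in sorted(entries):
        # NUL separators keep one field from running into the next.
        h.update(f"{name}\0{size}\0{mtime}\0".encode())
    return h.hexdigest()


def plan_transfer(source: dict[str, list[Entry]],
                  mirror_digests: dict[str, str]) -> dict[str, list[Entry]]:
    """Return only the file lists that must actually be sent."""
    to_send = {}
    for directory, entries in source.items():
        if mirror_digests.get(directory) != list_digest(entries):
            # Changed, or missing on the mirror: send the full list.
            to_send[directory] = sorted(entries)
    return to_send
```

In the first pass only one fixed-size digest per directory crosses the link instead of one entry per file; full lists follow only for directories like dir3.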
It might even be possible to use the rsync checksum algorithm on the
directory lists themselves to determine which portion of the directory
lists to send, in the case of directories that are nearly identical.
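For the nearly-identical case, here is a rough sketch of applying rsync-style block matching to a directory listing serialised as text, one entry per line (a hypothetical format). The receiver splits its old listing into fixed-size blocks and computes, for each, a weak rolling checksum (the `a`/`b` sums of the rsync algorithm, modulo 2^16) plus a strong hash; MD5 stands in for rsync's strong checksum here. The sender slides a window over its new listing and emits either a reference to a matching block or literal bytes. All function names are illustrative, and for simplicity only full-size blocks are indexed, so a short trailing block is always sent as literal data.

```python
import hashlib

MOD = 1 << 16  # both weak-checksum sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the (a, b) sums of the rsync-style weak checksum for one block."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def signatures(old: bytes, block_size: int) -> dict[int, list[tuple[int, bytes]]]:
    """Receiver side: index each full block of the old listing."""
    sigs: dict[int, list[tuple[int, bytes]]] = {}
    for j in range(len(old) // block_size):
        block = old[j * block_size:(j + 1) * block_size]
        a, b = weak_checksum(block)
        sigs.setdefault((b << 16) | a, []).append((j, hashlib.md5(block).digest()))
    return sigs


def delta(new: bytes, sigs, block_size: int) -> list[tuple[str, object]]:
    """Sender side: describe `new` as block copies and literal data."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    n = len(new)
    i = 0
    if n >= block_size:
        a, b = weak_checksum(new[:block_size])
    while i + block_size <= n:
        match = None
        candidates = sigs.get((b << 16) | a)
        if candidates:
            # The weak checksum can collide, so confirm with the strong hash.
            strong = hashlib.md5(new[i:i + block_size]).digest()
            match = next((j for j, s in candidates if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_size
            if i + block_size <= n:
                a, b = weak_checksum(new[i:i + block_size])
        else:
            literal.append(new[i])
            if i + block_size < n:
                # Roll the window one byte: drop new[i], add new[i + block_size].
                out_byte, in_byte = new[i], new[i + block_size]
                a = (a - out_byte + in_byte) % MOD
                b = (b - block_size * out_byte + a) % MOD
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops, block_size: int) -> bytes:
    """Receiver side: rebuild the new listing from the old one plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

A small change, such as one new file in a large listing, then costs roughly a block or two of literal data plus a reference for each unchanged block.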
I would appreciate hearing from rsync developers whether this is feasible
with the current implementation and whether they think it would help.
Thanks,
Peter Salameh