Hello,

I routinely mirror databases with many directories, each containing 20,000 files or more. Many of the older directories rarely change and are identical on the mirror(s), but rsync still sends the file list for every directory every time. This results in lists containing hundreds of thousands (or millions) of files being sent on every rsync run, even when only a few of the directories actually have changes. This is a serious problem on slower links.

To avoid this, a checksum could be computed over the file list for each directory on the source and compared with the corresponding checksum on the destination. If the checksums are identical, there should be no need to send the file list for that directory (except to list its sub-directories).

If there is already a way to do this in rsync, I would appreciate being pointed in the right direction. If this has already been discussed on the list, my apologies.

Thanks,
Peter
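To make the idea of a per-directory file list checksum concrete, here is a minimal Python sketch. It is not how rsync works today, and everything in it is an assumption made for illustration: the function name `dir_list_checksum` is hypothetical, the choice of SHA-256 is arbitrary, and the fields fed into the digest (name, size, modification time of each non-directory entry) are only a guess at what a file list entry would need to cover.

```python
import hashlib
import os


def dir_list_checksum(path):
    """Return a hex digest summarising the non-directory entries directly in `path`.

    Hypothetical helper: the digest covers name, size and mtime only (an assumption).
    """
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                continue  # sub-directories would still be listed separately
            st = entry.stat(follow_symlinks=False)
            entries.append((entry.name, st.st_size, int(st.st_mtime)))
    entries.sort()  # make the digest independent of directory read order
    h = hashlib.sha256()
    for name, size, mtime in entries:
        h.update(os.fsencode(name))
        h.update(b"\0")  # file names cannot contain NUL, so this is unambiguous
        h.update(f"{size}:{mtime}".encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()
```

Both sides would run the same function on their copy of a directory. Equal digests would mean the two file lists agree on the fields that were hashed, so the list itself would not need to cross the link.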
Hello,

I have followed the discussion of speeding up rsync when there are lots of files, and I have a proposal which I think would greatly speed up rsync when doing routine mirroring of large file systems.

One of the speed-limiting issues with rsync is having to send huge file lists when mirroring large file systems, even for incremental updates where only a small part of the file system might have changed.

My proposal is to first send a checksum of the file list for each directory. If it is found to be identical to the corresponding checksum on the remote side, then the list need not be sent for that directory! That would greatly reduce the size of the file list when there are directories containing many files which do not change from one rsync run to the next.

Here's an example:

    remote    local
    dir1      dir1 - file list checksum same as on remote -> don't send file list for dir1
    dir2      dir2 - file list checksum same as on remote -> don't send file list for dir2
    dir3      dir3 - file list checksum different from remote -> send file list for dir3

It might even be possible to use the rsync checksum algorithm on the directory lists themselves to determine which portions of the lists to send, in the case of directories which are nearly identical.

I would appreciate hearing from rsync developers whether this is feasible with the current implementation and whether they think it would help.

Thanks,
Peter Salameh
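The dir1/dir2/dir3 example above can be sketched in a few lines of Python. This is only an illustration of the comparison step, not rsync code: the function name `dirs_needing_file_list` and the checksum strings are made up, and it assumes each side has already computed one checksum per directory.

```python
def dirs_needing_file_list(local_sums, remote_sums):
    """Return the sorted directories whose file list must still be sent.

    A directory is selected when its checksum differs from the remote one,
    or when the remote side has no checksum for it at all.
    """
    return sorted(d for d, s in local_sums.items() if remote_sums.get(d) != s)


# Placeholder checksum values, chosen only to mirror the example above.
local = {"dir1": "aa11", "dir2": "bb22", "dir3": "cc33"}
remote = {"dir1": "aa11", "dir2": "bb22", "dir3": "ffff"}

print(dirs_needing_file_list(local, remote))  # prints ['dir3']
```

Directories that exist only on the remote side are not reported by this sketch. A real implementation would also need to handle those, for example when deletions are being mirrored.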