Hergaarden, Marcel
2004-Mar-22 12:42 UTC
Long time needed for "Building file list" Any suggestions ?
We're running rsync 2.5.7 on a Windows 2000 server, in combination with cygwin/ssh. The server that receives the data is a Linux server.

The amount of data on the Windows server is about 100 GB, made up of 532,000 files of various kinds, mostly doc, ppt and xls files.

It takes about 2 hours just to create the file list.

Is the amount of data/files too big, should I split up the backup files, or is something else the cause of this long duration?

Looking forward to your answers.

Marcel Hergaarden
Clint Byrum
2004-Mar-22 17:34 UTC
Long time needed for "Building file list" Any suggestions ?
On Mon, 2004-03-22 at 04:42, Hergaarden, Marcel wrote:
> We're running rsync 2.5.7 on a Windows2000 server, in combination with
> cygwin/ssh. The server who receives the data is a Linux server.
>
> The amount of data from the Windows server is about 100 Gb. Represented
> by 532.000 files of different nature. Mostly doc, ppt and xls files.
>
> It takes about 2 hours to create only the file list.
>
> Is the amount of data/files to big, should I segmentate the backupfiles
> or is something else the cause of this long duration.
>

2 hours for 530,000+ files sounds about right to me. Two things:

1) How much does the data change per day?
2) How fast is the network link between the two boxes?

We had a situation recently where a backup via rsync that used to take 1 hour total suddenly ballooned to 3.5 hours. This wasn't acceptable, as it was loading the server down. We had recently upgraded to gigabit ethernet, so we were a bit perplexed. Then we realized that the number of files being rsynced had gone up by a factor of 5. We switched to just doing a simple tar backup of all the files. It only takes an hour again.

rsync is great (REALLY GREAT), but remember, it's mostly about saving bandwidth by only sending what's changed. If 80% of your data changes every day, and you have a gigabit link... rsync isn't really meant for that anyway.

This does bring up one point though. Is there any way to optimize file list building? It seems like that turns into a huge bottleneck in the "lots of files" situation.
Mark Thornton
2004-Mar-22 21:45 UTC
Long time needed for "Building file list" Any suggestions ?
Clint Byrum wrote:
> This does bring up one point though. Is there any way to optimize file
> list building? It seems like that turns into a huge bottleneck in the
> "lots of files" situation.
>

Only by having a process which continuously monitors the relevant directory trees to maintain a list of the changed files. On Windows this could perhaps be done with the ReadDirectoryChangesW function:

http://msdn.microsoft.com/library/en-us/fileio/base/readdirectorychangesw.asp?frame=true

Otherwise, scanning a large directory tree for changes is unavoidably slow. (It is only quick if you have very recently scanned the same tree and thus have all the directory data cached in memory.)

Mark Thornton
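To make the "list of changed files" idea above concrete, here is a minimal Python sketch. It does not use ReadDirectoryChangesW and is not anything rsync does itself; it derives the list by comparing two full scans, which is exactly the slow path described above. The function names `snapshot` and `changed_files` are made up for this illustration. An event-driven monitor would aim to produce the same kind of list without the second scan.

```python
import os

def snapshot(root):
    """Walk root and record (size, mtime_ns) for every file found."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state

def changed_files(old, new):
    """Return (added_or_modified, deleted) as sorted lists of relative paths."""
    modified = sorted(p for p, meta in new.items() if old.get(p) != meta)
    deleted = sorted(p for p in old if p not in new)
    return modified, deleted
```

Comparing only size and modification time is an assumption of this sketch; a file rewritten with the same size and timestamp would be missed.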
Robert Sander
2004-Jul-07 13:01 UTC
Long time needed for "Building file list" Any suggestions ?
On Mon, 22 Mar 2004 12:42:58 +0000 (UTC), "Hergaarden, Marcel" <Marcel.Hergaarden@Getronics.com> wrote:
> It takes about 2 hours to create only the file list.

Hi!

We have a similar problem, as we are running dirvish to produce nightly live backups. The file list building process takes too long.

I have noticed that building the file list is not done in parallel on the remote host and the local host. Wouldn't that be an option to speed up the process? Build the file list in parallel on the source side and the destination side (there are already two rsync processes running anyway) and compare the outcome when both have finished. This could easily reduce the time needed before actually syncing by a factor of 2.

Are there any ideas in this direction?

Greetings
-- 
Robert Sander                     Manager
Epigenomics AG                    Information Systems
www.epigenomics.com               Kastanienallee 24
+493024345330                     10435 Berlin
BOFH excuse #117: the printer thinks its a router.
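The proposal above can be sketched in Python under some loud assumptions: both trees are reachable as local paths (in a real rsync run the two sides are separate processes on different hosts), and the names `build_file_list` and `files_to_send` are invented for this example. It only illustrates "scan both sides at the same time, then compare" and says nothing about how rsync is implemented or how much time would actually be saved.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def build_file_list(root):
    """Return {relative_path: (size, mtime_seconds)} for every file under root."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            files[os.path.relpath(path, root)] = (st.st_size, int(st.st_mtime))
    return files

def files_to_send(source_root, dest_root):
    """Scan both trees concurrently, then list paths that differ or are missing."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        src_future = pool.submit(build_file_list, source_root)
        dst_future = pool.submit(build_file_list, dest_root)
        src, dst = src_future.result(), dst_future.result()
    # A file is a transfer candidate if the destination lacks it
    # or its size/mtime does not match the source.
    return sorted(p for p, meta in src.items() if dst.get(p) != meta)
```

Whether the two scans really overlap depends on where the time goes: if both trees sit on the same disk they compete for the same I/O, whereas on two separate hosts each side would scan its own disk independently.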