Howdy,

Rsync has been churning away for 45 minutes, presumably building an in-core list of files to be copied to the destination. This is a very large filesystem we are copying locally - approximately 4.2 million files (WebCT). The resident process size for rsync has grown to 72MB. Is this normal behaviour for a filesystem this size, and does rsync have the ability to handle such a large number of files? The filesystem size is relatively modest - 20GB or so - but the millions of small files might explain why it's taking so long.

I presume that once rsync has built its in-memory list of files it proceeds to copy the whole shebang over. This is the initial copy; subsequent copies should be much faster.

Also, are there any options suitable for turning off that would speed up the whole process? We're using (as root):

# rsync -a --progress --stats /global/webct/ /target

Solaris 8 and rsync 2.5.7

Thanks

Scotty
On Wed, Dec 17, 2003 at 05:12:31PM -0500, Steve Howie wrote:
> Howdy,
>
> Rsync has been churning away for 45 minutes, presumably building an
> in-core list of files to be copied to the destination. This is a very
> large filesystem we are copying locally - approximately 4.2 million
> files (WebCT). The resident process size for rsync has grown to 72MB.
> Is this normal behaviour for a filesystem this size, and does rsync
> have the ability to handle such a large number of files? The
> filesystem size is relatively modest - 20GB or so - but the millions
> of small files might explain why it's taking so long.

You have sussed it. Rsync builds a complete in-memory file list with all metadata. This takes approximately 100 bytes per path (directories too), so 4.2 million files is going to take over 400MB. You might want to break the job up.

> I presume that once rsync has built its in-memory list of files it
> proceeds to copy the whole shebang over. This is the initial copy;
> subsequent copies should be much faster.

Building the file list takes the same amount of time every time. The speedup in subsequent syncs will come from the selective data transfers.

> Also, are there any options suitable for turning off that would speed
> up the whole process? We're using (as root):
>
> # rsync -a --progress --stats /global/webct/ /target

Is /global/webct by any chance served by NFS? That could account for it taking so much time just walking the file list. Whenever possible it is best to avoid rsyncing over NFS.

> Solaris 8 and rsync 2.5.7

To be honest I'd use cp -a or cpio for at least the initial transfer. Rsync does work for a local copy but is actually slower than other methods. It is over the network (the R in rsync) that it shines.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
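Breaking the job up, as suggested above, can be sketched as one rsync run per top-level subdirectory, so each invocation builds a much smaller file list. The /global/webct and /target paths come from the original command; whether this split matches the actual directory layout is an assumption, and the cpio idiom for the initial copy is only one common way to do it:

```shell
#!/bin/sh
# Sync each top-level subdirectory separately so the in-memory
# file list stays small for each rsync run (hypothetical layout).
for d in /global/webct/*/; do
    [ -d "$d" ] || continue      # skip if the glob matched nothing
    name=$(basename "$d")
    rsync -a --progress --stats "$d" "/target/$name/"
done

# For the very first copy, cp -a or cpio avoids rsync's file-list
# overhead entirely, e.g. (pass-through mode, preserving mtimes):
#   cd /global/webct && find . -depth -print | cpio -pdm /target
```

Subsequent incremental runs can then go back to rsync, still per subdirectory, to keep memory use bounded.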
Scotty,

As far as I know rsync builds a list of files and attributes (plus checksums, if requested) in memory. This can cause a large memory footprint while the process runs, so 72MB of data for 4.2 million files is about right, depending on the checksum options, etc. I believe the average overhead is about 120 bytes per file... someone correct me if I am wrong. Note that some options will consume more memory than others, and some (like -c, which asks rsync to checksum each file before sending) will be very slow. As far as filesystems go, Solaris 8 UFS is not the fastest of filesystems, and traversing the directories may take some time, especially if there is other I/O happening on the FS.

Tomasz

On Wed, Dec 17, 2003 at 05:12:31PM -0500, Steve Howie wrote:
> Howdy,
>
> Rsync has been churning away for 45 minutes, presumably building an
> in-core list of files to be copied to the destination. This is a very
> large filesystem we are copying locally - approximately 4.2 million
> files (WebCT). The resident process size for rsync has grown to 72MB.
> Is this normal behaviour for a filesystem this size, and does rsync
> have the ability to handle such a large number of files? The
> filesystem size is relatively modest - 20GB or so - but the millions
> of small files might explain why it's taking so long.
>
> I presume that once rsync has built its in-memory list of files it
> proceeds to copy the whole shebang over. This is the initial copy;
> subsequent copies should be much faster.
>
> Also, are there any options suitable for turning off that would speed
> up the whole process? We're using (as root):
>
> # rsync -a --progress --stats /global/webct/ /target
>
> Solaris 8 and rsync 2.5.7
>
> Thanks
>
> Scotty

-- 
Tomasz M. Ciolek
*******************************************************************************
 email: tmc at dreamcraft dot com dot au
*******************************************************************************
 GPG Key ID: 0x41C4C2F0
 Key available on www.pgp.net
*******************************************************************************
 Everything falls under the law of change;
 Like a dream, a phantom, a bubble, a shadow,
 like dew or flash of lightning.
 You should contemplate like this.
On Thu, Dec 18, 2003 at 09:34:45AM +1100, Tomasz Ciolek wrote:
> Scotty,
>
> As far as I know rsync builds a list of files and attributes (plus
> checksums, if requested) in memory. This can cause a large memory
> footprint while the process runs, so 72MB of data for 4.2 million
> files is about right, depending on the checksum options, etc. I
> believe the average overhead is about 120 bytes per file... someone
> correct me if I am wrong.

Depending on filename lengths it is about 100 bytes per file. 100B * 4.2M = 420MB, but his 72MB is only RSS. File checksums are only generated during the file-list build if -c has been specified, as you mention below. The file checksum would add 20 bytes per file to the footprint.

> Note that some options will consume more memory than others, and some
> (like -c, which asks rsync to checksum each file before sending) will
> be very slow. As far as filesystems go, Solaris 8 UFS is not the
> fastest of filesystems, and traversing the directories may take some
> time, especially if there is other I/O happening on the FS.
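The back-of-envelope numbers above can be checked in the shell. The 100- and 20-byte per-entry figures are the estimates quoted in this thread, not exact rsync internals:

```shell
# ~100 bytes per file-list entry, 4.2 million entries:
echo "$((4200000 * 100 / 1024 / 1024)) MB"          # ~400 MB

# With -c, add ~20 bytes of checksum per entry:
echo "$((4200000 * (100 + 20) / 1024 / 1024)) MB"   # ~480 MB
```

Either way the full file list dwarfs the 72MB RSS observed so far, which is consistent with the list still being built.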