Hi,

I've got identical servers. One is the primary, the other is a backup receiving rsyncs from the primary. I'm backing up a file system to disk; the files are small and there are lots of directories.

The overall problem seems to be the total number of files. When I had ~375,000 files, the total rsync time was under a minute. With ~425,000 files, the total rsync time is 10 minutes.

Last Friday, when we were at 425,000 files, the rsync time was 10 minutes. Today I was able to delete 50,000 unneeded files, and the rsync time went back down to under a minute.

So why the huge change in total rsync time for a somewhat small change in the total number of files? I'm afraid that as the total number of files keeps increasing, the total rsync time is going to go exponential.

I turned the --progress flag on, and the time is roughly divided evenly between building the file list and working through it. The files themselves are really small (~16K), and I'm not seeing a problem with anything other than how long it takes rsync to make a pass through all the files. I do use the --delete option.

The servers are Dell 2950s with built-in RAID 10 disks and 4 GB of RAM. The OS is CentOS 5.1, and I'm running rsync 2.6.8, protocol version 29.

This smells to me like some sort of caching problem. Is there something in the kernel or rsync itself that I can tweak?

Thanks,

Mike
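(For reference, a minimal sketch of the kind of job described above. Archive mode (-a) is an assumption, and /data and the hostname 'backup' are placeholder names for the real tree and backup server:

  # sync the tree, deleting files on the backup that are gone on the
  # primary, with per-file progress output; 'time' reports the total
  # pass time being discussed
  time rsync -a --delete --progress /data/ backup:/data/

The two phases mentioned, building the file list and working through it, both show up within this single run.)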
Mike Connell wrote:
> Hi,

Hi Mike,

> I've got identical servers. One is the primary, the other is a backup
> receiving rsyncs from the primary. [...]
>
> This smells to me like some sort of caching problem. Is there something
> in the kernel or rsync itself that I can tweak?

I'm no expert, but I suggest using rsync 3.x (3.0.6, for example); it doesn't keep as much of the file list in memory. Your machine is probably swapping to disk because of the large list, and that significantly slows down the performance of the whole machine(s).

Have a look at the output of the 'vmstat 2' command on both machines while rsync is busy. Specifically, look at the caption that says 'swap'; it has 'si' and 'so' columns below it. 'si' means reading from swap/disk and 'so' means writing to swap/disk.

You can try it out fairly easily, especially if you don't use rsync for anything else. If you can't find a package, just building it is an option:

  cd /usr/src
  wget http://rsync.samba.org/ftp/rsync/rsync-3.0.6.tar.gz
  tar -zxvf rsync-3.0.6.tar.gz
  cd rsync-3.0.6
  nice ./configure && nice make

That should work (at least if you already have gcc and make and possibly some other things installed). Instead of calling rsync, you then call /usr/src/rsync-3.0.6/rsync if you just want to test it first without installing. You'll have to do it on both machines, of course.

If you're not sure you want to make any changes with an unsupported binary, you can use -n; that makes rsync not write changes to disk.

Hope these instructions help.

> Thanks,
>
> Mike
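P.S. Putting the pieces together, a dry-run test with the uninstalled 3.0.6 binary on both ends could look something like this (the /data paths and the 'backup' hostname are placeholders for your real setup):

  # -n: dry run, nothing is written on the destination
  # --rsync-path points the remote side at the freshly built binary,
  # so both ends speak the new protocol
  /usr/src/rsync-3.0.6/rsync -a -n --delete --progress \
      --rsync-path=/usr/src/rsync-3.0.6/rsync \
      /data/ backup:/data/

Drop the -n once you're happy with what it reports.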
Mike Connell wrote:
> Hi,

Hi again Mike,

> I don't see how to reply to your post so it shows up as a reply
> on the list. So I guess I'll just send email directly to you.

You just e-mail rsync@lists.samba.org instead of me. :-)

> Today I've been watching the production 2.6.8 rsync off and on, and no,
> it isn't swapping. I used "vmstat" and "top", both on the source and
> the destination. Each shows 0 for si and so.
>
> With iostat -xn 5, I do see that disk utilization on the source hits
> 95% first, while the file list is being received. After rsync says it is
> done with the file list, the destination hits 95% disk utilization.
>
> This is not good. As it takes more and more time, it will be pegging
> our servers.

Maybe reading all the files from disk fills up the file cache (the memory used to avoid re-reading from disk). If I do a quick calculation, 375,000 * 16K is already about 5.7 GB, so the working set doesn't fit in 4 GB of memory, and with 425,000 files even less of it can stay cached. Whatever falls out of the cache has to be read from disk again.

> So I used your good advice and downloaded and built rsync 3.0.6.
> (Couldn't find any packages available.)
>
> I now see that the new rsync says "receiving incremental file list".
> What does this do? Sounds good.

It just means it's supposed to use less memory, because it doesn't need to keep the whole file list in memory, although I suspect other trade-offs are made.

> Have only verified that the new rsync seems to work in a test capacity.
> Will move it into production soon to see how it does.

If the situation is what I think it is, then I doubt it will help much, or maybe only for a while.

I don't know what kernel (and I/O scheduler) you are running, but I do know there is also an 'ionice' command (in Debian-based distributions it's part of the util-linux package) which can be prepended to the rsync command and is meant to set priorities between processes for reading from and writing to disk. It will possibly slow rsync down even more, but at least it wouldn't slow down the other processes on the server. It will still kill the file cache, though, so in that way it could still slow down other processes.

If it's the file cache, plugging in extra memory would solve the problem for a while. I don't know whether you'd need to be running 64-bit for that, though (it depends on the machine and on CentOS).

If it's a failover setup and it turns out that there is no easy solution with rsync, maybe something at the block-device level would be more appropriate, like:

http://www.drbd.org/
http://www.centos.org/docs/5/html/5.1/Global_Network_Block_Device/ch-gnbd.html

But that's not something I've used before.

> Thanks,
>
> Mike
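P.S. A sketch of the ionice idea (the /data paths and 'backup' hostname are placeholders again, and the idle class assumes a kernel using the CFQ I/O scheduler):

  # class 3 = "idle": rsync only gets the disk when nobody else wants it
  ionice -c3 rsync -a --delete /data/ backup:/data/

  # alternative: best-effort class at the lowest priority
  ionice -c2 -n7 rsync -a --delete /data/ backup:/data/

Note that ionice only affects the local process; to throttle the remote end too, you could wrap the remote binary the same way, e.g. --rsync-path="ionice -c3 rsync".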
Hi,

Here is an update. I haven't deployed a new version of rsync into production. Instead, I split my current rsync up into 10 independent rsyncs over subdirectories of the main directory, and I run them serially, one after the other.

I'm up to 404,000 files and the total sync time doesn't seem to be falling off a cliff (yet). In my case only about 0.1% of the files change, so I'm sure it isn't an rsync memory issue. But given the results I'm getting so far, I strongly suspect it is a matter of how many directories and inodes can be kept cached in memory. The largest of the 10 subdirectory rsyncs is about 75,000 files, so this would seem to put less pressure on that cache.

Thanks,

Mike
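(A sketch of the split-up approach described above, written as one serial shell loop; /data and the 'backup' hostname are placeholder names:

  #!/bin/sh
  # one rsync per top-level subdirectory, run one after the other,
  # so each pass touches a smaller set of directories and inodes
  for d in /data/*/ ; do
      rsync -a --delete "$d" "backup:$d"
  done

If the dentry/inode cache really is the bottleneck, one kernel knob that may be worth experimenting with is vm.vfs_cache_pressure: lowering it below the default of 100, e.g. 'sysctl -w vm.vfs_cache_pressure=50', makes the kernel prefer to keep cached directory and inode entries over page cache.)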