Zachary Denison
2003-Aug-21 23:42 UTC
rsync problem and question about using rsync with Maildir
Hi,

I have a Maildir store (about 500 GB) on a Linux (Red Hat 8) server which I am trying to mirror to another identical server. I have 4 GB of RAM on both machines. I am using rsync 2.5.5. At present the machines are on the same LAN (100 Mbit).

Every time I run rsync, it runs for a short time (anywhere from 5 to 20 minutes) and then one of the machines (either the source or the destination) crashes and requires a reboot.

The actual command I am using is:

    /usr/bin/rsync -qaz --rsh=ssh --stats --progress --rsync-path=/usr/bin/rsync --delete --force /users/ 10.10.12.161:/users/

Watching the output of "free" during the run shows that rsync rapidly consumes the system's memory.

Is this an appropriate use for rsync? My goal is to first synchronize the Maildirs over the 100 Mbit LAN, then ship the destination machine to a remote location, and then run periodic rsync backups over the WAN (each site has a regular 1.5 Mbit connection to the internet) to the remote destination server.

Does this scenario sound feasible given that the users' directories will contain ONLY Maildirs?

Also, since Maildirs contain a large number of files, does it make sense to tar and/or gzip each user's Maildir and rsync the tar files?

Also, is rsync over ssh contributing to my problem? Does it make sense to run an rsync server instead?

Thank you very much in advance for any hints you can give me.

Zach.
jw schultz
2003-Aug-22 08:29 UTC
rsync problem and question about using rsync with Maildir
On Thu, Aug 21, 2003 at 06:42:17AM -0700, Zachary Denison wrote:
> Hi,
>
> I have a Maildir store (about 500 GB) on a linux
> (redhat 8) server which I am trying to mirror to
> another identical server. I have 4 GB of ram on both
> machines. I am using rsync 2.5.5. At present the
> machines are on the same lan (100Mbit).
>
> Everytime I run the rsync, it runs for a short amount
> of time, (anywhere from 5-20 minutes) and then one of
> the machines (either the source or the destination
> machine) crashes and requires reboot.

That the machines crash is a kernel bug. There may be changes to how you run rsync that avoid it, but any crash is the kernel's fault. Since you are running RH, check their errata kernels. With 4GB of RAM you may be running out of zone normal and hitting a bounce-buffer failure. It may pay to boot with only 900M enabled.

> The actual command I am using is:
>
> /usr/bin/rsync -qaz --rsh=ssh --stats --progress
> --rsync-path=/usr/bin/rsync --delete --force /users/
> 10.10.12.161:/users/
>
> Examination of the systems "free" command during rsync
> execution shows that the rsync rapidly consumes the
> systems memory.

You describe this area as a Maildir store. If it is in maildir or MH format (as opposed to mbox), that means one file per message. Rsync's memory requirement grows linearly with file count. This memory consumption occurs during the "building file list" period prior to syncing files. If the crash is happening during that interval, then reducing the size of the file list will probably avoid the problem.

Reducing the file list size should be fairly easy to do. Just doing a separate rsync invocation for each user should break it up sufficiently.

> Is this an appropriate use for rsync? My goal is to

Yes.

> be able to first synchronize the maildirs on a 100mb
> lan. Then ship the destination machine to a remote
> location and then run periodic rsync backups over WAN
> (each site has regular 1.5mb connection to internet)
> to the remote destination server as a backup.

Perfectly sensible.

> Does this scenario sound feasible given that the users
> directories will contain ONLY Maildirs.

Yes.

> Also since Maildirs contain a large number of files
> does it make sense to tar and/or gzip each users
> Maildir and rsync the tar files?

Not really. Least of all the gzip.

> Also is rsync over ssh contributing to my problem,

No.

> does it make sense to run an rsync server instead?

Not if there is confidential info.

> Thank you very much in advance for any hints you can
> give me.

One thing I should mention. Maildir files are seldom, if ever, modified. In fact, unless you have a user interface that allows users to edit a message, I don't think (I could be wrong) the files will ever be modified, only created, renamed and deleted. Even user-level message editing normally creates a new message and deletes the old one. Instead of modifying the files, maildir uses file name and location to indicate status. New files sit in the new/ directory.

What this means is that rsync isn't going to be the most efficient way to synchronise maildirs. When a status change occurs, rsync will only see that there is a new file and that an old one is gone. It won't know that the file was renamed, so it will retransmit it.

A utility that examines filenames will probably be able to identify these changes and rename files instead of retransmitting them. It would not surprise me if such a utility already existed. 500GB of mail sounds sufficiently worth creating such a utility if someone has not already done so.

-- 
________________________________________________________________
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

Remember Cernan and Schmitt
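
The suggestion above to run one rsync per user, so that each file list stays small, could be scripted roughly as follows. This is only a sketch, not a tested recipe: the function names `per_user_commands` and `run_all` are made up for illustration, the flags are a subset of those in the original command, and it assumes every directory directly under /users/ is one user's tree.

```python
import os
import subprocess


def per_user_commands(users_root, dest_host, dest_root):
    """Build one rsync command per top-level user directory.

    Hypothetical helper: assumes each directory directly under
    users_root is one user's tree and should be mirrored on its own.
    """
    commands = []
    for name in sorted(os.listdir(users_root)):
        src = os.path.join(users_root, name)
        if not os.path.isdir(src):
            continue  # skip stray files at the top level
        dest = f"{dest_host}:{dest_root.rstrip('/')}/{name}/"
        # Trailing slash on the source: copy the directory's contents.
        commands.append(
            ["rsync", "-az", "--rsh=ssh", "--delete", "--force",
             src + "/", dest]
        )
    return commands


def run_all(commands):
    """Run the commands one after another; return those that failed."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_all(per_user_commands("/users", "10.10.12.161", "/users"))` would then start one transfer per user. One caveat of splitting the run this way: a user directory that has been removed from the source is never visited, so `--delete` will not remove it on the destination.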
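
To make the idea of a filename-aware utility more concrete, here is a minimal sketch of the planning step only. It is not an existing tool. It assumes the usual Maildir naming, where a message keeps the same unique name and only the ":2,FLAGS" suffix and the new/ or cur/ directory change, and it assumes unique names do not repeat within one listing. The names `unique_key` and `plan_sync` are invented for this example.

```python
import posixpath


def unique_key(rel_path):
    """Return the part of a Maildir filename before the first ':'.

    Assumption: this part stays fixed while flags and directory change.
    """
    return posixpath.basename(rel_path).split(":", 1)[0]


def plan_sync(src_paths, dst_paths):
    """Compare two listings of relative paths inside one Maildir.

    Returns (renames, uploads, deletes):
      renames: (old path on destination, new path) pairs, no data sent
      uploads: paths that exist only on the source
      deletes: paths that exist only on the destination
    """
    src = {unique_key(p): p for p in src_paths}
    dst = {unique_key(p): p for p in dst_paths}
    renames = sorted(
        (dst[k], src[k]) for k in src.keys() & dst.keys() if src[k] != dst[k]
    )
    uploads = sorted(src[k] for k in src.keys() - dst.keys())
    deletes = sorted(dst[k] for k in dst.keys() - src.keys())
    return renames, uploads, deletes
```

Only the `uploads` list would need to cross the 1.5 Mbit link as message data; the renames and deletes are cheap operations on the destination. Actually applying the plan (over ssh, for instance) is left out here.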