Hi,

Apparently a change of behaviour from rsync 2.5 to rsync 2.6 affected the way I worked. I provide RPM repositories that I mirror using rsync. It is important to have the repository meta-data in sync with the data; otherwise people get errors using Yum or Apt.

In the old days (with older rsyncs) I was able to influence the order in which my transaction set was processed by changing the order of directories I wanted rsync to mirror, so that the metadata was uploaded after most of the data and deletes were done at the end of the transaction.

With newer rsyncs, rsync seems to sort the transaction set, so I can no longer use this trick to have the metadata uploaded just after the data in the same transaction set.

I was wondering if it was possible and acceptable to have an rsync option to update the whole transaction in an atomic (or near-atomic) way. This would also prevent the current problems when a mirror is rsyncing from another mirror that is itself in the middle of an rsync. Since I have less bandwidth for updates than most other mirrors, I'm often caught in this scenario.

There are other ways to work around it, either by uploading in different steps (which is impractical in my scenario) or by using a staging area (which is impossible and impractical for large mirror sites).

An option to atomically sync a transaction set would be a godsend for situations like these, and probably (if the overhead is not too high) a behaviour most repository mirrors would want by default.

-- 
dag wieers, dag@wieers.com, http://dag.wieers.com/
-- [all I want is a warm bed and a kind word and unlimited power]
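For reference, the ordering described above (packages first, then metadata, then deletions) can be forced explicitly with separate rsync passes, at the cost of scanning the tree three times. This is a minimal sketch, assuming the metadata lives in directories named "repodata" or "headers"; those names, the ordered_push function and the RSYNC override are illustrative, not part of rsync.

```shell
#!/bin/sh
# Sketch: push a repository in three ordered passes so that clients never
# see metadata that refers to packages which have not arrived yet.
# ASSUMPTIONS (hypothetical): metadata lives in directories named
# "repodata" or "headers"; RSYNC can be overridden to use another binary.
RSYNC=${RSYNC:-rsync}

ordered_push() {
    src=$1    # local tree with a trailing slash, e.g. /srv/rpms/
    dest=$2   # remote target, e.g. host:/var/ftp/rpms/

    # Pass 1: new and changed packages; metadata is skipped, nothing deleted.
    $RSYNC -av --exclude=repodata/ --exclude=headers/ "$src" "$dest" || return 1

    # Pass 2: everything again, which now mostly means the metadata.
    $RSYNC -av "$src" "$dest" || return 1

    # Pass 3: only now remove files that disappeared from the source.
    $RSYNC -av --delete "$src" "$dest"
}
```

For example: ordered_push /srv/rpms/ mirror.example.org:/var/ftp/rpms/ (hypothetical host and paths). Deletions are kept out of the first two passes so that packages still listed in the old metadata stay downloadable until the new metadata has arrived.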
On Mon, 3 Jan 2005 17:39:19 +0100 (CET), Dag Wieers dag-at-wieers.com wrote:
> Apparently a change of behaviour from rsync 2.5 to rsync 2.6 affected the
> way I worked. I provide RPM repositories that I mirror using rsync. It is
> important to have the repository meta-data in sync with the data;
> otherwise people get errors using Yum or Apt.
>
> In the old days (with older rsyncs) I was able to influence the order in
> which my transaction set was processed by changing the order of
> directories I wanted rsync to mirror, so that the metadata was uploaded
> after most of the data and deletes were done at the end of the
> transaction.

Dag: I love your RPM collection; it's unfortunate that a minor change in rsync behavior has created problems for you. The design of UNIX file operations makes it hard to create an atomic transaction the size of a normal rsync transfer.

> There are other ways to work around it, either by uploading in different
> steps (which is impractical in my scenario) or by using a staging area
> (which is impossible and impractical for large mirror sites).

Rsync does a good job of ensuring file-level coherence by writing each file to a temporary file during the transfer and quickly renaming it over the original at the end. Unfortunately for you, this only works for a single file. If this were done on a larger scale, it would serve as an atomic transaction, but then rsync would just be using a staging area of its own creation. The same thing could be accomplished by manually creating the staging area and using rsync only as the data transport (which is really what it's designed for).

I don't see why uploading in different steps would be impractical. The most bulletproof way to do this would be to sync each RPM together with its header file in its own rsync session. However, for your collection of thousands of file pairs, that would indeed be impractical. Breaking it up into 10-20 sessions with several dozen file pairs each would be practical and could be automated with some shell or Perl wizardry.
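Steve's batching idea could look roughly like the following sketch. It assumes a hypothetical layout in which each foo.rpm has its header stored as foo.hdr in the same directory (real yum header names differ, so list_pairs would need adjusting), and it relies on rsync's --files-from option, which was added in rsync 2.6.0. The function names, BATCH and RSYNC are illustrative.

```shell
#!/bin/sh
# Sketch: push package/header pairs in several small rsync sessions so that
# each package travels in the same session as its header.
# ASSUMPTIONS (hypothetical): every foo.rpm has a header named foo.hdr in
# the same directory; BATCH pairs are sent per session; RSYNC can be
# overridden to use another binary.
RSYNC=${RSYNC:-rsync}
BATCH=${BATCH:-50}

# Print each package and its header, relative to directory $1, pair by pair.
list_pairs() {
    (cd "$1" && find . -name '*.rpm' | sort | while read -r rpm; do
        rpm=${rpm#./}
        printf '%s\n%s\n' "$rpm" "${rpm%.rpm}.hdr"
    done)
}

batched_push() {
    src=$1    # local tree, e.g. /srv/rpms
    dest=$2   # remote target, e.g. host:/var/ftp/rpms/
    tmp=$(mktemp -d) || return 1

    list_pairs "$src" > "$tmp/all" || { rm -rf "$tmp"; return 1; }
    # Two lines per pair, so each list file holds BATCH pairs.
    split -l $((2 * BATCH)) "$tmp/all" "$tmp/batch."

    for list in "$tmp"/batch.*; do
        [ -e "$list" ] || continue   # glob did not match: no packages found
        # --files-from paths are taken relative to the source directory.
        $RSYNC -av --files-from="$list" "$src/" "$dest" || { rm -rf "$tmp"; return 1; }
    done
    rm -rf "$tmp"
}
```

For example: batched_push /srv/rpms mirror.example.org:/var/ftp/rpms/ (hypothetical). Within each session rsync is still only atomic per file, and any repository-wide index would need one more rsync run after the last batch.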
Another option you didn't mention would be to make use of LVM snapshots to ensure that your repository is always internally consistent, even while you're in the middle of an rsync. The disadvantage would be some periodic unavailability while you removed and re-created your snapshots (i.e. the FTP server is configured to serve files from the read-only snap volumes, which need to be unmounted and re-snapped when new files are uploaded).

There may be some full site-replication applications out there making use of rsync. I suspect someone here on the list would know. I've always just created my own custom scripts for this.

Lastly, I suspect one of the rsync gurus here can probably comment on the feasibility of at least providing an option to restore the version 2.5 behavior.

Thanks again for the RPMs. I hope you can find a good solution to your mirroring dilemma.

--
Steve
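A minimal sketch of the snapshot refresh step Steve describes, meant to run right after an upload into the origin volume has completed. It uses the standard umount, lvremove, lvcreate and mount commands and must run as root; the volume group, LV names, snapshot size and mount point are hypothetical defaults, and setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: refresh the read-only LVM snapshot that the FTP server exports,
# after an upload into the origin volume has completed. Must run as root.
# ASSUMPTIONS (hypothetical defaults): origin LV /dev/vg0/mirror, snapshot
# LV "mirror-snap" mounted on /srv/ftp with 2G of copy-on-write space.
# Set RUN=echo to print the commands instead of running them.
VG=${VG:-vg0}
ORIGIN=${ORIGIN:-mirror}
SNAP=${SNAP:-mirror-snap}
SNAP_SIZE=${SNAP_SIZE:-2G}
MOUNTPOINT=${MOUNTPOINT:-/srv/ftp}
RUN=${RUN:-}

resnap() {
    # Downloads are unavailable from here until the final mount succeeds.
    $RUN umount "$MOUNTPOINT" || return 1
    $RUN lvremove -f "/dev/$VG/$SNAP" || return 1
    # Freeze the current state of the origin volume as the new snapshot.
    $RUN lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$ORIGIN" || return 1
    $RUN mount -o ro "/dev/$VG/$SNAP" "$MOUNTPOINT"
}
```

The window between umount and the final mount is the periodic unavailability mentioned above; rsync into the origin volume can run at any time without clients seeing a half-updated tree.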
On Mon, Jan 03, 2005 at 05:39:19PM +0100, Dag Wieers wrote:
> With newer rsyncs, rsync seems to sort the transaction set, so I
> can no longer use this trick to have the metadata uploaded just after
> the data in the same transaction set.

Rsync has always sorted the list of files to be sent, so this is not something that is different between 2.5.x and 2.6.x. I'd be interested in hearing what you believe to be different in how files are processed.

> I was wondering if it was possible and acceptable to have an rsync
> option to update the whole transaction in an atomic (or near-atomic)
> way.

One way to do this would be to use the --link-dest option to create a new hierarchy of files (with only the changed files getting sent, and all unchanged files being hard-linked to the prior files) and then move the whole file-set into place all at once. Imagine that there is a hierarchy you want to update in /dest/cur by running this script:

    [ -d /dest/old ] && mv /dest/old /dest/new
    rsync -av --link-dest=/dest/cur host:/src/ /dest/new \
        && mv /dest/cur /dest/old \
        && mv /dest/new /dest/cur

This sequence allows any currently-running rsync processes to finish grabbing their files, because they're still running through the files that got put into /dest/old. As long as you don't do this rotation very rapidly (i.e. the time between two updates must be longer than the runtime of your longest-running downloader), it will all work fine.

The above also works right if rsync errors out during a transfer: the renames are skipped, so the next run continues to update the "new" dir and then moves it into place.

For a push, if you're using a remote-shell connection, you could create a script that does the same thing as the above and have rsync run it remotely.
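Wayne's rotation can be wrapped in a function for repeated use, for example from cron. This sketch keeps his three steps and adds one assumption of its own: on the very first run there is no cur/ to link against, so new/ is simply renamed into place. The names rotate_sync and RSYNC are illustrative. The base directory should be absolute, because rsync resolves a relative --link-dest path against the destination directory.

```shell
#!/bin/sh
# Sketch of the --link-dest rotation as a reusable function.
# BASE/cur is what downloaders see, BASE/new is being built, and BASE/old
# is the previous generation that in-flight downloads may still be reading.
# ASSUMPTIONS: rotate_sync and RSYNC are illustrative names; BASE must be an
# absolute path, because rsync resolves a relative --link-dest directory
# against the destination directory.
RSYNC=${RSYNC:-rsync}

rotate_sync() {
    base=$1   # e.g. /dest
    src=$2    # e.g. host:/src/

    # Recycle the previous generation as the build area. After a failed
    # run old/ is already gone, so new/ simply keeps being updated.
    if [ -d "$base/old" ]; then
        mv "$base/old" "$base/new" || return 1
    fi

    if [ ! -d "$base/cur" ]; then
        # First run (not covered by the original recipe): nothing to link to.
        $RSYNC -av "$src" "$base/new/" && mv "$base/new" "$base/cur"
        return
    fi

    # Unchanged files become hard links into cur/; only changes are sent.
    $RSYNC -av --link-dest="$base/cur" "$src" "$base/new/" &&
        mv "$base/cur" "$base/old" &&
        mv "$base/new" "$base/cur"
}
```

For example: rotate_sync /dest host:/src/. The spacing rule from the message still applies: runs must be further apart than the longest-running download.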
First, the script (which I've named "bin/daily-sync" in the receiving user's home dir):

    #!/bin/sh
    [ -d /dest/old ] && mv /dest/old /dest/new
    rsync "$@" \
        && mv /dest/cur /dest/old \
        && mv /dest/new /dest/cur

Then, run this command from the pushing system:

    rsync -av --rsync-path=bin/daily-sync --link-dest=/dest/cur /src/ host:/dest/new

The only difficult case where I don't have a good solution is pushing files via daemon mode.

..wayne..