Hello everyone,

I've just started using rsync to copy files from a Windows NT RCS library to Stratus VOS (a POSIX-like, fault-tolerant minicomputer system) as a shadow. I would also like to set up rsync to copy log or other process output files from VOS to an NT system. Some questions, if anyone here can help:

1. Is there any computational or disk I/O difference between the rsync client and server (the one that just computes the checksum on each block vs. the one that computes the rolling checksum)? Given that I do not have as much CPU on the VOS machine, I would like the more expensive side to run on the Windows system. So I need to figure out who should run the daemon and who should push vs. pull.

2. Is there a way for rsync to cache previous checksum calculations, or to be told that a particular file (matched by a regex or starname) is always appended to, so it does not read the entire file? Basically I have processes that constantly append to an output file on VOS. I would like to mirror these onto the NT machine. However, I do not want rsync to read the entire file every few minutes. The choices I see are:
   a. tell rsync that the file is append-only, so it just picks up from the last block on the other machine and goes forward;
   b. rsync is already smart enough not to reread the entire file on its own;
   c. rsync can store cached checksum information;
   d. there is another option that tells rsync to do this that I missed;
   e. there is an option to tell rsync to keep rereading the file every X interval after it reaches the end, without exiting.

3. Expanding on option 2e: one possibility would be to run rsync for each file being synced, tell it to sync to the end, then stay in memory and watch for file changes, or try to read more blocks at the end (assuming another process is writing to it), and sync those new blocks. This would keep rsync from stopping and having to restart from the beginning. It may, however, cause memory issues for large files if it keeps the whole checksum in memory?
Any ideas or other ways to get around this? Again, questions 2 and 3 are basically about syncing open log files to another machine efficiently. There may be another tool out there for this that I'm not aware of; if so, please enlighten me so I can stay away.

Thanks in advance for any help.
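Question 1 contrasts the side that computes one checksum per block with the side that computes a rolling checksum. As a rough illustration only, here is a minimal Python sketch of the weak rolling checksum as described in the rsync technical report by Tridgell and Mackerras; rsync's actual C code differs in details, and the strong MD4 check and delta encoding are omitted. The names `weak_checksum`, `roll`, `block_signatures` and `scan` are hypothetical. The basis side computes one checksum per non-overlapping block, while the scanning side updates its checksum in O(1) at every byte offset.

```python
from __future__ import annotations

M = 1 << 16  # modulus used in the rsync technical report


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the window.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def block_signatures(old: bytes, n: int) -> dict[int, int]:
    """Basis side: one weak checksum per non-overlapping block.

    A trailing partial block is ignored in this sketch.
    """
    sigs: dict[int, int] = {}
    for off in range(0, len(old) - n + 1, n):
        a, b = weak_checksum(old[off:off + n])
        sigs.setdefault(a + (b << 16), off)
    return sigs


def scan(new: bytes, sigs: dict[int, int], n: int) -> list[tuple[int, int]]:
    """Scanning side: test every byte offset, return (new_off, old_off) candidates.

    Real rsync would confirm each weak hit with a strong checksum.
    """
    hits: list[tuple[int, int]] = []
    if len(new) < n:
        return hits
    a, b = weak_checksum(new[:n])
    off = 0
    while True:
        key = a + (b << 16)
        if key in sigs:
            hits.append((off, sigs[key]))
        if off + n >= len(new):
            break
        a, b = roll(a, b, new[off], new[off + n], n)
        off += 1
    return hits
```

For example, `scan(b"??" + old, block_signatures(old, 8), 8)` with `old = b"abcdefghijklmnop"` reports candidate matches for both 8-byte blocks, shifted by two bytes. This only shows the checksum arithmetic; it says nothing definitive about how the total CPU and disk I/O cost splits between the two sides in practice.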
On Wed, Oct 16, 2002 at 05:00:02PM -0400, Farid Shenassa wrote:
> 1. is there any computational or disk IO difference between the rsync client
> and server (the one that does just the checksum on the block, vs the one
> that does rolling checksum). Given that I do not have as much cpu on the
> VOS machine, I would like the more expensive side to run on the Windows
> system. So I need to figure out who should run the daemon and who should
> push vs. pull.

Once the connection is established it doesn't matter whether you push or pull. The difference is who sends and who receives. The receiver bears the brunt of the work.

> 2. is there a way for rsync to cache previous calculations on checksum, or
> be told that a particular file of regex filename starname is always
> appended to, so it does not read the entire file? Basically I have
> processes that constantly append to an output file on VOS. I would like to
> mirror these onto the NT machine. However, I do not want to have rsync
> every few minutes read the entire file. Choices I see are:
>    a. tell rsync that the file is append mode, so it just picks up from
>       the last block size on the other machine and goes forward
>    b. rsync is smart enough not to do this on its own
>    c. rsync can store cached checksum information
>    d. there is another option that tells rsync to do this that I
>       missed.
>    e. there is an option to tell rsync to basically continue to read
>       the file every X interval after it gets to the end without exiting.

There is no way (with rsync) to do any of these things.

> 3. expanding on option 2e. One possibility would be to run rsync for each
> file being synced and telling it to just sync to the end, then stay in
> memory, and look for file changes or try to read more blocks at the end
> (assuming another process is writing to it), and sync those new blocks.
> This would keep rsync from stopping and having to restart from the
> beginning. It may however, cause memory issues for large files if it keeps
> the whole checksum in memory?
>
> Any ideas or other ways to get around this? Again, question 2/3 are for
> basically syncing open log files to another machine efficiently. There may
> be another tool out there for this that I'm not aware of, if so, please
> enlighten me so I can stay away.

It sounds like latency is your issue, not efficiency. Your description indicates that you want the log file copies to be kept within a few minutes of the originals.

If you were talking about UNIX or Linux, I'd suggest looking into syslogd first and then clustering software or distributed filesystems.

What would work is a special utility that monitors the log files, detects appends, and transmits the appended data to a (remote) daemon that updates a copy. Your utility and daemon would have to know what to do about file rotation.

-- 
J.W. Schultz            Pegasystems Technologies
email address:          jw@pegasys.ws

        Remember Cernan and Schmitt
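The append-monitoring utility suggested above could look roughly like the following minimal Python sketch. It assumes rotation or truncation can be detected when the file's inode changes or its size drops below the last offset read; the network transport between the utility and the remote daemon is left out, and `TailState`, `read_appended` and `apply_to_copy` are hypothetical names. Unlike an rsync run, each poll reads only the bytes appended since the previous poll.

```python
import os
from dataclasses import dataclass


@dataclass
class TailState:
    inode: int = -1   # inode of the file last read (-1: nothing read yet)
    offset: int = 0   # number of bytes already shipped


def read_appended(path: str, state: TailState, max_bytes: int = 1 << 20):
    """Return (new_data, reset).

    reset=True means the receiver must discard its copy and start over
    (first call, rotated file, or truncated file).
    """
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())  # stat the handle we will actually read
        reset = st.st_ino != state.inode or st.st_size < state.offset
        if reset:
            state.inode = st.st_ino
            state.offset = 0
        f.seek(state.offset)
        data = f.read(max_bytes)  # anything beyond max_bytes is read next poll
    state.offset += len(data)
    return data, reset


def apply_to_copy(copy_path: str, data: bytes, reset: bool) -> None:
    """Receiver side: rewrite the copy on reset, otherwise append."""
    with open(copy_path, "wb" if reset else "ab") as f:
        f.write(data)
```

In use, the sender would call `read_appended` every few seconds and pass each `(data, reset)` pair to the receiver, which calls `apply_to_copy`; the polling interval sets the latency. Caveat: it is unclear how meaningful `st_ino` is on VOS, and a file that is rotated and regrows past the old offset under a reused inode would not be noticed by this check.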
> 2. is there a way for rsync to cache previous calculations on checksum...

Rsync doesn't do this, but it is possible.

There is a checksumSeed (typically Unix time()) supplied by the server when rsync starts. It is different for every run (at least for runs started more than 1 second apart). But since it is appended to the end of each block, it is possible to cache the block checksums (really the 128-bit MD4 state, prior to MD4_tail) without the checksumSeed, and simply complete the calculation by adding the checksumSeed and calling MD4_tail.

However, the entire-file MD4 checksum has the checksumSeed added at the start (that's a good place to put it to reduce the chance of MD4 collisions over consecutive runs, but unfortunate for caching). So you cannot cache the file MD4 checksum: there is no easy way to compute MD4(checksumSeed, file) even if you know MD4(file).

When checksumSeed == 0, checksumSeed is not included in the MD4 calculations. So if we added a command-line switch --checksumseed=0 that overrides the default, then all the block and file checksums would be cacheable.

Craig
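The caching argument above can be illustrated with a minimal Python sketch. Python's `hashlib` cannot export or restore a hash's internal 128-bit state, so this cache lives in memory only, using each hash object's `copy()` to reuse the state saved before the seed is appended. MD5 stands in for MD4 (`hashlib`'s `md4` depends on OpenSSL and is often unavailable); both are Merkle-Damgard hashes, which is what makes a saved prefix state reusable. The 4-byte little-endian seed encoding is an assumption, and `seed_bytes`, `BlockChecksumCache` and `file_digest` are hypothetical names, not rsync code.

```python
import hashlib
import struct


def seed_bytes(seed: int) -> bytes:
    # Assumption: the seed is serialized as a 4-byte little-endian integer.
    return struct.pack("<I", seed)


class BlockChecksumCache:
    """Caches each block's hash state *before* the seed is appended."""

    def __init__(self, algo: str = "md5"):  # MD5 as a stand-in for MD4
        self.algo = algo
        self._states = {}

    def block_digest(self, key, block: bytes, seed: int) -> bytes:
        # key identifies the block (e.g. path, offset, mtime); the caller must
        # choose keys that change when the block's bytes change.
        state = self._states.get(key)
        if state is None:
            state = hashlib.new(self.algo, block)  # the expensive part, done once
            self._states[key] = state
        h = state.copy()             # reuse the cached prefix state
        if seed != 0:
            h.update(seed_bytes(seed))  # seed goes at the END of the block
        return h.digest()


def file_digest(data: bytes, seed: int, algo: str = "md5") -> bytes:
    # The seed goes at the START, so no cached state of the file alone helps:
    # every run with a new non-zero seed must rehash the whole file.
    h = hashlib.new(algo)
    if seed != 0:
        h.update(seed_bytes(seed))
    h.update(data)
    return h.digest()
```

With a seed of 0 both functions reduce to a plain hash of the data, which is why a zero seed would make every checksum cacheable across runs.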