thr3ads.net - rsync - Optimizations and other questions for rsync [Oct 2002]

Farid Shenassa

2002-Oct-16 21:00 UTC

Optimizations and other questions for rsync

Hello Everyone,

I've just started using rsync to copy files from a Windows NT RCS library to
Stratus VOS (a POSIX-like fault-tolerant mini system) as a shadow.  I would
also like to set up rsync to copy log or other process output files from VOS
to an NT system.  Some questions, if anyone here can help:

1. Is there any computational or disk I/O difference between the rsync client
and server (the one that just checksums each block vs. the one that does the
rolling checksum)?  Given that I do not have as much CPU on the VOS machine,
I would like the more expensive side to run on the Windows system.  So I need
to figure out who should run the daemon and who should push vs. pull.

2. Is there a way for rsync to cache previous checksum calculations, or to be
told that a particular file (matched by regex or star name) is always
appended to, so that it does not read the entire file?  Basically I have
processes that constantly append to an output file on VOS.  I would like to
mirror these onto the NT machine.  However, I do not want rsync to read the
entire file every few minutes.  Choices I see are:
	a. tell rsync that the file is append mode, so it just picks up from
the last block size on the other machine and goes forward
	b. rsync is smart enough not to do this on its own
	c. rsync can store cached checksum information
	d. there is another option that tells rsync to do this that I
missed.
	e. there is an option to tell rsync to basically continue to read
the file every X interval after it gets to the end without exiting.

3. Expanding on option 2e: one possibility would be to run rsync for each
file being synced and tell it to sync to the end, then stay in memory and
either watch for file changes or try to read more blocks at the end
(assuming another process is writing to it), and sync those new blocks.
This would keep rsync from stopping and having to restart from the
beginning.  Might it, however, cause memory issues for large files if it
keeps the whole checksum in memory?

Any ideas or other ways to get around this?  Again, questions 2 and 3 are
basically about syncing open log files to another machine efficiently.  There
may be another tool out there for this that I'm not aware of; if so, please
enlighten me so I can stay away.

Thanks in advance for any help.

jw schultz

2002-Oct-16 23:58 UTC

On Wed, Oct 16, 2002 at 05:00:02PM -0400, Farid Shenassa wrote:
> 1. is there any computational or disk IO difference between the rsync client
> and server (the one that does just the checksum on the block, vs the one
> that does rolling checksum).  Given that I do not have as much cpu on the
> VOS machine, I would like the more expensive side to run on the Windows
> system.  So I need to figure out who should run the daemon and who should
> push vs. pull.
Once the connection is established it doesn't matter whether
you push or pull.  The difference is who sends and who
receives.  The receiver bears the brunt of the work.
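
For context on the two kinds of checksum mentioned in the question: in the published rsync algorithm, the side holding the old copy checksums it block by block, and the side holding the new file slides a cheap "rolling" checksum across its data looking for matching blocks. The following is a minimal Python sketch of that weak rolling checksum, following the formulas in the rsync technical report. The function names are made up for illustration, and details of the real implementation (such as its character offset and the strong MD4 check that confirms a weak match) are left out.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple:
    """Compute the weak checksum halves (a, b) of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Byte i (0-based) is weighted by its distance from the end of the block.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b
```

Computing a checksum from scratch costs one pass over the block, while `roll` costs a few arithmetic operations per byte offset, which is what makes testing every offset of a file affordable.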
> 2. is there a way for rsync to cache previous calculations on checksum, or
> be told that a particular file of  regex filename starname is always
> appended to, so it does not read the entire file?  Basically I have
> processes that constantly append to an output file on VOS.  I would like to
> mirror these onto the NT machine.     However, I do not want to have rsync
> every few minutes read the entire file.   Choices I see are:
> 	a. tell rsync that the file is append mode, so it just picks up from
> the last block size on the other machine and goes forward
> 	b. rsync is smart enough not to do this on its own
> 	c. rsync can store cached checksum information
> 	d. there is another option that tells rsync to do this that I
> missed.
> 	e. there is an option to tell rsync to basically continue to read
> the file every X interval after it gets to the end without exiting.
There is no way (with rsync) to do any of these things.
> 3. expanding on option 2e.  One possibility would be to run rsync for each
> file being synced and telling it to just sync to the end, then stay in
> memory, and look for file changes or try to read more blocks at the end
> (assuming another process is writing to it), and sync those new blocks.
> This would keep rsync from stopping and having to restart from the
> beginning.  It may however, cause memory issues for large files if it keeps
> the whole checksum in memory?
> 
> Any ideas or other ways to get around this?   Again, question 2/3 are for
> basically syncing open log files to another machine efficiently.  There may
> be another tool out there for this that I'm not aware of, if so, please
> enlighten me so I can stay away.
It sounds like latency is your issue, not efficiency.  Your
description indicates that you want the log file copies to
be kept within a few minutes of the originals.  If you were
talking about UNIX or Linux I'd suggest looking into syslogd
first and then clustering software or distributed
filesystems.

What would work would be a special utility that monitors the
log files and detects appending and then transmits the
appended data to a (remote) daemon that updates a copy.
Your utility and daemon would have to know what to do about
file rotation.
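
The monitoring half of such a utility could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `AppendTracker` and `read_new` are hypothetical names, rotation is guessed from a changed device/inode pair or a file that has shrunk, and sending the bytes to the remote daemon is left out entirely. Whether inode numbers are meaningful on VOS or NT is not something this thread establishes.

```python
import os


class AppendTracker:
    """Remember how much of a log file has been shipped; return only new bytes."""

    def __init__(self, path):
        self.path = path
        self.offset = 0      # bytes already handed out
        self.file_id = None  # (st_dev, st_ino) of the file last read

    def read_new(self):
        """Return (restart, data); restart=True means begin a fresh remote copy."""
        try:
            f = open(self.path, "rb")
        except FileNotFoundError:
            return False, b""  # possibly mid-rotation; try again later
        with f:
            st = os.fstat(f.fileno())
            file_id = (st.st_dev, st.st_ino)
            # A different file, or one shorter than what was already read,
            # is treated as a rotation (an assumption, not a guarantee).
            restart = file_id != self.file_id or st.st_size < self.offset
            if restart:
                self.file_id = file_id
                self.offset = 0
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        return restart, data
```

A polling loop would call `read_new()` every few seconds and forward any non-empty `data`, telling the remote side to start a new copy whenever `restart` is true. A file that is truncated and rewritten to at least its old length between polls would not be detected by this check.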


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

Craig Barratt

2002-Oct-20 22:32 UTC

> 2. is there a way for rsync to cache previous calculations on checksum...
Rsync doesn't do this, but it is possible.

There is a checksumSeed (typically unix time()) supplied by the server
when rsync starts.  It is different for every run (at least those
started more than 1 second apart).  But since it is appended to the
end of each block it is possible to cache the block checksums (really
the 128 bit MD4 state, prior to MD4_tail) without the checksumSeed,
and simply complete the calculation by adding the checksumSeed and
calling MD4_tail.
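
The idea can be illustrated with a short Python sketch. It is not rsync code: `hashlib.md5` stands in for MD4 (which many Python builds no longer provide), the function names are hypothetical, and the 4-byte little-endian encoding of the seed is an assumption. The point is only that a hash state saved before finalization can be completed later with a seed that goes at the end.

```python
import hashlib
import struct


def cache_block_state(block: bytes):
    """Hash a block once and keep the unfinalized hash object for reuse."""
    return hashlib.md5(block)  # stand-in for rsync's MD4


def finish_with_seed(cached_state, seed: int) -> bytes:
    """Complete a cached block hash for this run's seed without re-reading the block."""
    h = cached_state.copy()            # leave the cached state untouched
    h.update(struct.pack("<i", seed))  # seed appended last (assumed 4-byte little-endian)
    return h.digest()
```

`finish_with_seed(state, seed)` gives the same digest as hashing `block + seed_bytes` from scratch, for any number of different seeds. Note that `hashlib` keeps the state only in memory; storing it between runs, as suggested above, would need an MD4 implementation that exposes its internal state.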

However, the entire-file MD4 checksum has the checksumSeed added at
the start (that's a good place to put it to reduce the chance of
MD4 collisions over consecutive runs, but unfortunate for caching).
So you cannot cache the file MD4 checksum: there is no easy way to
compute MD4(checksumSeed, file) even if you know MD4(file).

When checksumSeed == 0 then checksumSeed is not included in MD4 calculations.
So if we added a command-line switch --checksumseed=0 that overrides the
default then all the block and file checksums would be cacheable.

Craig
