Yesterday, while I was still waiting for a large rsync mirror to finish, I
got to thinking that it would be interesting if you could run multiple
rsyncs and have them cooperate to mirror a repository from several
different sources. I think a close approximation should be fairly
easy to do, but I just won't have time to do it myself.
My thought is that it could be implemented fairly inexpensively, mostly
by relying on the temporary files rsync already writes. If the temp
files were given a common extension (even if it were only common
among a concurrent set of rsyncs), the processes could use those temporary
files to determine which daemon works on which file.
In other words: create a file with a truly random name, open it, get its
inode number, and then rename it to the common temp file name. If at any
point during the transfer your inode number no longer belongs to that
file, skip on to the next file. Of course, you would only attempt this if
the temp file doesn't already exist when you start to rsync that file.
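A minimal Python sketch of the claim-by-inode idea, assuming a POSIX
filesystem where rename() within one directory is atomic. The
`.rsync-coop` suffix, the `claim_path`/`still_ours`/`transfer` names, and
the iterable of byte chunks standing in for the incoming data are all
hypothetical; this is not how rsync itself names or writes its temp files.

```python
import os
import tempfile

# Hypothetical shared suffix that every cooperating process agrees on.
COOP_SUFFIX = ".rsync-coop"


def claim_path(dest):
    """Common temp name for dest, e.g. dir/.name.rsync-coop (hypothetical scheme)."""
    directory, name = os.path.split(dest)
    return os.path.join(directory, "." + name + COOP_SUFFIX)


def still_ours(claim, ino):
    """True while the common temp name still refers to our inode."""
    try:
        return os.stat(claim).st_ino == ino
    except FileNotFoundError:
        return False


def transfer(dest, chunks):
    """Try to write dest from an iterable of byte chunks; False means we skipped it."""
    claim = claim_path(dest)
    if os.path.exists(claim):
        return False  # another process already owns this file
    # Create a uniquely named file in the same directory so rename() stays atomic.
    fd, private = tempfile.mkstemp(dir=os.path.dirname(dest) or ".", prefix=".claim-")
    ino = os.fstat(fd).st_ino
    os.rename(private, claim)  # if two processes race here, the last rename wins
    with os.fdopen(fd, "wb") as f:
        for chunk in chunks:
            if not still_ours(claim, ino):
                # A rival renamed over our claim; our unlinked inode is freed on close.
                return False
            f.write(chunk)
    if not still_ours(claim, ino):
        return False
    os.rename(claim, dest)  # narrow race: a rival could rename in just before this
    return True
```

The process that loses a race notices on its next check that the common
name now points at someone else's inode and moves on. A small window
remains between the final check and the rename into place, which fits the
"close enough, then run a final rsync" approach rather than replacing it.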
I'm not really sure how rsync does the transfers, which could lead to
some stickiness. I know it builds up a list of changed files at the
very beginning, which it then works on. It's not clear whether it only
exchanges "deltas" that are computed while the transfer is happening,
or whether those are all computed up front. If it's the former, it may be
possible to just leave the transfer portion as it is, and have the
different rsync daemons skip files that are already in transit. If it's
all computed up front, I could imagine a mode where you store the inode
number of the old file and skip the transfer if the inode has changed
since the file lists were initially built.
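A sketch of the "remember the inode, skip if it changed" check for the
up-front case, assuming every process can stat the same destination tree.
It relies on rsync normally writing to a temporary file and renaming it
into place (unless `--inplace` is used), so a file that another process
has already finished shows up with a new inode. `current_inode`,
`snapshot_inodes`, and `should_transfer` are hypothetical names.

```python
import os


def current_inode(path):
    """Inode of path, or None if it does not exist yet."""
    try:
        return os.stat(path).st_ino
    except FileNotFoundError:
        return None


def snapshot_inodes(paths):
    """Record each destination file's inode when the file list is built."""
    return {path: current_inode(path) for path in paths}


def should_transfer(path, snapshot):
    """Transfer only if nobody has replaced (or created) the file since the snapshot."""
    return current_inode(path) == snapshot[path]
```

Inode numbers can be reused once a file is deleted, so this is a heuristic
in the same spirit as the rest of the proposal, not a guarantee.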
This is pretty lightweight, and there are certainly cases where it
could get things wrong. However, for my situation, it would get me
close enough, and then I could run a final rsync to catch the
stragglers.
It seems like this would be much less work to implement than anything
like using sockets/pipes/spread to communicate between the daemons,
setting up a master daemon, etc. Just thought it might be worth
sharing.
My situation is that the push-primary Debian mirror I was syncing against
went away a few weeks ago, and the new mirror I'm syncing against is
only giving me 60KB/sec. I wouldn't mind getting more than that,
particularly during low-usage periods when I might be able to get much
more from a fast site. Yeah, I know I could change the mirror
I'm using, but there aren't many push-primaries. That's what gave me
the idea.
Sean
--
I hear a cow jack-knifed on the Harley Memorial Bridge... There was milk
everywhere. -- Stephanie, _Newhart_
Sean Reifschneider, Member of Technical Staff <jafo@tummy.com>
tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin