Hi everybody!

Now that RSYNC has RSYNC+ included, a good use would be to gather the update data with RSYNC+, then multicast it to your hosts and process it there.

So my question is: does anyone know of a product which does reliable multicasting? (Source available would be preferred.)

Simple pointers are appreciated; if no one has one, I'm thinking about writing one myself.

Thanks for all help!

Phil

-
This message is RSA-encrypted: n=33389, e=257
On Wednesday 19 December 2001 14:11, Ph. Marek wrote:
> Now that RSYNC has RSYNC+ included a good usage would be to use RSYNC+ to
> gather update-date, then multicast that on your hosts and process it.

First of all, rsync+ has been around for quite a long time (2 years at least). Second, unless you are going to use multicasting only within your own AS, setting up the infrastructure will be an extreme administrative burden.

> So my question is: does anyone know of a product which does reliable
> multicasting? (source available would be preferred)
>
> Simple pointers are appreciated; if noone has one I'm thinking about
> writing one myself.

Please have a look at YOID (http://www.aciri.org/yoid/). No software, only white papers. If this kind of distribution is of interest to you, I'd be glad to discuss it with you.

Alexei.

> Thanks for all help!
>
>
> Phil
>
>
> -
> This message is RSA-encrypted: n=33389, e=257
>So my question is: does anyone know of a product which does reliable
>multicasting? (source available would be preferred)

At our company, we have had mrsync running for a couple of months now. mrsync transfers files to many machines at the same time using UDP and multicast. I have attached at the end of this message an excerpt from the mrsync docs. If this is what you need, we can contribute this program to the open source community.

HP

+-----------------+
| MRSYNC vs RSYNC |
+-----------------+

mrsync is a utility that transfers a bunch of files from a master machine to multiple target machines simultaneously by using the multicasting capability in the UNIX system. The name 'mrsync' is inspired by the popular utility 'rsync' for synchronizing files between two machines. However, beyond this similarity in functionality, mrsync is fundamentally different from rsync in two areas.

(1) rsync uses TCP, while mrsync needs UDP in order to use the multicasting part of UNIX's socket communication. The former limits the data communication to one machine to one machine, whereas the latter allows one-to-many. UDP has no built-in flow control. As a result, the major part of mrsync (more precisely, the multicaster and multicatcher) is devoted to synchronizing the data flow.

(2) For a given file, rsync transfers (optionally) only those parts of the file that differ between the two versions on the master and the target machine. This saves time and is accomplished by using a rolling checksum algorithm by Andrew Tridgell. mrsync, in contrast, transfers the whole content of a file to all targets at one time.

+-------------------+
| HISTORY OF MRSYNC |
+-------------------+

The mrsync project stemmed from the prospective need to transfer many files to hundreds of machines running Linux at Renaissance Technologies Corp. Looking into the open source community, we found preliminary multicasting utility code written by Aaron Hillegass.
Many unsuccessful test runs on a huge amount of data files, however, led us to embark on an overhaul of the code. Most of the following items were inherited from the original code and bug-fixed.

* The low-level functions that interact with UNIX's multicasting sockets.
* Meta_data -- the essential info about a file, which the master machine first transmits to the target machines.
* Division of a file into many 'pages'.
* The idea of maintaining a missing page flag.
* The idea of a multicaster and multicatcher loop.

In this mrsync, we develop two new critical elements: flow-control message communication conducted by the multicaster, and a four-state page reader (processor) in the multicatcher. The former synchronizes the task each machine is performing. For example, the master will not start sending the pages of a file until all machines have acknowledged that they have finished opening the file for disk i/o.

In order to accommodate these elements, the code has been changed significantly from the original version. For example, the multicatcher now never asks the sender to slow down, and the multicaster sends data on a file-by-file basis. File integrity is achieved by orchestrating the data flow, which is closely monitored and conducted by the master machine.

As of today, mrsync is in full use at Renaissance on a daily basis.

+----------------------+
| TYPICAL RUNNING TIME |
+----------------------+

25 minutes for a group of files whose total size amounts to 4.6Gb. (This figure was obtained from a run on 5 Sun machines with Solaris 8 on an Ethernet LAN whose bandwidth is 1 Gbit/sec.)
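To make the 'pages' and missing-page-flag ideas from the excerpt a bit more concrete, here is a minimal sketch in Python. It is not taken from mrsync: it is a pure in-memory simulation with no sockets, and every name in it (`PAGE_SIZE`, `Catcher`, `multicast_file`, the `drops` argument that stands in for UDP packet loss) is a hypothetical choice made for illustration. It only shows the general pattern of sending every page once and then resending whatever any target still reports as missing.

```python
from typing import Dict, List, Set

PAGE_SIZE = 4  # hypothetical page size in bytes, tiny so the example is easy to follow


def split_into_pages(data: bytes, page_size: int = PAGE_SIZE) -> List[bytes]:
    """Divide a file's content into fixed-size pages (the last one may be shorter)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


class Catcher:
    """A simulated target machine that tracks which pages it is still missing."""

    def __init__(self, total_pages: int) -> None:
        self.pages: Dict[int, bytes] = {}
        self.missing: Set[int] = set(range(total_pages))

    def receive(self, page_no: int, payload: bytes) -> None:
        # Receiving the same page twice is harmless.
        self.pages[page_no] = payload
        self.missing.discard(page_no)

    def assemble(self) -> bytes:
        return b"".join(self.pages[i] for i in sorted(self.pages))


def multicast_file(data: bytes, catchers: List[Catcher],
                   drops: Dict[int, Set[int]]) -> int:
    """Send all pages, then resend the union of missing pages until none remain.

    drops[k] lists the page numbers that catcher k loses in the first round
    only (an assumption standing in for UDP loss). Returns the rounds used.
    """
    pages = split_into_pages(data)
    to_send = set(range(len(pages)))
    rounds = 0
    while to_send:
        rounds += 1
        for page_no in sorted(to_send):
            for k, catcher in enumerate(catchers):
                lost = rounds == 1 and page_no in drops.get(k, set())
                if not lost:
                    catcher.receive(page_no, pages[page_no])
        # The master gathers every catcher's missing-page report.
        to_send = set().union(*(c.missing for c in catchers))
    return rounds


if __name__ == "__main__":
    data = b"abcdefghij"  # 3 pages: b"abcd", b"efgh", b"ij"
    targets = [Catcher(3), Catcher(3)]
    print(multicast_file(data, targets, {0: {1}, 1: {2}}))  # 2 rounds
    print(all(t.assemble() == data for t in targets))       # True
```

In the run above, target 0 loses page 1 and target 1 loses page 2 in the first round, so the second round resends pages 1 and 2 to everyone. A real tool would also need the flow-control messages the excerpt describes, which this sketch leaves out.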
On Wed, Dec 19, 2001 at 12:11:46PM +0100, Ph. Marek wrote:
> Now that RSYNC has RSYNC+ included a good usage would be to use RSYNC+ to
> gather update-date, then multicast that on your hosts and process it.

Note that the rsync+ functionality as currently present in CVS does not work; I sent a final patch to Martin Pool for review that makes it work properly (my colleague says he's had no problems so far). Hopefully it will make it into the next stable release.

mrsync looks very interesting, too. It's always nice to have more options to choose from.

--
Jos Backus
Santa Clara, CA
josb@cncdsl.com
use Std::Disclaimer;
>> But, as this send whole files, I'm a little bit irritated about the name
>> "mrsync" - wouldn't mrcp or mtftp be better?
>Or "mrdist".

I agree that mrcp (or mrdist) is perhaps a name that more adequately captures what this code does at the moment. If the code turns out to be useful in general, then, depending on where it is heading, I will switch to a more appropriate name at the next major revision.

>Maybe they hoped to put in the rsync algorithm into
>eventually, but it seems to me that it would be very difficult to robustly
>implement the rsync algorithm over multicast, because each receiver could
>conceivably require different blocks to be sent.

Yes, this is indeed not easy.

> In practice, however,
>this will be mostly be used for mirroring so all the receivers should
>normally have the same files to begin with, and maybe they could optimize
>it for that case and just have reduced performance for the unusual case.

This is one direction this code could go, at least at our site. (Our machines, except the master machine, are supposed to be in the same state all the time!) So incorporating the rsync algorithm would improve the performance.

> but I think there'd need to be some way of detecting and
>supporting the receivers that aren't identical.

This will be a good problem to think about!

>> But otherwise this could be exactly what I need.
>> Please let me know as soon as this is available somewhere as GPL or BSD or
>> similar license!

As soon as our manager clarifies certain copyright issues, I can put the code somewhere. He just informed me that we are looking at around next week or Jan 1.

Happy holidays,
HP
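On the open question of detecting receivers that aren't identical, one possible starting point (purely an illustration, not something mrsync is stated to do) is to have each receiver report a digest of its current copy and let the master group hosts by digest. The sketch below assumes SHA-256 via Python's standard `hashlib`; the function names and host names are hypothetical, and for simplicity the master is handed the file contents directly, where a real setup would have each receiver compute its own digest and send only that.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def file_digest(content: bytes) -> str:
    """Digest of one receiver's current copy of a file."""
    return hashlib.sha256(content).hexdigest()


def group_by_digest(receiver_files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each distinct digest to the hosts that hold that version."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, content in receiver_files.items():
        groups[file_digest(content)].append(host)
    return dict(groups)


def find_outliers(receiver_files: Dict[str, bytes]) -> List[str]:
    """Hosts whose copy differs from the version held by the largest group."""
    groups = group_by_digest(receiver_files)
    if not groups:
        return []
    majority = max(groups.values(), key=len)
    return sorted(host
                  for hosts in groups.values() if hosts is not majority
                  for host in hosts)


if __name__ == "__main__":
    copies = {"node1": b"version 1", "node2": b"version 1", "node3": b"version 0"}
    print(find_outliers(copies))  # ['node3']
```

Hosts in the majority group could then share one multicast delta, while the outliers fall back to a full transfer; how to support them efficiently is the harder part of the problem and is not addressed here.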