tim.conway@philips.com
2002-May-17 14:00 UTC
Improving the rsync protocol (RE: Rsync dies)
Wayne:

If anybody can make that work, I'd bet you could. The basic rsync algorithm is in place, so as you say, it would mostly be a matter of list generation. You'd have to hold on to any files with >1 link, in a separate list, to find all the linkage relationships, which could grow a bit, but it does sound more efficient. Maybe a third pipe to send the files over as you go?

Mind: I'm not offering to help. It's too complicated for my tiny mind.

Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n" '
"There are some who call me.... Tim?"

Wayne Davison <wayned@users.sourceforge.net>
Sent by: rsync-admin@lists.samba.org
05/17/2002 02:42 PM

To: rsync users <rsync@lists.samba.org>
cc: (bcc: Tim Conway/LMT/SC/PHILIPS)
Subject: Improving the rsync protocol (RE: Rsync dies)
Classification:

On Fri, 17 May 2002, Allen, John L. wrote:
> In my humble opinion, this problem with rsync growing a huge memory
> footprint when large numbers of files are involved should be #1 on
> the list of things to fix.

I have certainly been interested in working on this issue. I think it might be time to implement a new algorithm, one that would let us correct a number of flaws that have shown up in the current approach. Toward this end, I've been thinking about adding a 2nd process on the sending side and hooking things up in a different manner:

The current protocol has one sender process on the sending side, while the receiving side has both a generator process and a receiver process. There is only one bi-directional pipe/socket that lets data flow from the generator to the sender in one direction, and from the sender to the receiver in the other direction.
The receiver also has a couple of pipes connecting itself to the generator in order to get data to the sender.

I'd suggest changing things so that a (new) scanning process on the sending side would have a bi-directional link with the generator process on the receiving side. This would let both processes descend through the tree incrementally and simultaneously (working on a single directory at a time) and figure out which files were different. The list of files that needed to be transferred PLUS a list of which files need to be deleted (if any) would be piped from the scanner process to the sender process, which would have a bi-directional link to the receiver process (perhaps using ssh's multi-channel support?). There would be no link between the receiver and the generator.

The advantage of this is that the sender and the receiver are really very simple. There is a list of file actions that is being received on stdin by the sending process, and this indicates which files to update and which files to delete. (It might even be possible to make the sender be controlled by other programs.) These programs would not need to know about exclusion lists, delete options, or any of the more esoteric options, but would get told things like the timeout settings via the stdin pipe. In this scenario, all error messages would get sent to the sender process, which would output them on stdout (flushed).

The scanner/generator process would be the thing that parses the commandline, communicates the exclude list to its opposite process, and figures out exactly what to do. The scanner would spawn the sender, and field all the error messages that it generates. It would then either output the errors locally or send them over to the generator for output (depending on whether we're pushing or pulling files).

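
To make the directory-at-a-time idea above concrete, here is a minimal sketch in Python. It is not rsync code and not a wire format from this proposal: the names `scan`, `list_dir`, `UPDATE`, and `DELETE` are hypothetical, both trees are assumed to be local (the real scanner and generator would trade listings over their link), and "different" is assumed to mean a size or mtime mismatch. What it tries to show is that only one directory's listing per side needs to be compared at a time, and that the output is a plain stream of file actions of the kind the sender would read on stdin.

```python
import os
from typing import Dict, Iterator, List, Tuple

# Hypothetical action names; the proposal does not define a wire format.
UPDATE = "update"
DELETE = "delete"

Entry = Tuple[int, int]  # (size, mtime): an assumed, cheap "is it different?" check


def list_dir(path: str) -> Tuple[Dict[str, Entry], List[str]]:
    """Return ({file name: (size, mtime)}, [subdirectory names]) for ONE directory."""
    files: Dict[str, Entry] = {}
    dirs: List[str] = []
    if not os.path.isdir(path):
        return files, dirs  # a missing directory is treated as empty
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            dirs.append(entry.name)
        elif entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            files[entry.name] = (st.st_size, int(st.st_mtime))
    return files, sorted(dirs)


def scan(src_root: str, dst_root: str, delete: bool = False,
         rel: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (action, relative path) pairs, comparing one directory at a time."""
    src_files, src_dirs = list_dir(os.path.join(src_root, rel))
    dst_files, _ = list_dir(os.path.join(dst_root, rel))

    # Files that are missing or different on the receiving side get updated.
    for name in sorted(src_files):
        if dst_files.get(name) != src_files[name]:
            yield (UPDATE, os.path.join(rel, name))

    # Files that exist only on the receiving side get deleted, if requested.
    # (Directories that exist only on the receiving side are ignored here.)
    if delete:
        for name in sorted(set(dst_files) - set(src_files)):
            yield (DELETE, os.path.join(rel, name))

    # Descend only after this directory's actions have been emitted, so the
    # consumer can start work before the whole tree has been scanned.
    for name in src_dirs:
        yield from scan(src_root, dst_root, delete, os.path.join(rel, name))
```

Because `scan` is a generator, something like `for action, path in scan("/src", "/dst", delete=True): print(action, path, flush=True)` would feed actions to a consumer as they are found, rather than building the whole file list in memory first.
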
As for who spawns the receiver, it would be nice if this was done by the sender (so they could work alone), but an alternative would be to have the generator spawn the receiver and then let the receiver hook up with the sender via the existing ssh connection.

This idea is still in its early stages, so feel free to tell me exactly where I've missed the boat.

..wayne..