Hello,

Recently, while using rsync to mirror a large (several-GB) archive on a regular basis, I ran into several problems and had some ideas about possible solutions. Could you please investigate, and consider implementing, the features described below in future rsync releases?

- When the checksumming stage of rsync runs slowly (roughly 3 minutes or more), which happens on machines with slower CPUs or with older HDDs that lack UDMA or support only UDMA33, one can often observe that the network connection to the master site shuts down and the mirroring fails (in subsequent mirroring attempts, when, e.g., the archive has already been transferred to about 90%). I think this happens because the bidirectionally open connection is simply reset by either the client or the server: rsync does not transfer anything while the checksumming runs (I might be wrong, but this is what I observed), and the TCP connection is reset because of the stall (I have no clue by what means, because I'm no TCP/IP expert, but I suspect it may just be TCP/IP itself).

  How about adding a feature to keep the checksums in a Berkeley-style database somewhere on the HDD, and on subsequent mirroring attempts just look up the checksums there, so that rsync does not need to checksum the whole target (already mirrored) file tree? Implementing this could take some time, but it would certainly improve rsync's responsiveness and ease its use on slow CPUs and HDDs.

- Make the output of error and status messages from rsync uniform, so that it can easily be parsed by scripts (it cannot right now - rsync 2.5.5).

- Perhaps, if the network connection between the rsync client and server stalls for some reason, implement something like a 'TCP keepalive' feature?

I know these are suggestions only; I have neither the resources nor the knowledge to implement them in rsync myself (though I do feel plagued by the problems described), so I'm sending these ideas to you in the hope that they will be useful and can be implemented in the future. Please let me know your opinion.

Thanks & regards,
Jan
Jan Rafaj [rafaj@cedric.vabo.cz] writes:

> How about adding a feature to keep the checksums in a berkeley-style
> database somewhere on the HDD separately, and with subsequent
> mirroring attempts, look to it just for the checksums, so that
> the rsync does not need to do checksumming of whole target
> (already mirrored) file tree ?

There's a chicken-and-egg issue with this - how do you know that the separately stored checksum accurately reflects the file it represents? Once they are stored separately, they can get out of sync. The natural way to verify the checksum would be to recompute it, but then you're back to square one.

I know there have been discussions about this sort of thing on the list in the past. For multiple similar distributions, the rsync+ work (recently incorporated into mainline rsync in experimental mode - the write-batch and read-batch options) helps remove repeated computation of the checksums and deltas, but it's not a generalized system for any random transfer.

I've wanted similar benefits because we use dialup to remote locations, and for databases of hundreds of MB or 1-2 GB we end up wasting a fair bit of phone time while both sides are just computing checksums. But I'm not sure of a good generalized solution. There may be platform-specific hacks (e.g., under NT, storing the computed checksum in a separate stream of the file, so it's guaranteed to be associated with the file), but I don't know of a portable way to link meta-information with filesystem files.
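One common way to soften the staleness problem described above is to make the cache self-invalidating: key each stored checksum on the file's size and mtime, and recompute whenever those disagree. This is only a minimal sketch of the idea - the function names and the use of MD5 in place of rsync's MD4 are illustrative assumptions, not anything rsync actually does, and trusting mtime is inherently weaker than a true checksum:

```python
import hashlib
import os

def cached_checksum(path, cache):
    """Return a whole-file checksum, recomputing only when size or
    mtime disagree with the cached entry (the staleness heuristic).
    `cache` is a plain dict; a real tool would persist it on disk."""
    st = os.stat(path)
    key = os.path.abspath(path)
    entry = cache.get(key)
    if entry and entry["size"] == st.st_size and entry["mtime"] == st.st_mtime:
        return entry["sum"]          # metadata unchanged: trust the cache
    h = hashlib.md5()                # rsync 2.x used MD4; MD5 stands in here
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    cache[key] = {"size": st.st_size, "mtime": st.st_mtime, "sum": h.hexdigest()}
    return cache[key]["sum"]
```

The second mirroring pass then only stats each file instead of reading it, which is exactly the saving Jan is after - at the cost of being fooled by any writer that preserves size and mtime.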
Note that if you aren't already, be sure to raise the default block size for large files - that can cut down significantly on both checksum computation time and the metadata transferred over the session, since fewer blocks need two checksums (weak + MD4) apiece.

> - make output of error & status messages from rsync uniformed,
>   so that it could be easily parsed by scripts (it is not right
>   now - rsync 2.5.5)

I know Martin has expressed some interest to the list in having something like this in the future as an option.

> - perhaps if the network connection between rsync client and server
>   stalls for some reason, implement something like 'tcp keepalive'
>   feature ?

I think rsync is pretty complicated at the network level already - it seems reasonable to me that rsync ought to be able to assume that the underlying network protocol stack will get the data to the other end, and/or give an error if something goes wrong, without needing a lot of babysitting. In all but the rsync-server cases, rsync doesn't control the network stream itself anyway (it just has a child process using ssh, rsh or anything else), so it becomes a question for that particular utility and not something rsync can do anything about. In the rsync server case, it already sets the TCP keepalive option at the socket level when it receives a connection.

If your network transport between systems is problematic, there's a limited amount rsync can do about it. And no, merely being idle on a session shouldn't terminate it, no matter how long rsync takes to compute checksums. So if that's happening to you, you might want to investigate your network connectivity. Or perhaps you're going through a NAT box or some sort of proxy that places a timeout on TCP sessions that you can increase?
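The block-size point above can be made concrete with a little arithmetic. Assuming roughly 20 bytes of checksum data per block (a 4-byte weak rolling checksum plus a 16-byte MD4 digest - the exact on-the-wire sizes may differ), the checksum metadata for a large file shrinks in direct proportion to the block size:

```python
import math

WEAK_SUM_BYTES = 4     # 32-bit rolling checksum
STRONG_SUM_BYTES = 16  # MD4 digest (full width assumed for illustration)

def checksum_metadata_bytes(file_size, block_size):
    """Approximate bytes of per-block checksum data the receiver must
    compute and transmit for one file (ignores protocol framing)."""
    blocks = math.ceil(file_size / block_size)
    return blocks * (WEAK_SUM_BYTES + STRONG_SUM_BYTES)

two_gb = 2 * 1024**3
for bs in (700, 16 * 1024, 64 * 1024):   # 700 was the old rsync default
    meta = checksum_metadata_bytes(two_gb, bs)
    print(f"block size {bs:>6}: ~{meta / 1024**2:.1f} MiB of checksum data")
```

For a 2 GiB file, going from the small default block to 64 KB blocks turns tens of megabytes of checksum traffic into well under a megabyte, which is why the advice matters so much on dialup.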
Upon failures, if you use --partial and a separate destination directory, you can keep retrying and slowly get the whole file across (that's how we do our backups), but you do still need to recompute checksums each time. It might be nice to see if rsync itself could have a retry mechanism that would reuse the checksum information it had computed previously. I have a feeling, given the structure of the code at this point, that doing so would be reasonably complicated.

The caveat to --partial is that once you have a partial file, even with --compare-dest, that partial file is all rsync considers for the remaining portion of the transfer. So originally, for our database backups, I was manually removing any partial copy that was smaller than some fraction of the previous copy I already had, since I'd lose less time rebuilding that fraction than losing access to the entire prior file. In response to that, I made another internal-use patch to rsync to "--partial-pad" any partial file with data from the original file on the destination system after an error. No guarantees it works as well, since I just took data from the original file past the size point of the partial copy, but in many cases (growing files) it's a big win. If anyone is interested, I could extract it and post it.

-- David

David Bolen                            E-mail: db3l@fitlinxx.com
FitLinxx, Inc.                         Phone: (203) 708-5192
860 Canal Street, Stamford, CT 06902   Fax: (203) 316-5150
tim.conway@philips.com
2002-Apr-19 16:42 UTC
Future RSYNC enhancement/improvement suggestions
The problem with cached checksums is that unless the filesystem driver regenerates them as the filesystem is modified, they're meaningless on a live filesystem. I ran into a similar problem with huge trees on slow NAS, and finally wrote my own system. It does no checksumming; instead it acts like rsync -W - if timestamp and size match, we're done - and sends everything in chunks: a list of non-directories to unlink, a list of directories to rmdir (in depth order, of course), and a gzipped tar, 8 MB at a time.

Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n" '
"There are some who call me.... Tim?"
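Tim's quick-check shortcut - skip a file entirely when size and timestamp agree, as rsync does without --checksum - can be sketched as follows (the function name and the integer-second mtime comparison are my own illustrative choices):

```python
import os

def needs_transfer(src_stat, dst_path):
    """Quick check in the spirit of rsync -W / no --checksum: if the
    destination exists with identical size and mtime, assume it is
    already up to date and skip it."""
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True          # nothing there yet: must transfer
    return not (dst.st_size == src_stat.st_size
                and int(dst.st_mtime) == int(src_stat.st_mtime))
```

The trade-off is the same one Tim accepts: any change that preserves both size and timestamp goes undetected, which is why this only works when the trees are not being modified behind the tool's back.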
On Fri, Apr 19, 2002 at 12:23:06PM +0200, Jan Rafaj wrote:

> How about adding a feature to keep the checksums in a berkeley-style
> database somewhere on the HDD separately, and with subsequent
> mirroring attempts, look to it just for the checksums, so that
> the rsync does not need to do checksumming of whole target
> (already mirrored) file tree ?

The problem is that the generator works in the following steps:

1. For each block, both checksums are calculated and stored in a table.
2. The number of entries in the table is sent to the sender.
3. The contents of the table are sent to the sender.
4. The table is thrown away.

There is no real need to do this in 4 steps. It should be possible to change this without changing the protocol:

- The number of entries can be calculated from the block size and the size of the (flat) file, and sent to the sender up front.
- The rest can be done in a loop:
  * read a block
  * calculate the checksums for this block and fill a sum_struct
  * send this sum_struct to the sender

The code will become a little more complicated, but it will use less memory and may be a bit faster.

> - perhaps if the network connection between rsync client and server
>   stalls for some reason, implement something like 'tcp keepalive'
>   feature ?

Not a good idea - the line should always be busy.

cu, Stefan
--
Stefan Nehlsen | ParlaNet Administration | sn@parlanet.de | +49 431 988-1260
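Stefan's proposed loop - compute a block's checksum pair and ship it immediately, rather than buffering the whole table first - might look roughly like this in outline. Here zlib.adler32 stands in for rsync's rolling checksum and MD5 for MD4, and the 20-byte wire format is invented for illustration:

```python
import hashlib
import struct
import zlib

def stream_block_sums(path, block_size, out):
    """Emit each block's (weak, strong) checksum pair to `out` as soon
    as it is computed, keeping memory use at one block regardless of
    file size. `out` is any object with a write() method."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            weak = zlib.adler32(block) & 0xFFFFFFFF
            strong = hashlib.md5(block).digest()
            out.write(struct.pack(">I", weak) + strong)  # send immediately
```

Since the block count is derivable from the file size and block size, the receiver never needs the full table in memory, and the line starts carrying data as soon as the first block has been read - which is exactly the idle-line fix discussed elsewhere in this thread.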
On 22 Apr 2002, Jan Rafaj <rafaj@cedric.vabo.cz> wrote:

> > > - perhaps if the network connection between rsync client and server
> > >   stalls for some reason, implement something like 'tcp keepalive'
> > >   feature ?

TCP connections don't time out anyhow. Possibly a dial-on-demand line or a firewall might drop the connection, but there should be enough traffic that this is not a problem.

> PS: 4th point - how about adding feature that would enable rsync
> to store the PID of the running process somewhere ? (like,
> I hate to 'ps ax | grep' for the rsync on a machine where
> other rsync instances might be running, controlled by other means
> than my script :)

For the daemon, you can use the "pid file" configuration option. For clients, you should just remember the pid when you create the process, e.g. by using the $! shell special parameter. There's no straightforward way to find out the pid of the remote child, but I'm not really convinced that's very important. If you're debugging rsync, it's fairly easy to do by peeking into /proc, using lsof, or some similar OS-dependent mechanism.

-- Martin
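Martin's client-side advice - remember the PID at process-creation time instead of grepping `ps` later - is what `$!` gives you in the shell. The same idea in a small Python wrapper (the helper name and pid-file path are hypothetical, not rsync features):

```python
import subprocess

def launch_and_record(cmd, pid_file):
    """Start a child process and store its PID in a file, so other
    scripts can find this particular instance without `ps ax | grep`."""
    proc = subprocess.Popen(cmd)
    with open(pid_file, "w") as f:
        f.write(str(proc.pid))
    return proc

# e.g. launch_and_record(["rsync", "-av", "src/", "host::module/"],
#                        "/tmp/my-rsync.pid")   # paths are placeholders
```

This only tracks the local client process, of course; as Martin notes, the remote child's PID is not readily available through rsync itself.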
Martin Pool [mbp@samba.org] writes:

> TCP connections don't timeout anyhow. Possibly a dial-on-demand line
> or a firewall might drop the connection, but there should be enough
> traffic that this is not a problem.

Unless you have quite large files, in which case there can be a lengthy period (particularly if the file is being accessed across a local network) while checksums are computed, during which there is no traffic at all. For a while (when we had slow drives and a 10BaseT network) checksum computation could take 20-30 minutes on a 500-600 MB database file with 4K blocks - and our long-distance dialup call was completely idle during that period.

At the time, I had planned to experiment with the sort of changes that Stefan's recent response to this thread suggested - transmitting the checksum information as it is computed, rather than building it all up before sending anything. As it turns out, we upgraded to a faster RAID setup and bumped the relevant machines to 100BaseT, and the time typically went down to somewhere between 5 and 10 minutes, so the priority of making the changes dropped. But I do still think it would be a useful adjustment to the data flow within rsync at some point. I can't remember just how major the surgery looked to get the transmission to occur at the point of computation, though.

-- David
(I wrote about long files taking 20-30 minutes to checksum with no network traffic.)

Jason Haar [Jason.Haar@trimble.co.nz] writes:

> ...But then you should have a dialup timeout of 1 hour set?

Oh, of course - I was mostly responding to Martin's comment about there being enough traffic present in general during an rsync session, since there are cases where you can have lengthy periods with no traffic at all. I could also see some NAT boxes holding a particular stream for far less than an hour by default, but I don't have a particular data point for that, so perhaps I'm just being too conservative.

> I think the problem is that you're morally upset that rsync spends so
> much time sending no network traffic. Quite understandable ;-)

Not sure about morally, but definitely financially :-)

> What about separating the tree into subtrees and rsyncing them? That
> means you go from:
>
> 1> dialup connection started [quick]
> 2> rsync generates checksums (no network traffic) [slow]
> 3> rsync transmits files

Perhaps you misunderstood - the checksum generation that was taking so long was at the *single-file* level. Rsync had already exchanged file lists and chosen the files to transfer; it was working on a single file, generating the block checksums on the receiver side to send over to the sender side. (As it turns out, the transfers in question were for a single directory normally comprising two files - a database file and its transaction log.)

The real rub was that after spending 20+ minutes with an idle line computing the checksums, it would then take another 30+ minutes to transmit the checksum information. So it was (and likely still is) a case where sending the data as it is computed would have been a major win. At least for slow connections, the checksum computation is unlikely to be the bottleneck versus network transmission, so leaving the network idle is wasted time that could be fully reclaimed.
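The scale of the problem David describes is easy to estimate. Assuming roughly 20 bytes of checksum data per block (4-byte weak plus 16-byte strong - an assumption, as is the effective modem rate), a back-of-the-envelope calculation for the 600 MB / 4K-block case over dialup:

```python
def dialup_minutes(file_size, block_size, bits_per_sec, per_block_bytes=20):
    """Rough time just to ship the block-checksum table over a modem;
    ignores protocol framing, compression, and line overhead."""
    blocks = -(-file_size // block_size)          # ceiling division
    return blocks * per_block_bytes * 8 / bits_per_sec / 60

# 600 MB database, 4 KB blocks, dialup at ~28.8 kbit/s effective
print(f"{dialup_minutes(600 * 1024**2, 4096, 28800):.0f} minutes")
```

Even at this optimistic rate, the checksum table alone ties up the line for on the order of a quarter of an hour, which is consistent with the 30+ minutes observed once real-world overhead and a slower effective rate are factored in.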
I may still look into that sort of change, but I just haven't had the cycles since the decrease in our checksum time - although this particular discussion has rather started me thinking about it again. I may review our current logs to see how much time is being wasted.

-- David
Martin Pool [mbp@sourcefrog.net] writes:

> I guess alternatively you could set the rsync timeout high, the
> line-drop timeout low, and make it dial on demand. That would let the
> line drop when rsync was really thinking hard, and it would come back
> up as necessary. Losing the ppp channel does not by itself interrupt
> any tcp sessions running across it, provided that you can recover the
> same ip address next time you connect.

That assumes an environment where dial-on-demand is feasible. Unfortunately, our particular setup is a direct PC-to-PC dial, and there's no IP involved (it's Windows<->Windows with NETBIOS/NETBEUI), so disconnecting would shut down the remote rsync. But it's an interesting thought for cases where it could be used. In general I'd expect it to be fairly fragile, though, unless you had complete control of the dial infrastructure or could otherwise ensure, as you note, identical IP address assignment.

I don't suppose anyone knows of any legacy reason why all the checksums are computed and stored in memory before transmission, do they? I don't think, at the time, I could find any real requirement in the code that it be done that way - the sequence was pretty much generate/send/free.

-- David
Martin Pool [mbp@sourcefrog.net] writes:

> No, I think you could avoid it, and also avoid the up-front traversal
> of the tree, and possibly even do this while retaining some degree of
> wire compatibility. It will be a fair bit of work.

Yeah, I was thinking in terms of bang for the buck - munging the file-list handling reaches into far more code and would likely be far more effort to change within the current rsync source than the checksum transmission. I think the checksum change would just be moving the equivalent of send_sums right into generate_sums, touching only the single generate.c module, with no noticeable difference on the wire or to other modules.

I did go back and look at our current transfers for the one task for which this could make the most difference. For the ~110 GB of data we synchronize each month (over V.34 dialup lines :-)), the "wasted" time with our current network/filesystem looks to be only about 7.5 hours of phone time in aggregate, which in turn is only about 1.6% of the ~480 hours used each month. So it's hard to worry extensively about that 1.6%.

-- David