Hi,

after some searching I didn't come up with an answer, so please excuse me if this is a total newbie question.

My problem: I have server A which has a big (>500G) database-like file. On server B I want to have a copy of this file, which I don't want to copy in full each time; instead I want to sync the deltas so that only the deltas are written once a day. Bandwidth between A and B isn't the problem. The sync should be as fast as possible.

So I want to achieve something like a binary diff -u and patch.

Is rsync the right tool for this? Or will the rsync mechanism create too much overhead to sync this file? Writing only the deltas is key for me. Is there a better method?

Thanks ... Oliver
On Thu, Feb 21, 2002 at 10:37:13PM +0100, Oliver Krause wrote:
> My problem:
> I have server A which has a big (>500G) database-like file. On server B I [...]

Does "database like" mean it'll be in use when the rsync job runs? What about data in memory that's not flushed to disk? [If you're talking M$ Windows - this just won't be possible BTW - ever hear of locking? ;-)]

--
Cheers

Jason Haar
Information Security Manager
Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
I don't think rsync could do that. I would think it could possibly be efficient about transferring files where new data keeps being appended at the end, if you used some tricky combination of command-line switches with --partial and other hacks.

The big problem is that when diffs are done the usual way, you need to compare every byte in both files to find the deltas. So in a network situation you wouldn't save any effort, because everything would have to go across the network anyhow, so why not just copy the file?

What you'd want is an algorithm that divides the files into chunks, calculates a CRC for each chunk on both sides, compares the two lists, and then recursively repeats the process for each chunk that didn't match, until it reaches some minimal chunk size where it reverts to a byte-by-byte diff. I would think the chunk sizes would need to be chosen carefully based on how the file is modified, and this method would be less efficient for files with a large percentage of modifications distributed evenly across the file.

I'm sure being able to do this kind of thing would be interesting for people trying to keep server farms of read-only databases going. A highly scalable search engine would be such an application. Although in that case you might want to keep a copy of the old file around to compare against locally for diffs, and then send the diff files out to all the servers, since they are all in the same known state.

Anyhow, it's an interesting problem, but I know of nothing out there that works this way.

On 2/21/02 4:37 PM, "Oliver Krause" <krauseo@gmx.net> wrote:
> I have server A which has a big (>500G) database-like file. On server B I
> want to have a copy of this file, which I don't want to copy in full each
> time; instead I want to sync the deltas so that only the deltas are written
> once a day.
> [...]
> Is rsync the right tool for this? Or will the rsync mechanism create too
> much overhead to sync this file? Writing only the deltas is key for me.
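To make the recursive chunk-CRC idea above concrete, here is a rough local sketch in Python. Everything in it (the function name, the MIN_CHUNK cut-off, the use of zlib.crc32, and splitting each mismatched chunk in half rather than into many pieces) is an illustrative assumption, not how rsync or any existing tool actually works:

import os
import zlib

MIN_CHUNK = 4096  # below this size, just resend the whole chunk

def changed_regions(old_path, new_path, offset, length):
    """Return (offset, length) regions where new_path differs from old_path.

    Assumes both files are the same length. A real implementation would
    stream the data instead of reading whole chunks into memory, and would
    run the two sides on different machines, exchanging only the CRC lists.
    """
    with open(old_path, 'rb') as f:
        f.seek(offset)
        old = f.read(length)
    with open(new_path, 'rb') as f:
        f.seek(offset)
        new = f.read(length)
    if zlib.crc32(old) == zlib.crc32(new):
        return []                       # chunk matches, nothing to send
    if length <= MIN_CHUNK:
        return [(offset, length)]       # small enough: just resend it
    half = length // 2
    return (changed_regions(old_path, new_path, offset, half) +
            changed_regions(old_path, new_path, offset + half, length - half))

# e.g. changed_regions("old.db", "new.db", 0, os.path.getsize("new.db"))

The sender would then transmit only the bytes inside the returned regions, plus an offset for each, and the receiver would patch them into its copy in place.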
It uses a "rolling" checksum, so that it can actually find byte-level changes, insertions, deletions. There's a block size for how big a chunk to start with, but then it works within those to figure out where to make the changes. It's hyper-efficient. The thing about transferring whole files (-W option) is actually an override of the default, mostly to keep it from wasting time reading everything twice over NFS, or for any case where you know up front that you're better off just sending the whole thing if you have to read it at all (time or size mismatch). Tim Conway tim.conway@philips.com 303.682.4917 Philips Semiconductor - Longmont TC 1880 Industrial Circle, Suite D Longmont, CO 80501 Available via SameTime Connect within Philips, n9hmg on AIM perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n" ' "There are some who call me.... Tim?"
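For anyone curious what "rolling" means in practice, here is a simplified sketch of that kind of weak checksum in Python. The constants and names are mine, and the real rsync checksum differs in detail; the point is only that sliding the window forward by one byte costs a couple of additions and subtractions instead of re-reading the whole block:

MOD = 1 << 16  # keep both halves of the checksum within 16 bits

def weak_checksum(block):
    """Compute the two-part weak checksum of a block from scratch."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte forward: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b

# data = b"the quick brown fox jumps over the lazy dog"
# a, b = weak_checksum(data[0:8])
# a, b = roll(a, b, data[0], data[8], 8)   # now equals weak_checksum(data[1:9])

In rsync itself, a match on this cheap checksum is then confirmed with a stronger per-block hash (MD4 at the time), so false matches don't corrupt the copy.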
On 21 Feb 2002, Oliver Krause <krauseo@gmx.net> wrote:
> I have server A which has a big (>500G) database-like file. [...] Bandwidth
> between A and B isn't the problem. The sync should be as fast as possible.

rsync should work well, though as Dave noted you need enough disk space on the destination machine to hold a second, temporary copy of the file during the transfer. Memory usage should be moderate; if not, report a bug.

If you're feeling adventurous, you might like to investigate the rdiff tool available from rproxy.samba.org. This might give you a reduction in the amount of disk IO compared to rsync, depending on the load.

--
Martin
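In case it helps, the general shape of the rdiff-style workflow is a three-step signature/delta/patch exchange. The sketch below drives it from Python on a single machine purely for illustration; the subcommand names are taken from later librsync releases and may not match the rproxy.samba.org version exactly, so treat it as an assumption rather than a recipe:

import subprocess

def sync_via_rdiff(old_copy_on_b, new_file_on_a):
    # 1. On server B: summarise the existing copy into a small signature file.
    subprocess.run(["rdiff", "signature", old_copy_on_b, "file.sig"], check=True)
    # 2. Ship file.sig to server A, then on A: compute a delta of the current
    #    file against that signature. Only file.sig and file.delta ever need
    #    to cross the network.
    subprocess.run(["rdiff", "delta", "file.sig", new_file_on_a, "file.delta"], check=True)
    # 3. Ship file.delta back to B, then on B: apply it to the old copy to
    #    produce an up-to-date one.
    subprocess.run(["rdiff", "patch", old_copy_on_b, "file.delta", old_copy_on_b + ".new"], check=True)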