Hello everyone,

I have a question regarding syncing of large files. We have a setup where we need to sync a 500GB file with changes in random blocks. The changes are made in such a way that moving blocks is not probable (actually it will never happen).

As I understand it, the way rsync works is:

- remote server calculates all checksum blocks and transmits these to the client
- local client calculates all checksum blocks
- local client compares local blocks to remote blocks, checking if a block could have moved
- changes are synced

Is it possible to have rsync calculate one block, transmit its checksum, let the local side decide whether to transfer or not, and then move on to the next block? The way it works now it takes a lot of time before anything is actually transferred, and it uses a LOAD of memory (around 500MB). We're using 64K blocks.

If we do the rsync with the --whole-file option it takes 17 hours to complete the sync. Without --whole-file (checksumming enabled) it takes 5 days! (We've got lots of bandwidth at the moment, btw.)

Thanks!

-- 
Best regards,
Lars Karlslund
Pharma Nord IT department
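For a rough sense of the numbers involved here - this is back-of-the-envelope arithmetic, and the per-block metadata size below is an assumption, not a figure taken from rsync's source:

    /* Back-of-the-envelope numbers for the case above (assumptions: 500 GiB
     * file, 64 KiB blocks, and roughly 28 bytes of per-block checksum
     * metadata, which is an estimate rather than rsync's actual figure). */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long file_size  = 500ULL << 30;   /* 500 GiB             */
        unsigned long long block_size = 64ULL  << 10;   /* 64 KiB              */
        unsigned long long nblocks    = file_size / block_size;
        unsigned long long per_block  = 28;             /* assumed bytes/block */

        printf("blocks:          %llu\n", nblocks);               /* 8,192,000 */
        printf("checksum data: ~%llu MiB\n",
               (nblocks * per_block) >> 20);                      /* ~218 MiB  */
        return 0;
    }

That puts the raw checksum data alone in the low hundreds of megabytes before any per-entry overhead, which is in the same ballpark as the ~500MB observed.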
On Thu, Feb 24, 2005 at 02:01:59PM +0100, Lars Karlslund wrote:
> As I understand it, the way rsync works is:
> - remote server calculates all checksum blocks and transmits these to client
> - local client calculates all checksum blocks
> - local client compares local blocks to remote blocks, checking if a block
> could have moved
> - changes are synced

Not quite. There is no need to pre-calculate all the checksums on the sending side, though the sending side does put all the checksum data it receives into memory in order to be able to look it up randomly as it reads through the file and sends data.

It would certainly be possible to change the algorithm to not cache that data (and thus only allow the current block to be compared), but I don't think that idea has general enough interest for me to work on it for inclusion in rsync. You might want to look into coding it up yourself.

However, you should be sure to have measured what is causing the slowdown first, to know how much that would help. If it is not memory that is swapping on the sender, it may be that computing the checksums is maxing out your CPU, and removing the caching of the remote checksums won't buy you as much as you think.

You could use some of the librsync tools (e.g. rdiff) to measure how long various actions take on each system (i.e. try running rdiff on each system, outputting to /dev/null, to see how long computing the checksums takes).

..wayne..
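Another rough way to measure the checksum cost in isolation is a small self-contained program like the one below; its checksum is a Fletcher/Adler-style sum in the spirit of rsync's weak checksum, not rsync's actual code, so treat any timing as a ballpark only:

    /* Reads a file in 64 KiB blocks and computes a simple two-part weak
     * checksum per block, to get a feel for the cost of the checksum pass. */
    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_LEN (64 * 1024)

    static uint32_t weak_sum(const unsigned char *buf, size_t len)
    {
        uint32_t s1 = 0, s2 = 0;

        for (size_t i = 0; i < len; i++) {
            s1 += buf[i];
            s2 += s1;               /* position-weighted part */
        }
        return (s1 & 0xffff) | (s2 << 16);
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }

        static unsigned char buf[BLOCK_LEN];
        size_t n;
        uint32_t acc = 0;
        unsigned long long blocks = 0;

        while ((n = fread(buf, 1, BLOCK_LEN, f)) > 0) {
            acc ^= weak_sum(buf, n);   /* keep the result live so the loop
                                          isn't optimized away */
            blocks++;
        }
        fclose(f);

        printf("%llu blocks, combined sum %08x\n", blocks, (unsigned)acc);
        return 0;
    }

Running it under time(1) on both machines against the same file shows whether the checksum pass alone already accounts for a large slice of the five days.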
Kevin Day wrote:
> I would *strongly* recommend that you dig into the thesis a bit (just
> the section that describes the rsync algorithm itself).

I tried a few weeks ago. I started to print it, and my printer ran out of ink :-). I will read it electronically eventually (I hope).

> Now, if you have huge files, then the 16 bit checksum may not be
> sufficient to keep the hit rate down. At that point, you may want to
> consider a few algorithmic alternatives:
>
> 1. Break up your file into logical blocks and rsync each block
> individually. If the file is an "append only" file, then this is
> fine. However, if the contents of the file get re-ordered across
> block boundaries, then the efficiency of the rsync algorithm would be
> seriously degraded.
>
> 2. Use a larger hash table. Instead of 16 bits, expand it to 20 bits
> - it will require 16 times as much memory for the hash table, but that
> may not be an issue for you - you are probably working with some
> relatively beefy hardware anyway, so what the heck.

Now, here's the part where I don't get it. We have X blocks checksummed, covering Y bytes each (actually, with the square-root rule we have X blocks covering X bytes each, but that's not important). This means we know, before we even receive the list of checksums, how many there will be.

As for the hash table size - that's standard engineering. Alpha is the load factor: the ratio of the number of entries stored to the number of buckets in the table. Around 0.8 is considered a good value.

What I propose is to make the hash table size a function of X. Take Lars' case: his 500GB file in 64KB blocks is about 8 million blocks, so he ideally needs a hash table with roughly ten million buckets to get reasonable performance. We only have 65 thousand. His alpha is about 125. No wonder he is getting abysmal performance.

On the other hand, if I'm syncing a 100K file, I'm only going to have 320 blocks. A good hash table size for me would be about 400 buckets. Having 65536 buckets instead means I'm less likely to get memory cache hits, and performance suffers again. My alpha value is about 0.005 (instead of 0.8).

If my proposal is accepted, the CPU-memory trade-off becomes adaptive.

> I'll leave the exercise of converting the full rsync 32 bit rolling
> checksum into a 20 bit value to you.

A simple modulo ought to be good enough. If the checksum is 42891 and I have 320 buckets, it goes into bucket 11. Assuming the checksums are sufficiently random-like, this is good enough.

> Cheers,
>
> Kevin

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
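The load-factor arithmetic for the two cases above, assuming the fixed 65,536-bucket table, works out roughly like this:

    /* Rough load-factor arithmetic for the two cases discussed above
     * (500 GiB with 64 KiB blocks vs. a 100 KB file with 320-byte blocks),
     * assuming a fixed 65,536-bucket table. */
    #include <stdio.h>

    static void report(const char *what, double nblocks, double nbuckets)
    {
        printf("%s: %.0f blocks into %.0f buckets -> %.3f entries per bucket\n",
               what, nblocks, nbuckets, nblocks / nbuckets);
    }

    int main(void)
    {
        report("500 GiB / 64 KiB blocks",
               500.0 * 1024 * 1024 * 1024 / 65536, 65536);
        /* -> 8,192,000 blocks, about 125 entries per bucket */

        report("100 KB / 320-byte blocks", 320, 65536);
        /* -> 320 blocks, about 0.005 entries per bucket */
        return 0;
    }

About 125 entries per bucket in the one case, and roughly one entry per 200 buckets in the other - both far from the ~0.8 sweet spot.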
Kevin Day wrote:
> Shachar-
>
> True enough - with one additional thought - if the block size is set
> to be the square root of the file size, then the load factor on the
> hash table becomes dynamic in and of itself (bigger block size = less
> master table entries = fewer hash collisions).

And I'm suggesting making it static, by adjusting the hash table's size according to the number of blocks. Just do "hashtablesize = (numblocks/8+1)*10;" and you should be set.

> In the case of relatively low bandwidth connections, you will get MUCH
> better performance improvement by messing with the block size than the
> size of hash table, because the hash table isn't sent over the wire -
> the block table IS sent over the wire, so reducing its size can have
> a big impact on performance if your file isn't changing much.

True, but irrelevant. The hash table performance does not come at the cost of extra bandwidth, so I see no reason not to optimize both.

> In Andrew's original thesis, he looked at several very large...
> The problem you face, of course, is...

Kevin, I think you are confusing a couple of things:

1. It's not me with the big files. It's Lars. I can't run tests on files I don't have. I am merely trying to figure out what stopped Lars, so that rsync can be made better.

2. The size of each block has nothing to do with the question of hash table size. Once you've chosen the number of blocks your file will have, in whatever way you did, there is an unrelated question of what hash table size you should use. Using a 65536-bucket hash table on a 500GB file divided into 64KB blocks (as Lars is doing) means you have, on average, 125 entries per bucket. Regardless of whether using this block size is smart or not, rsync could handle it better. That's what I'm talking about.

> Trying to increase the size of the hash table may just not be worth it
> - are you certain that the performance hit you are experiencing

I'm not. Lars is.

> is caused by processing on the recipient side, and not data transfer
> of the block table? In my testing (which is actually with my own
> implementation of the algorithm, so I may have optimizations/ or lack
> thereof compared to the rsync you are running), the block table
> transfer is the biggest cause of elapsed time for big files that don't
> change much.

Don't forget that Lars manages to transfer the whole file in 17 hours. I doubt transferring some information about each block takes longer than transferring the 64K block itself (as in Lars' case).

> It may be that the square root of file size for block size isn't
> appropriate for files as big as you are working with...

It certainly is appropriate for files as big as I work with, as I don't work with such large files - Lars does. And while we're at it, he is not using the square root rule; he is using 64KB blocks.

Kevin, I'm trying to make rsync better. Lars' problem is an opportunity to find a potential bottleneck. Trying to solve his use of possibly non-optimal values won't help rsync, though it will help him. Let's keep this part of the thread on its subject - is the hash table size optimal? I don't see how modifying the block sizes changes that question.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
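Spelled out as code, the sizing rule proposed above is simply the following (this is the formula as stated here; it is amended later in the thread to force an odd size):

    /* The proposed adaptive hash table size: about 1.25 buckets per block,
     * i.e. a load factor of roughly 0.8 regardless of file size. */
    #include <stdio.h>

    static unsigned long hash_table_size(unsigned long numblocks)
    {
        return (numblocks / 8 + 1) * 10;
    }

    int main(void)
    {
        /* Lars' case: 8,192,000 blocks -> 10,240,010 buckets (load ~0.80) */
        printf("%lu\n", hash_table_size(8192000UL));

        /* Small-file case: 320 blocks -> 410 buckets (load ~0.78) */
        printf("%lu\n", hash_table_size(320UL));
        return 0;
    }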
Kevin Day wrote:
> As a quick FYI, the block size absolutely has an impact on the
> effectiveness of the checksum table - larger blocks means fewer
> blocks, which means fewer hash collisions.

Since you wouldn't expect that many blocks in the file, a 32-bit weak checksum would only produce about 4 or 5 real collisions. Mind you, this is me doing educated guesses; I haven't worked out the actual math yet. I don't think we need to worry about this particular problem just yet. Hash table collisions, however, are much more likely, and those are what I'm trying to solve here.

> That said, however, I completely agree that for very large files, the
> number of buckets in the current implementation is not optimal.
> Perhaps having a mode for really large files would be appropriate.

I don't see why such a mode would be necessary.

> One caution on increasing the size of the hash: The current
> implementation gives 16 bits of spread, so modding that value with the
> desired bucket count would work fine.

That's not what I read into it. It seems to me that the checksum function gives a 32-bit result, and we are squashing that into a 16-bit hash table. Can you point me to the code? Wayne?

> However, if you choose to start with the 32 bit rolling hash and mod
> that, you will have problems. The rolling checksum has two distinct
> parts, and modding will only pull info from the low order bits,

Why? This may be something I missed in the code.

> which will likely get you considerably less than even the 16 bits that
> the current implementation gives.

If the source is 16-bit, doing any hash table size bigger than 65536 buckets would make no sense, true. Is it 16-bit?

> I'd recommend using a real 32 bit hashing function to mix the two
> rolling checksum components,

What two parts? If I understand rsync correctly, we have a rolling checksum and a strong checksum. The rolling checksum is used to single out potential matches, and the strong checksum makes sure these are, indeed, real matches. We only need to put the first one into the hash table, as we never do mass lookups on the second. Am I missing something basic here?

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
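To make the distinction concrete, here is a toy sketch of the two-level check being discussed: the weak (rolling) checksum picks a bucket and filters candidates, and only on a weak match is the strong checksum compared. The data and structures below are synthetic stand-ins, not rsync's own:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define NBUCKETS 8      /* tiny on purpose, to force bucket collisions */
    #define NBLOCKS  4

    struct entry {
        uint32_t weak;          /* rolling checksum of the block      */
        char     strong[16];    /* strong checksum (fake value here)  */
        int      next;          /* next entry in the chain, -1 = end  */
    };

    int main(void)
    {
        struct entry blocks[NBLOCKS] = {
            { 0x00001234, "strong-A", -1 },
            { 0x00011234, "strong-B", -1 },  /* same bucket as A: table collision    */
            { 0x00001234, "strong-C", -1 },  /* same weak sum as A: checksum collision */
            { 0x0000beef, "strong-D", -1 },
        };
        int heads[NBUCKETS];
        memset(heads, -1, sizeof heads);

        /* Build the chained hash table over the weak checksums. */
        for (int i = 0; i < NBLOCKS; i++) {
            int b = blocks[i].weak % NBUCKETS;
            blocks[i].next = heads[b];
            heads[b] = i;
        }

        /* Look up one incoming window: weak sum first, strong sum only on a hit. */
        uint32_t weak = 0x00001234;
        const char *strong = "strong-C";

        for (int i = heads[weak % NBUCKETS]; i != -1; i = blocks[i].next) {
            if (blocks[i].weak != weak)
                continue;                            /* bucket collision: cheap skip */
            if (strcmp(blocks[i].strong, strong) == 0) {
                printf("matched block %d\n", i);     /* true match */
                break;
            }
            /* weak sums equal but strong sums differ: a real checksum collision */
        }
        return 0;
    }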
Kevin Day wrote:
> Shachar-
>
> I think Wayne is mostly pointing you to the correct location here. If
> you look at the code where the checksum is computed, you'll find that
> the rolling checksum actually consists of two parts - one is the sum
> of all of the byte values in the window, the other is the offset
> weighted sum of all of the byte values in the window. The first of
> these is then left shifted 16 bits and added to the other to come up
> with the official "32 bit" rolling checksum. This works fine as long
> as you aren't counting on a random distribution of bits among the 32 -
> if you mod the value, you are giving much greater importance to the
> lower XX bits, effectively dropping the distribution of the high order
> bits...
>
> Anyway, the two 16 bit values may be random enough that my concern is
> not founded, but it should be tested before assuming that the rolling
> checksum is really a 32 bit value that can easily be divided up into
> buckets.
>
> PS - none of the above has anything to do with the strong signature of
> the window - just the rolling check sum.
>
> Cheers!
>
> - K

Some modern algebra for you, then. We have two 16-bit numbers; one of them is always multiplied by 2^16 before the two are combined. We want both numbers to be able to affect the bucket into which the combined checksum falls. If we choose a hash table size that is co-prime to that multiplier, we achieve exactly that: multiplying by 2^16 just permutes the residues modulo the table size, so the shifted half still gets to influence the bucket choice.

Well, guess what? 2^16 factors into nothing but twos. In other words, any odd hash table size (i.e. one that does not have two among its prime factors) is co-prime to the shift factor, which guarantees that both halves of the checksum get an equal chance of affecting which bucket it actually falls into.

It therefore follows that I have to amend my previously proposed formula for choosing the hash table size. The new formula is:

(numblocks/8+1)*10+1

And you're done. Of course, this can also be written as:

(numblocks/8)*10+11

which is slightly more efficient.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html
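A quick demonstration of why the odd size matters - the exact packing of the two halves in rsync may differ from the (s1 << 16) + s2 used below, but the argument is the same either way:

    #include <stdio.h>
    #include <stdint.h>

    static unsigned bucket(uint32_t combined, unsigned table_size)
    {
        return combined % table_size;
    }

    int main(void)
    {
        uint16_t s2        = 0x1234;   /* low half stays fixed              */
        unsigned even_size = 65536;    /* power-of-two table                */
        unsigned odd_size  = 65537;    /* odd table, co-prime with 2^16     */

        for (unsigned s1 = 1; s1 <= 3; s1++) {
            uint32_t sum = ((uint32_t)s1 << 16) + s2;
            printf("s1=%u: even-size bucket %u, odd-size bucket %u\n",
                   s1, bucket(sum, even_size), bucket(sum, odd_size));
        }
        /* With the 65536-bucket table every s1 lands in bucket 0x1234;
         * with 65537 buckets the bucket changes as soon as s1 changes. */
        return 0;
    }

With a power-of-two table the shifted half is thrown away entirely; with an odd table both halves move the result around, which is exactly the co-primality point above.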