On Mon, Feb 08, 2010 at 10:31:23AM -0800, Chris Lattner wrote:> On Feb 7, 2010, at 1:03 PM, Gregory Petrosyan wrote: > > >On Sat, Feb 06, 2010 at 04:51:15PM -0800, Chandler Carruth wrote: > >>While I've not reviewed the patch in too much detail, it looks > >>promising. Can you run some end-to-end benchmarks to make sure that > >>cache pressure in the full program or other variables not accounted > >>for in a micro-benchmark don't dominate performance? Specifically the > >>nightly tester includes a number of real programs and machinery to > >>measure total compile time. > > > >Ok, now with some kinda-hard numbers! > > Hi Gregory, > > These numbers are so noisy, that they aren't particularly useful. > Could you try instrumenting foldingset to keep track track of the # > collisions and # hash table resizes and compare those? They should > be much more stable and still correlate directly to performance.OK, now with real numbers :-) First, the main thing: SuperFastHash appears to be the hash with best distribution. Use of MurmurHash instead generates 1.28% more collisions while doing nightly test in MultiSource/Applications. Second: I've also tested lookup3 hash, and its use generated 0.1% more collisions, compared to SFH. These results were a bit surprising for me! Number of hash table resizes is independent of used hashing algorithm, because hash table grows when 'nentries > nbuckets * 2'. Gregory
On Feb 10, 2010, at 4:49 PM, Gregory Petrosyan wrote:>> >> These numbers are so noisy, that they aren't particularly useful. >> Could you try instrumenting foldingset to keep track track of the # >> collisions and # hash table resizes and compare those? They should >> be much more stable and still correlate directly to performance. > > OK, now with real numbers :-) > > First, the main thing: SuperFastHash appears to be the hash with best > distribution. Use of MurmurHash instead generates 1.28% more collisions while > doing nightly test in MultiSource/Applications. > > Second: I've also tested lookup3 hash, and its use generated 0.1% more > collisions, compared to SFH. > > These results were a bit surprising for me! > > Number of hash table resizes is independent of used hashing algorithm, because > hash table grows when 'nentries > nbuckets * 2'.Thanks for doing the evaluation! It sounds like we should stay with SFH. In addition to fewer collisions, it is cheaper to compute. -Chris
On Wed, Feb 10, 2010 at 04:54:36PM -0800, Chris Lattner wrote:> Thanks for doing the evaluation! It sounds like we should stay with SFH. > In addition to fewer collisions, it is cheaper to compute.While I agree that staying with SFH makes sense, I should note that in microbenchmarks, MMH wins by a big margin (see the original patch). Gregory
On Thu, Feb 11, 2010 at 03:49:57AM +0300, Gregory Petrosyan wrote:> First, the main thing: SuperFastHash appears to be the hash with best > distribution. Use of MurmurHash instead generates 1.28% more collisions while > doing nightly test in MultiSource/Applications. > > Second: I've also tested lookup3 hash, and its use generated 0.1% more > collisions, compared to SFH. > > These results were a bit surprising for me!And now something really surprising, Bernstein hash unsigned bernstein_hash(const unsigned char *str, size_t len) { unsigned hash = 5381; unsigned c; while (len--) { c = *str++; hash = (hash * 33) ^ c; } return hash; } gives 1.48% less (sic!) collisions than SFH! That means less than lookup3, MMH and FNV, too. I have no idea why this simplest hash is so good for LLVM. Maybe because LLVM hashes lots of pointers? But BH is ~2 times slower than SFH. Overall, it looks like 1) Number of collisions for different hashes is within 5% interval. 2) MMH is the fastest by a big margin, FNV / BH are the slowest by a big margin. 3) Any hash will do for FoldingSet. Gregory
Apparently Analagous Threads
- [LLVMdev] FoldingSet #collisions comparison
- [LLVMdev] [PATCH] FoldingSetNodeID: use MurmurHash2 instead of SuperFastHash
- [LLVMdev] [PATCH] FoldingSetNodeID: use MurmurHash2 instead of SuperFastHash
- [LLVMdev] [PATCH] FoldingSetNodeID: use MurmurHash2 instead of SuperFastHash
- [LLVMdev] [PATCH] FoldingSetNodeID: use MurmurHash2 instead of SuperFastHash