Francois Fayard via llvm-dev
2017-Jan-10 15:35 UTC
[llvm-dev] Default hashing function for integers (DenseMapInfo.h)
> It is not clear what the test-case is (what source, what compiler options). My suspicion is that your differences are in the noise, and most of the time is spent doing other things than hashing. Did you profile the a run, and check how much of the total time is spent in the hash-function [you may need to tweak the code a bit to not inline the actual hash function]. Also publishing the RANGE for each set of tests, since the variation between "best and worst" may be more important than the overall best time.I agree that I did not present a clear test case, which was just a random .cpp file that benchmarks different hash tables from LLVM/Google/STL compiled with -O3 option. I’ll try to make it better tomorrow, but I can assure you that it was not the noise. I have disabled turboboost and hyper-threading on my machine and ran the benchmark 10 times for every test and the difference was way larger than the observed standard deviation. But anyway, for the time being, the winner for hashing pointers is the current one from of LLVM.> Counting the number of collisions or some such would probably also help.Yes, I could do that. If I want to report all the collision, I might do that in the destructor of DenseMap. Sorry for my ignorance, but is clang/llvm multithreaded? Is it safe to write to a file in a destructor for logging purposes? François Fayard
Bruce Hoult via llvm-dev
2017-Jan-10 15:49 UTC
[llvm-dev] Default hashing function for integers (DenseMapInfo.h)
llvm is not multithreaded. On Tue, Jan 10, 2017 at 6:35 PM, Francois Fayard via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > It is not clear what the test-case is (what source, what compiler > options). My suspicion is that your differences are in the noise, and most > of the time is spent doing other things than hashing. Did you profile the a > run, and check how much of the total time is spent in the hash-function > [you may need to tweak the code a bit to not inline the actual hash > function]. Also publishing the RANGE for each set of tests, since the > variation between "best and worst" may be more important than the overall > best time. > > I agree that I did not present a clear test case, which was just a random > .cpp file that benchmarks different hash tables from LLVM/Google/STL > compiled with -O3 option. I’ll try to make it better tomorrow, but I can > assure you that it was not the noise. I have disabled turboboost and > hyper-threading on my machine and ran the benchmark 10 times for every test > and the difference was way larger than the observed standard deviation. But > anyway, for the time being, the winner for hashing pointers is the current > one from of LLVM. > > > Counting the number of collisions or some such would probably also help. > > Yes, I could do that. If I want to report all the collision, I might do > that in the destructor of DenseMap. Sorry for my ignorance, but is > clang/llvm multithreaded? Is it safe to write to a file in a destructor for > logging purposes? > > François Fayard > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170110/84c6b5ed/attachment.html>
mats petersson via llvm-dev
2017-Jan-10 15:51 UTC
[llvm-dev] Default hashing function for integers (DenseMapInfo.h)
On 10 January 2017 at 15:35, Francois Fayard via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > > It is not clear what the test-case is (what source, what compiler > options). My suspicion is that your differences are in the noise, and most > of the time is spent doing other things than hashing. Did you profile the a > run, and check how much of the total time is spent in the hash-function > [you may need to tweak the code a bit to not inline the actual hash > function]. Also publishing the RANGE for each set of tests, since the > variation between "best and worst" may be more important than the overall > best time. > > I agree that I did not present a clear test case, which was just a random > .cpp file that benchmarks different hash tables from LLVM/Google/STL > compiled with -O3 option. I’ll try to make it better tomorrow, but I can > assure you that it was not the noise. I have disabled turboboost and > hyper-threading on my machine and ran the benchmark 10 times for every test > and the difference was way larger than the observed standard deviation. But > anyway, for the time being, the winner for hashing pointers is the current > one from of LLVM. >Yes, it seems to be "good". But it's always good to understand WHY.> > > Counting the number of collisions or some such would probably also help. > > Yes, I could do that. If I want to report all the collision, I might do > that in the destructor of DenseMap. Sorry for my ignorance, but is > clang/llvm multithreaded? Is it safe to write to a file in a destructor for > logging purposes? >I don't think you can rely on that ALWAYS, but clang and LLVM doesn't spawn threads - someone may use two threads simultaneously to call into llvm and clang functionality as a library tho'. In a limited addition to the code to report something like `std::cout << "Number colliisions: " << NumCollisions << std::endl;` should be safe enough when using clang as the test [and opening a file in append mode should also be reasonably safe]. If you wanted to submit such a change as a patch, it would probably require a fair bit more work to guarantee threadsafety... [I have certainly done similar quick "print something" additions to Clang and LLVM, including in destructors].> > François Fayard > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170110/2bc8bf06/attachment.html>
Francois Fayard via llvm-dev
2017-Jan-10 17:40 UTC
[llvm-dev] Default hashing function for integers (DenseMapInfo.h)
> On Jan 10, 2017, at 4:51 PM, mats petersson <mats at planetcatfish.com> wrote: > > Yes, it seems to be "good". But it's always good to understand WHY.I’ll be glad to understand how is has been crafted, especially where does to 9 come from. I would be glad to know if it is an experimental number or something that comes from thoughts around malloc? I have also worked on the hashing for integers and my first test was trying to measure difference in performance when putting the worst hash function you can think of for integers : hash(k) = 0. It turns out that the difference I measured is below 0.5 %, which is below the noise of repeating experience. So, with my benchmark (compiling a .cpp file with clang with -O3), the quality of the hash for integers doe not matter at all. It seems that the usage of integers as keys is very low. François Fayard