Nick Lewycky
2012-Mar-01 05:22 UTC
[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
// -- 'hash_code' class is an opaque type representing the hash code for some // data. It is the intended product of hashing, and can be used to implement vs. // -- 'hash_value' is a function designed to be overloaded for each // user-defined type which wishes to be used within a hashing context. It The first paragraph has a hanging indent but the second and third don't. // benchmarked at over 8.5 GiB/s for large keys, and <20 cycles/ Trailing punctuation. #include <cassert> #include <cstring> #include <algorithm> #include <utility> #include <iterator> Sort. /// \brief Form a hash code directly from a numerical value. hash_code(size_t value) : value(value) {} why not assert that the value is not the null or invalid value? I realize you'd need to have a private constructor for the null and invalid ones then. // All of the implementation details of actually computing the various hash // code values are held within this namespace. These routines are included in // the header file mainly to allow inlining and constant propagation. namespace hashing { namespace detail { One namespace per line please, in LLVM. [I only skimmed the implementation details of the hashing. It all looks plausible enough.] } } // End hashing detail namespaces. Again, one per line. You do this multiple times in Hashing.h, please fix them all. /// reproduce *exactly* a specific behavior. To that end, we provide a function /// which will forcible set the seed to a fixed value. This must be done at the forcible -> forcibly template <typename T> struct is_hashable_data : integral_constant<bool, ((is_integral<T>::value || is_pointer<T>::value) && 64 % sizeof(T) == 0)> {}; Optional: I object to the leading && and propose this reflowed text: template <typename T> struct is_hashable_data : integral_constant<bool, ((is_integral<T>::value || is_pointer<T>::value) && 64 % sizeof(T) == 0)> {}; . #if __has_feature(__cxx_variadic_templates__) I'm pretty sure non-clang compilers will bark at that unless you #define'd __has_feature(X) to 0? --- a/unittests/ADT/HashingTest.cpp +++ b/unittests/ADT/HashingTest.cpp @@ -13,45 +13,295 @@ #include "gtest/gtest.h" #include "llvm/ADT/Hashing.h" +#include "llvm/Support/DataTypes.h" +#include <vector> +#include <list> +#include <deque> +#include <map> Sort. + // Helper for test code to print hash codes. + void PrintTo(const hash_code &code, ::std::ostream *os) { What's with the extra leading :: before std::? + namespace hashing { namespace detail { + template <> struct is_hashable_data<LargeTestInteger> : true_type {}; + } } // End hashing detail namespace. +} // End LLVM namespace. A reminder about 1 namespace-per-line, but also the comments on ending namespaces are odd. They usually look like "} // end namespace llvm" or "} // namespace llvm", and I prefer the latter. Nick On 29 February 2012 01:35, Chandler Carruth <chandlerc at google.com> wrote:> Thanks for the feedback thus far! > > I've updated the header file, and enclosed a complete patch against LLVM. > This passes all the regtests, and I'll be doing more thorough testing of > LLVM itself with the patch. I've included some basic unit tests, but could > probably do more here... Not sure it's worth delaying the initial > submission though, as the best testing is to use a hash testing suite... > > I've addressed the comments from Nadav, but there may be more minor > stylistic issues or typos. Please keep pointing them out! I appreciate the > help there. > > I've also corrected my stub for the execution-seed to be more correct and > to include the ability to override it during program startup. It still > doesn't go the final step to make it actually change on each execution, but > is now otherwise identical. (In particular, it is no longer a compile-time > constant that can be folded.) This regressed the performance a tiny bit, > however... > > There are several improvements to the implementation of the hashing > algorithm itself. The way the hashing included the seed hurt performance > quite a bit. I've re-worked it to be much lighter weight, and even after > the execution seed fix above slowed things down, this speeds everything up > more. We shave 4ns off the 4 to 8 byte case, bringing performance above > that of essentially every high quality hashing algorithm... > > I still think we can do more, but it's already much faster than the > existing LLVM one except for the issue Tobias pointed out w/ modulo-4 key > sizes. I'm going to investigate this, but it may be a consequence of a > weakness in that algorithm. > > I've re-run the SMHasher quality testing suite and the results are very > good. There are a few problems identified, but these are long-known > problems with CityHash in general, and have proven to thus far not be a > cause of real-world issues... I'd like to fix them, but it doesn't seem a > high priority yet. > > Finally, I've run some rough initial numbers for x86 32-bit, and it's > surprisingly good. The relative speeds of this algorithm and others don't > seem to change much. A bit suspicious of that, so going to do more thorough > analysis. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120229/cd68bf58/attachment.html>
Ahmed Charles
2012-Mar-01 06:03 UTC
[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
> + // Helper for test code to print hash codes. > + void PrintTo(const hash_code &code, ::std::ostream *os) { > > What's with the extra leading :: before std::?Have you ever tried: namespace foo { class std {}; } using namespace foo; #include <vector> Well, I'm not sure that Chandler is guarding against this possibility, but most library implementations of the standard use ::std:: everywhere to avoid this potential for ambiguity.
Chandler Carruth
2012-Mar-01 13:01 UTC
[LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
Thanks for all the comments! I think I've addressed all of them, as wel as Duncan's comments from IRC. Based on your OK Nick, I'm planning to commit this tomorrow. If anyone has objections or serious concerns, please let me know to hold off. Updated patch is attached, as well as the latest version of the header. For the record, the only concern that has come up thus far is one of performance. The explanation there is that while some algorithms are slightly faster (5-10 cycles max), they are significantly lower quality, and don't currently show up on profiles. I'd like to get the quality up to remove collisions, and then continue working to improve the actual performance. Clearly, very sensitive and hot routines like the Clang lexer's StringMap aren't about to change without careful benchmarking and numbers on those specific components. =] I think StringMap will be the very last bit of hashing to change considering its requirements. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120301/8d390d1e/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: Hashing.h Type: text/x-chdr Size: 28938 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120301/8d390d1e/attachment.h> -------------- next part -------------- A non-text attachment was scrubbed... Name: hashing2.diff.gz Type: application/x-gzip Size: 13939 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120301/8d390d1e/attachment.bin>
Reasonably Related Threads
- [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
- [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
- [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
- [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)
- [LLVMdev] Proposed implementation of N3333 hashing interfaces for LLVM (and possible libc++)