On Fri, Feb 17, 2012 at 1:32 AM, Chris Lattner <clattner at apple.com> wrote:> On Feb 17, 2012, at 12:26 AM, Talin wrote: > > OK here's a patch with the latest, including unit tests. I've also tried > to make the comments clear that this is intended for the case of "generic" > key types, where you are either hashing multiple data types together, or > you don't know in advance what the data type will be - for cases such as > strings where a tailored algorithm is available, you should use that > instead of this. > > Makes sense. > > + /// Add a pointer value. > + /// Note: this adds pointers to the hash using sizes and endianness that > + /// depend on the host. It doesn't matter however, because hashing on > + /// pointer values is inherently unstable. > > This makes perfect sense. > > It should, as the comment was copied verbatim from FoldingSetNodeID :)> + /// Add an ArrayRef of arbitrary data. > + template<typename T> > + GeneralHash& add(ArrayRef<T> ArrayVal) { > + addBits(ArrayVal.begin(), ArrayVal.end()); > + return *this; > + } > > Doesn't this have the same host-specificity problem, except that it will > cause things that *are* stable to vary, such as arrays of char, or is the > alignment check enough? > > I thought about whether it would be possible to prevent people frompassing in ArrayRefs with unstable things, and I came to the conclusion that there's no simple way to distinguish between stable and unstable ArrayRefs. This is why I decided not to make a special "addPointer" method for pointers, because you could easily subvert it by wrapping it in a singleton ArrayRef. + /// Add a float> + GeneralHash& add(float Data) { > > It is worth adding a comment here that this does a bitwise hash, so -0.0 > and +0.0 will hash to different values even though they compare equal and > two identical NaN's will hash to the same value even though they compare > unequal. > > BTW, are there in fact any maps in LLVM that use floats as keys, otherthan uniquing of constants? And in the latter case, would you not want to distinguish between a -0.0 and +0.0 constant?> The mix logic is inherently a series of 32-bit operations. Would it be > possible and make more sense to implement them as 64-bit operations? > 64-bit hosts are the vast majority of the machines that run a compiler > these days. OTOH, making it depend on the host brings up host instability > issues. > > Actually, how much do we care about host instability here? If this is > used by hashing containers, we can just declare that iteration order is > undefined even for non-pointer values. The real problem I guess is that > we'd have to go check out all the existing DenseMap's etc to make sure > people aren't doing well-defined iteration right now and fixing the code. > What do you think? > > I think that you are thinking that existing uses of DenseMap and other ADTcontainers will be affected by this. That wasn't my plan - I was going to basically use the hashing class to create a custom DenseMapInfo for specific maps which could benefit from the optimization. Other DenseMaps would remain as they are.> > There's a couple of things I don't like: First, there's too many levels > of inlined function calls - my experience is that compilers aren't as good > at inlining nested calls as is often advertised, however I couldn't figure > out a way to get the same effect without the nested calls. > > Is there a specific observed concern here (e.g. when built with Clang) or > is this a historical problem? Compilers have gotten much better about > inlining stuff that is actually small, if Clang handles it then I think > we're good. Marking these attribute(always_inline) is massive overkill. > > Well, it is historical from about 5 years ago when I was working on EASTL.The compilers we were using at the time were gcc and MSVC. We found cases in the standard STL where inlines were nested up to 10 levels deep, and in some of those cases the compiler just gave up trying to inline things that deeply.> Overall the code is looking great. I'd recommend checking in the new API > separately from switching Constants.cpp to use it though (just so that any > problems doesn't cause both to get reverted). > > OK. I'm still working on getting a consensus from the hashing experts :)> -Chris >-- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120218/77faf871/attachment.html>
On Feb 18, 2012, at 1:11 AM, Talin wrote:> + /// Add an ArrayRef of arbitrary data. > + template<typename T> > + GeneralHash& add(ArrayRef<T> ArrayVal) { > + addBits(ArrayVal.begin(), ArrayVal.end()); > + return *this; > + } > > Doesn't this have the same host-specificity problem, except that it will cause things that *are* stable to vary, such as arrays of char, or is the alignment check enough? > > I thought about whether it would be possible to prevent people from passing in ArrayRefs with unstable things, and I came to the conclusion that there's no simple way to distinguish between stable and unstable ArrayRefs. This is why I decided not to make a special "addPointer" method for pointers, because you could easily subvert it by wrapping it in a singleton ArrayRef.Ok.> + /// Add a float > + GeneralHash& add(float Data) { > > It is worth adding a comment here that this does a bitwise hash, so -0.0 and +0.0 will hash to different values even though they compare equal and two identical NaN's will hash to the same value even though they compare unequal. > > BTW, are there in fact any maps in LLVM that use floats as keys, other than uniquing of constants? And in the latter case, would you not want to distinguish between a -0.0 and +0.0 constant?I don't know of any. I think that the ConstantFP uniquing code specifically has to unique things as integers to avoid problems with FP comparisons. In any case, I'm sure that the desired behavior is client specific - adding the comment just makes it obvious what the semantics are.> Actually, how much do we care about host instability here? If this is used by hashing containers, we can just declare that iteration order is undefined even for non-pointer values. The real problem I guess is that we'd have to go check out all the existing DenseMap's etc to make sure people aren't doing well-defined iteration right now and fixing the code. What do you think? > > I think that you are thinking that existing uses of DenseMap and other ADT containers will be affected by this. That wasn't my plan - I was going to basically use the hashing class to create a custom DenseMapInfo for specific maps which could benefit from the optimization. Other DenseMaps would remain as they are.Ok.> > Is there a specific observed concern here (e.g. when built with Clang) or is this a historical problem? Compilers have gotten much better about inlining stuff that is actually small, if Clang handles it then I think we're good. Marking these attribute(always_inline) is massive overkill. > > Well, it is historical from about 5 years ago when I was working on EASTL. The compilers we were using at the time were gcc and MSVC. We found cases in the standard STL where inlines were nested up to 10 levels deep, and in some of those cases the compiler just gave up trying to inline things that deeply.Ok, LLVM uses a completely different (bottom-up, incremental, simplifying as it goes) approach to inlining. I wouldn't worry about it. Please just remove the always inline markers and if it is a measured performance problem later we can add them back.> > Overall the code is looking great. I'd recommend checking in the new API separately from switching Constants.cpp to use it though (just so that any problems doesn't cause both to get reverted). > > OK. I'm still working on getting a consensus from the hashing experts :)My advise is to check in when you have to make forward progress. If people want to reshave your yak into another hairdo, then then can do that at some later time. No reason to block your progress as long as the API is good. Thanks again for working on this! -Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120218/f8ad146e/attachment.html>
OK committed. Let the shaving begin! :) On Sat, Feb 18, 2012 at 3:20 AM, Chris Lattner <clattner at apple.com> wrote:> On Feb 18, 2012, at 1:11 AM, Talin wrote: > > + /// Add an ArrayRef of arbitrary data. >> + template<typename T> >> + GeneralHash& add(ArrayRef<T> ArrayVal) { >> + addBits(ArrayVal.begin(), ArrayVal.end()); >> + return *this; >> + } >> >> Doesn't this have the same host-specificity problem, except that it will >> cause things that *are* stable to vary, such as arrays of char, or is the >> alignment check enough? >> >> I thought about whether it would be possible to prevent people from > passing in ArrayRefs with unstable things, and I came to the conclusion > that there's no simple way to distinguish between stable and unstable > ArrayRefs. This is why I decided not to make a special "addPointer" method > for pointers, because you could easily subvert it by wrapping it in a > singleton ArrayRef. > > > Ok. > > + /// Add a float >> + GeneralHash& add(float Data) { >> >> It is worth adding a comment here that this does a bitwise hash, so -0.0 >> and +0.0 will hash to different values even though they compare equal and >> two identical NaN's will hash to the same value even though they compare >> unequal. >> >> BTW, are there in fact any maps in LLVM that use floats as keys, other > than uniquing of constants? And in the latter case, would you not want to > distinguish between a -0.0 and +0.0 constant? > > > I don't know of any. I think that the ConstantFP uniquing code > specifically has to unique things as integers to avoid problems with FP > comparisons. In any case, I'm sure that the desired behavior is client > specific - adding the comment just makes it obvious what the semantics are. > > > Actually, how much do we care about host instability here? If this is > used by hashing containers, we can just declare that iteration order is > undefined even for non-pointer values. The real problem I guess is that > we'd have to go check out all the existing DenseMap's etc to make sure > people aren't doing well-defined iteration right now and fixing the code. > What do you think? > >> >> I think that you are thinking that existing uses of DenseMap and other > ADT containers will be affected by this. That wasn't my plan - I was going > to basically use the hashing class to create a custom DenseMapInfo for > specific maps which could benefit from the optimization. Other DenseMaps > would remain as they are. > > > Ok. > > >> Is there a specific observed concern here (e.g. when built with Clang) or >> is this a historical problem? Compilers have gotten much better about >> inlining stuff that is actually small, if Clang handles it then I think >> we're good. Marking these attribute(always_inline) is massive overkill. >> >> Well, it is historical from about 5 years ago when I was working on > EASTL. The compilers we were using at the time were gcc and MSVC. We found > cases in the standard STL where inlines were nested up to 10 levels deep, > and in some of those cases the compiler just gave up trying to inline > things that deeply. > > > Ok, LLVM uses a completely different (bottom-up, incremental, simplifying > as it goes) approach to inlining. I wouldn't worry about it. Please just > remove the always inline markers and if it is a measured performance > problem later we can add them back. > > > >> Overall the code is looking great. I'd recommend checking in the new API >> separately from switching Constants.cpp to use it though (just so that any >> problems doesn't cause both to get reverted). >> >> OK. I'm still working on getting a consensus from the hashing experts :) > > > My advise is to check in when you have to make forward progress. If > people want to reshave your yak into another hairdo, then then can do that > at some later time. No reason to block your progress as long as the API is > good. > > Thanks again for working on this! > > -Chris > >-- -- Talin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120218/8362ce6d/attachment.html>
On Sat, Feb 18, 2012 at 3:20 AM, Chris Lattner <clattner at apple.com> wrote:> My advise is to check in when you have to make forward progress. If > people want to reshave your yak into another hairdo, then then can do that > at some later time. No reason to block your progress as long as the API is > good.I was trying to give feedback on the API, and specifically suggest an alternative that might be both easier to use, and naturally dovetail with an upcoming standard. =/ Is there no interest in this? I'm happy to contribute an implementation using this API (we'll need to implement it anyways), but I'm actually not interested in just shaving yaks. I'm interested in getting a really good API here, because this is the support library that is usually held to a very high bar for APIs... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120218/09dd05bd/attachment.html>