On 02.02.14 00:48, Chandler Carruth wrote:> On Sat, Feb 1, 2014 at 8:35 AM, Stephan Tolksdorf <st at quanttec.com
> <mailto:st at quanttec.com>> wrote:
>
> Hi,
>
> Currently the hashing implementation in ADT/Hashing.h produces hash
> values on 32-bit platforms that differ from the lower 32-bits of the
> hash values produced on 64-bit platforms. It seems the only reason
> for this difference is that the uint64_t integer seed is truncated
> to size_t. Since the usage of uint64_t and size_t as types for seed
> values in the implementation is somewhat inconsistent, I'm
> wondering: Was this difference deliberately introduced as a
> performance optimization for 32-bit code, or is this maybe just an
> implementation artifact?
>
> I also noted that the implementation relies on implicit conversions
> from uint64_t to size_t in several places where hash_code values are
> returned, which may trigger compiler warnings on 32-bit platforms.
>
>
> Almost all of this was largely incidental on my part. I didn't spend a
> lot of time tuning the 32-bit platform performance. If you have ideas
> that you think would make things better, I'm certainly interested. =D
>
> One thing I would note is that I would prefer to retain the collision
> resistance where possible. So I would be leery of reducing the internal
> state used. I would more focus on making the outputs friendly for 32-bit
> platforms and the inputs consistent.
I've attached a patch that makes the hashing implementation consistently
use a 64-bit seed everywhere. With this change the implementation
should produce the same hash codes modulo SIZE_MAX + 1 for identical
values independent of the platform. (Though hash_combine may still
produce different results if the size of an argument type differs
between platforms). I suspect the negative performance impact on 32-bit
platforms should be small, but I didn't do any benchmarking. With
atomics one could probably replace the thread safe local static in
get_execution_seed with something that has a little less overhead.
The patch also removes a FIXME from the set_fixed_execution_seed
implementation and rewords the documentation string a bit, since using
this function in a multi-threaded program requires some kind of external
synchronization anyway.
Another attached patch implements llvm::sys::Process::GetRandomNumber()
on Windows. (There's already a Unix implementation.) This function could
be useful for implementing the missing random per-execution seed
functionality. (Though it might take a little care to deal with
Process::GetRandomNumberSeed potentially calling back into hash_combine
on Unix.)
In a little experiment I made, changing the seed per execution seemed to
lead to test suite errors for clang's PCH support and other components.
- Stephan
-------------- next part --------------