thr3ads.net - llvm dev - [LLVMdev] RFC: Using hashing for switch statements [Jan 2014]

If this information is useful, please help other people find it:
Share via:

Jasper Neumann

2014-Jan-24 19:51 UTC

[LLVMdev] RFC: Using hashing for switch statements

Hi folks,

here is a short RFC regarding hashing for the implementation of switch 
statements and my preliminary patch.
I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You 
can find a copy e.g. on 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html.

Best regards
Jasper

==
Preliminary: Special identifiers

n    number of given switch labels
N    size of the (intermediate) hash range, usually n rounded up to the 
next power of 2
x    the value to be switch upon


Why (perfect) hashing?

Hashing enables a jump table like approach even for sparse label sets.
Let's concentrate on perfect hashing in order to keep the code smaller 
and faster.
Hashing is faster than a decision tree provided there are enough labels, 
i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 
labels assuming random values of x.
The memory consumption of the hashing tables is comparable to the memory 
footage of the code of a decision tree.
A decision tree consumes a lot of branch prediction resources.
For minimal perfect hashing the jump table does not need to contain 
dummy labels and therefore has n instead of (almost) N elements.
"Switch-to-lookup tables": If all jump targets are sufficiently
similar
hashing opens the possibility to use table which contain the 
differences. Thereby the indirect jump is eliminated and we get a huge 
performance gain.


Why not hashing?

A decision tree or even an else-if chain is faster than hashing for very 
few labels.
The branch prediction logic usually does not work as satisfactorily as 
for decision trees if fed with cyclical access patterns; i.e. the 
break-even point in this scenario is higher.
Hashing can not deal with large label ranges since very single label is 
treated. A possible compromise/solution might be to first catch all 
single labels and sufficiently short ranges by hashing and then treat 
the remaining ranges in a decision tree.


Why reversible hashing?

Reversible hashing does not need a value table and a simple range check 
of the calculated hash value is sufficient.
Reversible hashing is very simple to implement.
The usual jump table approach is a special case of reversible hashing.


Why simple hashing?

A comparison against a value table is sufficient.
Simple hashing does not need an extra table.


Why not simple hashing?

Simple hashing needs a lot of hash function candidates to be tested 
until a suitable is found; this costs time in the compiler.
For large label sets the chance to find such a function is very small; 
the applicability is therefore reduced to quite small sets.
Minimal perfect hashing is usually not achievable.


Why Jenkins' hashing?

Jenkins' hashing produces very fast and compact code and needs one extra 
table (BTable) with usually less elements than n; the elements itself 
are unsigned integers which can become as high N-1.
The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 
and f2 limit their range by shifting or masking, then the final hash as 
h = h1 ^ BTable[h1].
Minimal perfect hashing is achievable, albeit often at the cost of 
larger tables.
For sufficiently high N the table memory can optionally be reduced by 
introducing an extra scrambling table (STable); BTable then consists of 
bytes whereas STable contains 256 words or DWords. Obviously this 
reduction of memory costs another indirection in the resulting code.
As far as I tested Jenkins' hashing can be used for label sets of 
several thousand elements; often even one million labels can be treated.
The application in a compiler is to lower switch statements. Switch 
statements with much more than 1000 labels are almost never present in 
practice.
Jenkins' hashing usually is sufficiently fast.
By the way: You will find the following remark in lib/Target/README.txt:
==>
Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html
<=

Why not Jenkins' hashing?

The search time of Jenkins' hashing might become unacceptably long for 
extraordinary large label sets.
In practice this should never happen, however in this case the label set 
will be split in two parts are the process of lowering switches will be 
repeated.
An alternative (for very large label sets) could be a different perfect 
hash algorithm such as CHM.


Why CHM?

CHM is extraordinarily fast at generating the extra table with a runtime 
of O(1).
CHM can be used to hash almost arbitrarily many labels.
CHM produces an ordered minimal perfect hashing function.


Why not CHM?

CHM needs extra table space of 2.09 * n elements; this is larger 
compared to Jenkins' method.
CHM (in the implementation of cmph-2.0) is much more complicated than 
the code of Jenkins' hashing:
h1 = f1(x) % N;
h2 = f2(x) % N;
if (h1 == h2 && ++h2 >= N)  h2 = 0;
h = (Table[h1] + Table[h2]) % n;
By enlarging N to a power of 2 and other tweaking we might replace the 3 
modulo operations by shifting or masking at the cost of more table 
space. We might also eliminate the conditional statement if in this case 
we choose different hash functions and retry the calculation.
There is at least one more table access compared to Jenkins' method.

===

Joerg Sonnenberger

2014-Jan-24 20:35 UTC

head link

[LLVMdev] RFC: Using hashing for switch statements

On Fri, Jan 24, 2014 at 08:51:56PM +0100, Jasper Neumann
wrote:> Why not Jenkins' hashing?
> 
> The search time of Jenkins' hashing might become unacceptably long
> for extraordinary large label sets.
It is generally not linear time, making it hardly appropiate for the
default pass pipeline. That's not just a question of very large label
sets -- it happens even with much smaller sets in the area of 100+
elements.
> Why CHM?
> 
> CHM is extraordinarily fast at generating the extra table with a
> runtime of O(1).
Probalistic O(n), not O(1).
> Why not CHM?
> 
> CHM needs extra table space of 2.09 * n elements; this is larger
> compared to Jenkins' method.
This is misleading. It requires 2 * n elements for the construction
using two independent hash functions, it is ~1.24 * n elments for the
construction using three independent hash functions.
> CHM (in the implementation of cmph-2.0) is much more complicated
> than the code of Jenkins' hashing:
> h1 = f1(x) % N;
> h2 = f2(x) % N;
> if (h1 == h2 && ++h2 >= N)  h2 = 0;
> h = (Table[h1] + Table[h2]) % n;
The third line is strange -- that's normally just rejected during graph
construction. As such the normal output is:

table[M] = { ... }
h1 = f1(x) % M;
h2 = f2(x) % M;

return (table[h1] + table[h2]) % N;

with M and N being compile-time constants. As such, the modulo
operations will be replaced by two multiplications. Note that the two
table accesses are independent and parallizable.

The third alternative is BPZ, which would give a hash function like:
h1 = f1(x) % M;
h2 = f2(x) % M;
idx = (g[h1] + g[h2]) % 2
return idx == 0 ? h1 : h2;

The table load is a bit load here. This version would require 2 bit/key
for a perfect hash function. Bit load tends to be somewhat messier
though, so this is less attractive.

Joerg

Anton Korobeynikov

2014-Jan-24 20:41 UTC

head link

[LLVMdev] RFC: Using hashing for switch statements

Hi Jasper,

As it was stated before - all these are pretty fine theoretical
arguments. However, without real numbers of benchmarking it's hard to
say anything definite about whether this approach is preferable.

There are plenty of switches in LLVM and clang itself. Can't you
provide some statistics how the stuff was lowered before and after?

On Fri, Jan 24, 2014 at 11:51 PM, Jasper Neumann <jn at sirrida.de>
wrote:> Hi folks,
>
> here is a short RFC regarding hashing for the implementation of switch
> statements and my preliminary patch.
> I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You
can
> find a copy e.g. on
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html.
>
> Best regards
> Jasper
>
> ==>
> Preliminary: Special identifiers
>
> n    number of given switch labels
> N    size of the (intermediate) hash range, usually n rounded up to the
next
> power of 2
> x    the value to be switch upon
>
>
> Why (perfect) hashing?
>
> Hashing enables a jump table like approach even for sparse label sets.
> Let's concentrate on perfect hashing in order to keep the code smaller
and
> faster.
> Hashing is faster than a decision tree provided there are enough labels,
> i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels
> assuming random values of x.
> The memory consumption of the hashing tables is comparable to the memory
> footage of the code of a decision tree.
> A decision tree consumes a lot of branch prediction resources.
> For minimal perfect hashing the jump table does not need to contain dummy
> labels and therefore has n instead of (almost) N elements.
> "Switch-to-lookup tables": If all jump targets are sufficiently
similar
> hashing opens the possibility to use table which contain the differences.
> Thereby the indirect jump is eliminated and we get a huge performance gain.
>
>
> Why not hashing?
>
> A decision tree or even an else-if chain is faster than hashing for very
few
> labels.
> The branch prediction logic usually does not work as satisfactorily as for
> decision trees if fed with cyclical access patterns; i.e. the break-even
> point in this scenario is higher.
> Hashing can not deal with large label ranges since very single label is
> treated. A possible compromise/solution might be to first catch all single
> labels and sufficiently short ranges by hashing and then treat the
remaining
> ranges in a decision tree.
>
>
> Why reversible hashing?
>
> Reversible hashing does not need a value table and a simple range check of
> the calculated hash value is sufficient.
> Reversible hashing is very simple to implement.
> The usual jump table approach is a special case of reversible hashing.
>
>
> Why simple hashing?
>
> A comparison against a value table is sufficient.
> Simple hashing does not need an extra table.
>
>
> Why not simple hashing?
>
> Simple hashing needs a lot of hash function candidates to be tested until a
> suitable is found; this costs time in the compiler.
> For large label sets the chance to find such a function is very small; the
> applicability is therefore reduced to quite small sets.
> Minimal perfect hashing is usually not achievable.
>
>
> Why Jenkins' hashing?
>
> Jenkins' hashing produces very fast and compact code and needs one
extra
> table (BTable) with usually less elements than n; the elements itself are
> unsigned integers which can become as high N-1.
> The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 and
> f2 limit their range by shifting or masking, then the final hash as h = h1
^
> BTable[h1].
> Minimal perfect hashing is achievable, albeit often at the cost of larger
> tables.
> For sufficiently high N the table memory can optionally be reduced by
> introducing an extra scrambling table (STable); BTable then consists of
> bytes whereas STable contains 256 words or DWords. Obviously this reduction
> of memory costs another indirection in the resulting code.
> As far as I tested Jenkins' hashing can be used for label sets of
several
> thousand elements; often even one million labels can be treated.
> The application in a compiler is to lower switch statements. Switch
> statements with much more than 1000 labels are almost never present in
> practice.
> Jenkins' hashing usually is sufficiently fast.
> By the way: You will find the following remark in lib/Target/README.txt:
> ==>
> Investigate lowering of sparse switch statements into perfect hash tables:
> http://burtleburtle.net/bob/hash/perfect.html
> <=>
>
> Why not Jenkins' hashing?
>
> The search time of Jenkins' hashing might become unacceptably long for
> extraordinary large label sets.
> In practice this should never happen, however in this case the label set
> will be split in two parts are the process of lowering switches will be
> repeated.
> An alternative (for very large label sets) could be a different perfect
hash
> algorithm such as CHM.
>
>
> Why CHM?
>
> CHM is extraordinarily fast at generating the extra table with a runtime of
> O(1).
> CHM can be used to hash almost arbitrarily many labels.
> CHM produces an ordered minimal perfect hashing function.
>
>
> Why not CHM?
>
> CHM needs extra table space of 2.09 * n elements; this is larger compared
to
> Jenkins' method.
> CHM (in the implementation of cmph-2.0) is much more complicated than the
> code of Jenkins' hashing:
> h1 = f1(x) % N;
> h2 = f2(x) % N;
> if (h1 == h2 && ++h2 >= N)  h2 = 0;
> h = (Table[h1] + Table[h2]) % n;
> By enlarging N to a power of 2 and other tweaking we might replace the 3
> modulo operations by shifting or masking at the cost of more table space.
We
> might also eliminate the conditional statement if in this case we choose
> different hash functions and retry the calculation.
> There is at least one more table access compared to Jenkins' method.
>
> ==> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Reid Kleckner

2014-Jan-24 21:53 UTC

head link

[LLVMdev] RFC: Using hashing for switch statements

On Fri, Jan 24, 2014 at 12:41 PM, Anton Korobeynikov <
anton at korobeynikov.info> wrote:
> Hi Jasper,
>
> As it was stated before - all these are pretty fine theoretical
> arguments. However, without real numbers of benchmarking it's hard to
> say anything definite about whether this approach is preferable.
>
> There are plenty of switches in LLVM and clang itself. Can't you
> provide some statistics how the stuff was lowered before and after?

Sounds like it's worth implementing in tree, off by default, and then doing
some benchmarking.

FWIW, I don't think sparse switches are very common in LLVM, so it's not
clear how often this optimization would fire.

> On Fri, Jan 24, 2014 at 11:51 PM, Jasper Neumann <jn at sirrida.de>
wrote:
> > Hi folks,
> >
> > here is a short RFC regarding hashing for the implementation of switch
> > statements and my preliminary patch.
> > I posted this patch on 2014-01-16 21:58 at llvm-commits at
cs.uiuc.edu.
> You can
> > find a copy e.g. on
> >
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html
> .
> >
> > Best regards
> > Jasper
> >
> > ==> >
> > Preliminary: Special identifiers
> >
> > n    number of given switch labels
> > N    size of the (intermediate) hash range, usually n rounded up to
the
> next
> > power of 2
> > x    the value to be switch upon
> >
> >
> > Why (perfect) hashing?
> >
> > Hashing enables a jump table like approach even for sparse label sets.
> > Let's concentrate on perfect hashing in order to keep the code
smaller
> and
> > faster.
> > Hashing is faster than a decision tree provided there are enough
labels,
> > i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8
labels
> > assuming random values of x.
> > The memory consumption of the hashing tables is comparable to the
memory
> > footage of the code of a decision tree.
> > A decision tree consumes a lot of branch prediction resources.
> > For minimal perfect hashing the jump table does not need to contain
dummy
> > labels and therefore has n instead of (almost) N elements.
> > "Switch-to-lookup tables": If all jump targets are
sufficiently similar
> > hashing opens the possibility to use table which contain the
differences.
> > Thereby the indirect jump is eliminated and we get a huge performance
> gain.
> >
> >
> > Why not hashing?
> >
> > A decision tree or even an else-if chain is faster than hashing for
very
> few
> > labels.
> > The branch prediction logic usually does not work as satisfactorily as
> for
> > decision trees if fed with cyclical access patterns; i.e. the
break-even
> > point in this scenario is higher.
> > Hashing can not deal with large label ranges since very single label
is
> > treated. A possible compromise/solution might be to first catch all
> single
> > labels and sufficiently short ranges by hashing and then treat the
> remaining
> > ranges in a decision tree.
> >
> >
> > Why reversible hashing?
> >
> > Reversible hashing does not need a value table and a simple range
check
> of
> > the calculated hash value is sufficient.
> > Reversible hashing is very simple to implement.
> > The usual jump table approach is a special case of reversible hashing.
> >
> >
> > Why simple hashing?
> >
> > A comparison against a value table is sufficient.
> > Simple hashing does not need an extra table.
> >
> >
> > Why not simple hashing?
> >
> > Simple hashing needs a lot of hash function candidates to be tested
> until a
> > suitable is found; this costs time in the compiler.
> > For large label sets the chance to find such a function is very small;
> the
> > applicability is therefore reduced to quite small sets.
> > Minimal perfect hashing is usually not achievable.
> >
> >
> > Why Jenkins' hashing?
> >
> > Jenkins' hashing produces very fast and compact code and needs one
extra
> > table (BTable) with usually less elements than n; the elements itself
are
> > unsigned integers which can become as high N-1.
> > The code first generates two hash values h1=f1(x) and h2=f2(x) where
f1
> and
> > f2 limit their range by shifting or masking, then the final hash as h
> h1 ^
> > BTable[h1].
> > Minimal perfect hashing is achievable, albeit often at the cost of
larger
> > tables.
> > For sufficiently high N the table memory can optionally be reduced by
> > introducing an extra scrambling table (STable); BTable then consists
of
> > bytes whereas STable contains 256 words or DWords. Obviously this
> reduction
> > of memory costs another indirection in the resulting code.
> > As far as I tested Jenkins' hashing can be used for label sets of
several
> > thousand elements; often even one million labels can be treated.
> > The application in a compiler is to lower switch statements. Switch
> > statements with much more than 1000 labels are almost never present in
> > practice.
> > Jenkins' hashing usually is sufficiently fast.
> > By the way: You will find the following remark in
lib/Target/README.txt:
> > ==>
> > Investigate lowering of sparse switch statements into perfect hash
> tables:
> > http://burtleburtle.net/bob/hash/perfect.html
> > <=> >
> >
> > Why not Jenkins' hashing?
> >
> > The search time of Jenkins' hashing might become unacceptably long
for
> > extraordinary large label sets.
> > In practice this should never happen, however in this case the label
set
> > will be split in two parts are the process of lowering switches will
be
> > repeated.
> > An alternative (for very large label sets) could be a different
perfect
> hash
> > algorithm such as CHM.
> >
> >
> > Why CHM?
> >
> > CHM is extraordinarily fast at generating the extra table with a
runtime
> of
> > O(1).
> > CHM can be used to hash almost arbitrarily many labels.
> > CHM produces an ordered minimal perfect hashing function.
> >
> >
> > Why not CHM?
> >
> > CHM needs extra table space of 2.09 * n elements; this is larger
> compared to
> > Jenkins' method.
> > CHM (in the implementation of cmph-2.0) is much more complicated than
the
> > code of Jenkins' hashing:
> > h1 = f1(x) % N;
> > h2 = f2(x) % N;
> > if (h1 == h2 && ++h2 >= N)  h2 = 0;
> > h = (Table[h1] + Table[h2]) % n;
> > By enlarging N to a power of 2 and other tweaking we might replace the
3
> > modulo operations by shifting or masking at the cost of more table
> space. We
> > might also eliminate the conditional statement if in this case we
choose
> > different hash functions and retry the calculation.
> > There is at least one more table access compared to Jenkins'
method.
> >
> > ==> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140124/3a8435d8/attachment.html>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Jan 2014 - [LLVMdev] RFC: Using hashing for switch statements

[LLVMdev] RFC: Using hashing for switch statements

[LLVMdev] RFC: Using hashing for switch statements

[LLVMdev] RFC: Using hashing for switch statements

[LLVMdev] RFC: Using hashing for switch statements

Reasonably Related Threads