Hi folks, here is a short RFC regarding hashing for the implementation of switch statements and my preliminary patch. I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You can find a copy e.g. on http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html. Best regards Jasper == Preliminary: Special identifiers n number of given switch labels N size of the (intermediate) hash range, usually n rounded up to the next power of 2 x the value to be switch upon Why (perfect) hashing? Hashing enables a jump table like approach even for sparse label sets. Let's concentrate on perfect hashing in order to keep the code smaller and faster. Hashing is faster than a decision tree provided there are enough labels, i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels assuming random values of x. The memory consumption of the hashing tables is comparable to the memory footage of the code of a decision tree. A decision tree consumes a lot of branch prediction resources. For minimal perfect hashing the jump table does not need to contain dummy labels and therefore has n instead of (almost) N elements. "Switch-to-lookup tables": If all jump targets are sufficiently similar hashing opens the possibility to use table which contain the differences. Thereby the indirect jump is eliminated and we get a huge performance gain. Why not hashing? A decision tree or even an else-if chain is faster than hashing for very few labels. The branch prediction logic usually does not work as satisfactorily as for decision trees if fed with cyclical access patterns; i.e. the break-even point in this scenario is higher. Hashing can not deal with large label ranges since very single label is treated. A possible compromise/solution might be to first catch all single labels and sufficiently short ranges by hashing and then treat the remaining ranges in a decision tree. Why reversible hashing? Reversible hashing does not need a value table and a simple range check of the calculated hash value is sufficient. Reversible hashing is very simple to implement. The usual jump table approach is a special case of reversible hashing. Why simple hashing? A comparison against a value table is sufficient. Simple hashing does not need an extra table. Why not simple hashing? Simple hashing needs a lot of hash function candidates to be tested until a suitable is found; this costs time in the compiler. For large label sets the chance to find such a function is very small; the applicability is therefore reduced to quite small sets. Minimal perfect hashing is usually not achievable. Why Jenkins' hashing? Jenkins' hashing produces very fast and compact code and needs one extra table (BTable) with usually less elements than n; the elements itself are unsigned integers which can become as high N-1. The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 and f2 limit their range by shifting or masking, then the final hash as h = h1 ^ BTable[h1]. Minimal perfect hashing is achievable, albeit often at the cost of larger tables. For sufficiently high N the table memory can optionally be reduced by introducing an extra scrambling table (STable); BTable then consists of bytes whereas STable contains 256 words or DWords. Obviously this reduction of memory costs another indirection in the resulting code. As far as I tested Jenkins' hashing can be used for label sets of several thousand elements; often even one million labels can be treated. The application in a compiler is to lower switch statements. Switch statements with much more than 1000 labels are almost never present in practice. Jenkins' hashing usually is sufficiently fast. By the way: You will find the following remark in lib/Target/README.txt: ==> Investigate lowering of sparse switch statements into perfect hash tables: http://burtleburtle.net/bob/hash/perfect.html <= Why not Jenkins' hashing? The search time of Jenkins' hashing might become unacceptably long for extraordinary large label sets. In practice this should never happen, however in this case the label set will be split in two parts are the process of lowering switches will be repeated. An alternative (for very large label sets) could be a different perfect hash algorithm such as CHM. Why CHM? CHM is extraordinarily fast at generating the extra table with a runtime of O(1). CHM can be used to hash almost arbitrarily many labels. CHM produces an ordered minimal perfect hashing function. Why not CHM? CHM needs extra table space of 2.09 * n elements; this is larger compared to Jenkins' method. CHM (in the implementation of cmph-2.0) is much more complicated than the code of Jenkins' hashing: h1 = f1(x) % N; h2 = f2(x) % N; if (h1 == h2 && ++h2 >= N) h2 = 0; h = (Table[h1] + Table[h2]) % n; By enlarging N to a power of 2 and other tweaking we might replace the 3 modulo operations by shifting or masking at the cost of more table space. We might also eliminate the conditional statement if in this case we choose different hash functions and retry the calculation. There is at least one more table access compared to Jenkins' method. ===
Joerg Sonnenberger
2014-Jan-24 20:35 UTC
[LLVMdev] RFC: Using hashing for switch statements
On Fri, Jan 24, 2014 at 08:51:56PM +0100, Jasper Neumann wrote:> Why not Jenkins' hashing? > > The search time of Jenkins' hashing might become unacceptably long > for extraordinary large label sets.It is generally not linear time, making it hardly appropiate for the default pass pipeline. That's not just a question of very large label sets -- it happens even with much smaller sets in the area of 100+ elements.> Why CHM? > > CHM is extraordinarily fast at generating the extra table with a > runtime of O(1).Probalistic O(n), not O(1).> Why not CHM? > > CHM needs extra table space of 2.09 * n elements; this is larger > compared to Jenkins' method.This is misleading. It requires 2 * n elements for the construction using two independent hash functions, it is ~1.24 * n elments for the construction using three independent hash functions.> CHM (in the implementation of cmph-2.0) is much more complicated > than the code of Jenkins' hashing: > h1 = f1(x) % N; > h2 = f2(x) % N; > if (h1 == h2 && ++h2 >= N) h2 = 0; > h = (Table[h1] + Table[h2]) % n;The third line is strange -- that's normally just rejected during graph construction. As such the normal output is: table[M] = { ... } h1 = f1(x) % M; h2 = f2(x) % M; return (table[h1] + table[h2]) % N; with M and N being compile-time constants. As such, the modulo operations will be replaced by two multiplications. Note that the two table accesses are independent and parallizable. The third alternative is BPZ, which would give a hash function like: h1 = f1(x) % M; h2 = f2(x) % M; idx = (g[h1] + g[h2]) % 2 return idx == 0 ? h1 : h2; The table load is a bit load here. This version would require 2 bit/key for a perfect hash function. Bit load tends to be somewhat messier though, so this is less attractive. Joerg
Anton Korobeynikov
2014-Jan-24 20:41 UTC
[LLVMdev] RFC: Using hashing for switch statements
Hi Jasper, As it was stated before - all these are pretty fine theoretical arguments. However, without real numbers of benchmarking it's hard to say anything definite about whether this approach is preferable. There are plenty of switches in LLVM and clang itself. Can't you provide some statistics how the stuff was lowered before and after? On Fri, Jan 24, 2014 at 11:51 PM, Jasper Neumann <jn at sirrida.de> wrote:> Hi folks, > > here is a short RFC regarding hashing for the implementation of switch > statements and my preliminary patch. > I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You can > find a copy e.g. on > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html. > > Best regards > Jasper > > ==> > Preliminary: Special identifiers > > n number of given switch labels > N size of the (intermediate) hash range, usually n rounded up to the next > power of 2 > x the value to be switch upon > > > Why (perfect) hashing? > > Hashing enables a jump table like approach even for sparse label sets. > Let's concentrate on perfect hashing in order to keep the code smaller and > faster. > Hashing is faster than a decision tree provided there are enough labels, > i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels > assuming random values of x. > The memory consumption of the hashing tables is comparable to the memory > footage of the code of a decision tree. > A decision tree consumes a lot of branch prediction resources. > For minimal perfect hashing the jump table does not need to contain dummy > labels and therefore has n instead of (almost) N elements. > "Switch-to-lookup tables": If all jump targets are sufficiently similar > hashing opens the possibility to use table which contain the differences. > Thereby the indirect jump is eliminated and we get a huge performance gain. > > > Why not hashing? > > A decision tree or even an else-if chain is faster than hashing for very few > labels. > The branch prediction logic usually does not work as satisfactorily as for > decision trees if fed with cyclical access patterns; i.e. the break-even > point in this scenario is higher. > Hashing can not deal with large label ranges since very single label is > treated. A possible compromise/solution might be to first catch all single > labels and sufficiently short ranges by hashing and then treat the remaining > ranges in a decision tree. > > > Why reversible hashing? > > Reversible hashing does not need a value table and a simple range check of > the calculated hash value is sufficient. > Reversible hashing is very simple to implement. > The usual jump table approach is a special case of reversible hashing. > > > Why simple hashing? > > A comparison against a value table is sufficient. > Simple hashing does not need an extra table. > > > Why not simple hashing? > > Simple hashing needs a lot of hash function candidates to be tested until a > suitable is found; this costs time in the compiler. > For large label sets the chance to find such a function is very small; the > applicability is therefore reduced to quite small sets. > Minimal perfect hashing is usually not achievable. > > > Why Jenkins' hashing? > > Jenkins' hashing produces very fast and compact code and needs one extra > table (BTable) with usually less elements than n; the elements itself are > unsigned integers which can become as high N-1. > The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 and > f2 limit their range by shifting or masking, then the final hash as h = h1 ^ > BTable[h1]. > Minimal perfect hashing is achievable, albeit often at the cost of larger > tables. > For sufficiently high N the table memory can optionally be reduced by > introducing an extra scrambling table (STable); BTable then consists of > bytes whereas STable contains 256 words or DWords. Obviously this reduction > of memory costs another indirection in the resulting code. > As far as I tested Jenkins' hashing can be used for label sets of several > thousand elements; often even one million labels can be treated. > The application in a compiler is to lower switch statements. Switch > statements with much more than 1000 labels are almost never present in > practice. > Jenkins' hashing usually is sufficiently fast. > By the way: You will find the following remark in lib/Target/README.txt: > ==> > Investigate lowering of sparse switch statements into perfect hash tables: > http://burtleburtle.net/bob/hash/perfect.html > <=> > > Why not Jenkins' hashing? > > The search time of Jenkins' hashing might become unacceptably long for > extraordinary large label sets. > In practice this should never happen, however in this case the label set > will be split in two parts are the process of lowering switches will be > repeated. > An alternative (for very large label sets) could be a different perfect hash > algorithm such as CHM. > > > Why CHM? > > CHM is extraordinarily fast at generating the extra table with a runtime of > O(1). > CHM can be used to hash almost arbitrarily many labels. > CHM produces an ordered minimal perfect hashing function. > > > Why not CHM? > > CHM needs extra table space of 2.09 * n elements; this is larger compared to > Jenkins' method. > CHM (in the implementation of cmph-2.0) is much more complicated than the > code of Jenkins' hashing: > h1 = f1(x) % N; > h2 = f2(x) % N; > if (h1 == h2 && ++h2 >= N) h2 = 0; > h = (Table[h1] + Table[h2]) % n; > By enlarging N to a power of 2 and other tweaking we might replace the 3 > modulo operations by shifting or masking at the cost of more table space. We > might also eliminate the conditional statement if in this case we choose > different hash functions and retry the calculation. > There is at least one more table access compared to Jenkins' method. > > ==> _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University
On Fri, Jan 24, 2014 at 12:41 PM, Anton Korobeynikov < anton at korobeynikov.info> wrote:> Hi Jasper, > > As it was stated before - all these are pretty fine theoretical > arguments. However, without real numbers of benchmarking it's hard to > say anything definite about whether this approach is preferable. > > There are plenty of switches in LLVM and clang itself. Can't you > provide some statistics how the stuff was lowered before and after?Sounds like it's worth implementing in tree, off by default, and then doing some benchmarking. FWIW, I don't think sparse switches are very common in LLVM, so it's not clear how often this optimization would fire.> On Fri, Jan 24, 2014 at 11:51 PM, Jasper Neumann <jn at sirrida.de> wrote: > > Hi folks, > > > > here is a short RFC regarding hashing for the implementation of switch > > statements and my preliminary patch. > > I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. > You can > > find a copy e.g. on > > > http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html > . > > > > Best regards > > Jasper > > > > ==> > > > Preliminary: Special identifiers > > > > n number of given switch labels > > N size of the (intermediate) hash range, usually n rounded up to the > next > > power of 2 > > x the value to be switch upon > > > > > > Why (perfect) hashing? > > > > Hashing enables a jump table like approach even for sparse label sets. > > Let's concentrate on perfect hashing in order to keep the code smaller > and > > faster. > > Hashing is faster than a decision tree provided there are enough labels, > > i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 labels > > assuming random values of x. > > The memory consumption of the hashing tables is comparable to the memory > > footage of the code of a decision tree. > > A decision tree consumes a lot of branch prediction resources. > > For minimal perfect hashing the jump table does not need to contain dummy > > labels and therefore has n instead of (almost) N elements. > > "Switch-to-lookup tables": If all jump targets are sufficiently similar > > hashing opens the possibility to use table which contain the differences. > > Thereby the indirect jump is eliminated and we get a huge performance > gain. > > > > > > Why not hashing? > > > > A decision tree or even an else-if chain is faster than hashing for very > few > > labels. > > The branch prediction logic usually does not work as satisfactorily as > for > > decision trees if fed with cyclical access patterns; i.e. the break-even > > point in this scenario is higher. > > Hashing can not deal with large label ranges since very single label is > > treated. A possible compromise/solution might be to first catch all > single > > labels and sufficiently short ranges by hashing and then treat the > remaining > > ranges in a decision tree. > > > > > > Why reversible hashing? > > > > Reversible hashing does not need a value table and a simple range check > of > > the calculated hash value is sufficient. > > Reversible hashing is very simple to implement. > > The usual jump table approach is a special case of reversible hashing. > > > > > > Why simple hashing? > > > > A comparison against a value table is sufficient. > > Simple hashing does not need an extra table. > > > > > > Why not simple hashing? > > > > Simple hashing needs a lot of hash function candidates to be tested > until a > > suitable is found; this costs time in the compiler. > > For large label sets the chance to find such a function is very small; > the > > applicability is therefore reduced to quite small sets. > > Minimal perfect hashing is usually not achievable. > > > > > > Why Jenkins' hashing? > > > > Jenkins' hashing produces very fast and compact code and needs one extra > > table (BTable) with usually less elements than n; the elements itself are > > unsigned integers which can become as high N-1. > > The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 > and > > f2 limit their range by shifting or masking, then the final hash as h > h1 ^ > > BTable[h1]. > > Minimal perfect hashing is achievable, albeit often at the cost of larger > > tables. > > For sufficiently high N the table memory can optionally be reduced by > > introducing an extra scrambling table (STable); BTable then consists of > > bytes whereas STable contains 256 words or DWords. Obviously this > reduction > > of memory costs another indirection in the resulting code. > > As far as I tested Jenkins' hashing can be used for label sets of several > > thousand elements; often even one million labels can be treated. > > The application in a compiler is to lower switch statements. Switch > > statements with much more than 1000 labels are almost never present in > > practice. > > Jenkins' hashing usually is sufficiently fast. > > By the way: You will find the following remark in lib/Target/README.txt: > > ==> > > Investigate lowering of sparse switch statements into perfect hash > tables: > > http://burtleburtle.net/bob/hash/perfect.html > > <=> > > > > > Why not Jenkins' hashing? > > > > The search time of Jenkins' hashing might become unacceptably long for > > extraordinary large label sets. > > In practice this should never happen, however in this case the label set > > will be split in two parts are the process of lowering switches will be > > repeated. > > An alternative (for very large label sets) could be a different perfect > hash > > algorithm such as CHM. > > > > > > Why CHM? > > > > CHM is extraordinarily fast at generating the extra table with a runtime > of > > O(1). > > CHM can be used to hash almost arbitrarily many labels. > > CHM produces an ordered minimal perfect hashing function. > > > > > > Why not CHM? > > > > CHM needs extra table space of 2.09 * n elements; this is larger > compared to > > Jenkins' method. > > CHM (in the implementation of cmph-2.0) is much more complicated than the > > code of Jenkins' hashing: > > h1 = f1(x) % N; > > h2 = f2(x) % N; > > if (h1 == h2 && ++h2 >= N) h2 = 0; > > h = (Table[h1] + Table[h2]) % n; > > By enlarging N to a power of 2 and other tweaking we might replace the 3 > > modulo operations by shifting or masking at the cost of more table > space. We > > might also eliminate the conditional statement if in this case we choose > > different hash functions and retry the calculation. > > There is at least one more table access compared to Jenkins' method. > > > > ==> > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > -- > With best regards, Anton Korobeynikov > Faculty of Mathematics and Mechanics, Saint Petersburg State University > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140124/3a8435d8/attachment.html>