After messing around with this a bit recently, I've come to the following conclusions:

1. One issue is at what granularity to consider the PGO weightings. For example, should a set of 50 consecutive cases (holding 65% of the total weight distributed approximately evenly, and which can be lowered to a table lookup) be handled before the 3 remaining cases that have 5%, 10%, and 20% probability, respectively?

2. Rather than saying "hot case" to mean the case with the most weight, we might want to talk about a different "unit" in which to aggregate the weights (such as in 1., considering the whole lookup table as one).

3. The primary advantages of the tree approach I suggested are likely to come from having a more "global" picture of the set of cases at each step of the lowering, and from doing all the computation up front.
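To make 1. and 2. a bit more concrete, the kind of clustering/aggregation being described might look roughly like the following. This is an illustrative C++ sketch only; CaseInfo, CaseCluster, and the density thresholds are invented names, not LLVM's actual switch-lowering structures or heuristics.

#include <algorithm>
#include <cstdint>
#include <vector>

// One switch case: its value and its profile weight.
struct CaseInfo {
  int64_t Value;
  uint64_t Weight;
};

// A "unit" for lowering decisions: either a single case or a dense run of
// consecutive cases that could become one table lookup. What gets compared
// against other units is the aggregated weight of the whole run, not the
// weight of any single case inside it.
struct CaseCluster {
  int64_t Low, High;      // inclusive range of case values in the unit
  unsigned NumCases;      // how many cases landed in the unit
  uint64_t TotalWeight;   // sum of their weights
  bool TableCandidate;    // big/dense enough to lower as a lookup table
};

// Greedily group sorted cases into clusters: a cluster keeps absorbing the
// next case while the values stay dense (at most MaxGap apart; both
// thresholds here are arbitrary placeholders).
std::vector<CaseCluster> clusterCases(std::vector<CaseInfo> Cases,
                                      int64_t MaxGap = 2,
                                      unsigned MinTableCases = 4) {
  std::sort(Cases.begin(), Cases.end(),
            [](const CaseInfo &A, const CaseInfo &B) { return A.Value < B.Value; });
  std::vector<CaseCluster> Clusters;
  for (const CaseInfo &C : Cases) {
    if (!Clusters.empty() && C.Value - Clusters.back().High <= MaxGap) {
      CaseCluster &Last = Clusters.back();
      Last.High = C.Value;
      Last.NumCases += 1;
      Last.TotalWeight += C.Weight;
    } else {
      Clusters.push_back({C.Value, C.Value, 1, C.Weight, false});
    }
  }
  for (CaseCluster &CC : Clusters)
    CC.TableCandidate = CC.NumCases >= MinTableCases;
  return Clusters;
}

With something like this, the 50-case run from 1. becomes a single unit whose aggregated 65% is compared directly against the 20%, 10%, and 5% singleton cases, which is exactly the ordering question 1. is asking.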
In a larger scope, I'm convinced that in order to make appropriate tradeoffs, we probably need to analyze lots of real-world code. For example:

- some use cases of `switch` are extremely performance-critical, like lexers/interpreters, while at the other end of the spectrum things like enum->string conversions for printing are less so (and should probably be optimized for size).
- some uses of switches do very different things in each of their case destinations (and use fallthrough, which must be exploited properly by the lowering), while others have very similar code in their destinations (the entire switch might actually be describing a mapping on data, rather than a control flow construct per se; e.g. a table lookup (dense indices) or a binary search in a static table (sparse indices) might be a great lowering).
- some switches have multiple case values that map to each destination, while others have a 1-1 mapping.

Those are some ways that switches can vary, off the top of my head. I think it would be really valuable if somebody who has easy access to compiling a large amount of real-world code with a custom compiler could collect some data (my understanding is that Google's internal codebase is a good candidate).

-- Sean Silva

On Tue, Dec 16, 2014 at 12:17 PM, Sean Silva <chisophugis at gmail.com> wrote:

> On Tue, Dec 16, 2014 at 7:23 AM, Chad Rosier <mcrosier at codeaurora.org> wrote:
>>
>> > On Mon, Dec 15, 2014 at 2:26 PM, Chad Rosier <mcrosier at codeaurora.org> wrote:
>> >>
>> >> > On Mon, Dec 15, 2014 at 9:57 AM, Chad Rosier <mcrosier at codeaurora.org> wrote:
>> >> >> All,
>> >> >> About two months ago I posted a patch that hoisted the hottest case statement from a switch statement during ISelLowering.
>> >> >>
>> >> >> See: http://reviews.llvm.org/D5786
>> >> >>
>> >> >> Sean was rather adamant about using a Huffman tree (and I agree this is a more complete solution), so I'd like to put a patch together.
>> >> >
>> >> > I think this sounds really cool!
>> >> >
>> >> >> That being said, I have a few questions.
>> >> >>
>> >> >> The current switch lowering code sorts based on case values and is lowered with a combination of binary trees, lookup tables, bit tests and magic. If we lower the switch based on branch probabilities, I think the most reasonable approach would be to disable the lookup tables and stick with a purely tree structure. Does anyone object or have opinions on this matter?
>> >> >
>> >> > Spontaneously I would have thought the Huffman tree would just replace the binary trees, i.e. we'd use the branch probabilities to build the tree differently.
>> >>
>> >> The current implementation selects the pivot in such a way that it optimizes for lookup tables.
>> >
>> > This seems like a potential point of friction: for lookup tables (and other techniques) you want to inherently keep the cases sorted by their values, but a Huffman tree does not care about the actual values; it only cares about their relative probabilities.
>>
>> Exactly! I can think of lots of ways to deal with this "friction", but none of them sit well. As you mentioned, building a Huffman tree and then dealing with corner cases would be one approach. Another, which was Owen's suggestion, would be to peel the hot cases and then fall back to the current logic.
>
> Another possibility might be to use a hybrid approach.
>
> For example, while building the Huffman tree, propagate various information up the tree as you build it. Then you can very easily start at the top of the Huffman tree and have all this information available to you (e.g. "peel off the most probable cases until the relative probability has some relation to the subtree case density").
>
> On the other hand (more likely), you could keep the cases ordered (i.e. not a Huffman tree) and integrate the branch probability info as the extra data propagated up the tree (along with other stuff); e.g. the max probability of any node in the subtree, or the total probability sum in the subtree. This would also allow some highly non-trivial properties, like "maximum number of consecutive cases observed in each subtree", to be maintained, along with O(1) access to the root of the subtree containing the consecutive cases (and, if we keep the tree balanced, O(log n) time to extract the consecutive cases and update all the data we are propagating up the tree). This would allow a really clean lowering algorithm like:
>
> while (we have subtrees with property X)
>   extract and handle subtree in a particular way
> while (we have subtrees with property Y)
>   extract and handle subtree in a different way
> etc.
>
> A prototype of this can be done really easily with a non-balanced binary tree. In order to make it reliably fast, the binary tree will need to be upgraded to a balanced binary tree (red-black, AVL, treap, etc.), which I will go ahead and volunteer to do (I've actually spent more time than I care to admit investigating balanced binary trees and their implementation).
>
> For a textbook reference on maintaining extra data in a tree, see e.g. section 14.2 of CLRS.
>
> -- Sean Silva
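To make the augmentation described just above a little more concrete, each node of the value-ordered tree might carry something like the following. This is only a sketch under assumed names (CaseNode, update); it is not LLVM's code, and it tracks only a couple of the aggregates mentioned.

#include <algorithm>
#include <cstdint>
#include <initializer_list>

// One node of a value-ordered binary search tree over the cases, augmented
// with subtree aggregates in the style of CLRS section 14.2.
struct CaseNode {
  int64_t Value;                // the case value; the tree is ordered by this
  uint64_t Weight;              // the profile weight of this one case
  CaseNode *Left = nullptr;
  CaseNode *Right = nullptr;

  // Aggregates describing the whole subtree rooted at this node.
  uint64_t SubtreeWeight = 0;   // total probability mass in the subtree
  uint64_t MaxCaseWeight = 0;   // weight of the hottest single case
  unsigned NumCases = 0;        // number of cases in the subtree
  // "Longest run of consecutive case values" needs a bit more bookkeeping
  // (min/max value plus prefix/suffix run lengths per node), but it
  // composes bottom-up in exactly the same way.
};

// Recompute a node's aggregates from its children. Calling this on the way
// back up after every insertion, deletion, or rotation keeps the extra data
// correct at O(1) cost per node touched.
void update(CaseNode *N) {
  N->SubtreeWeight = N->Weight;
  N->MaxCaseWeight = N->Weight;
  N->NumCases = 1;
  for (CaseNode *Child : {N->Left, N->Right}) {
    if (!Child)
      continue;
    N->SubtreeWeight += Child->SubtreeWeight;
    N->MaxCaseWeight = std::max(N->MaxCaseWeight, Child->MaxCaseWeight);
    N->NumCases += Child->NumCases;
  }
}

With aggregates like these available at the root, a query such as "is there a subtree holding more than X% of the total weight in fewer than K cases" can be answered by walking down from the root, which is what keeps the while-loops above cheap.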
>>
>> > -- Sean Silva
>> >
>> >> Lowering the binary tree based on branch probabilities doesn't fit well with this approach. However, I think we could start by building a Huffman tree and once we've handled the most heavily weighted nodes, we could fall back to the current logic. Make sense?
>> >>
>> >> > For how many hot cases is a Huffman tree faster than a jump table? I suspect we'll still want to build jump tables when we can.
>> >>
>> >> I take back my previous comment; I agree we should have a hybrid approach, per my comments above.
>> >>
>> >> > Thanks,
>> >> > Hans
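Since the thread keeps coming back to it: the Huffman-style construction over case weights is just the classic greedy merge of the two lightest nodes, so that the hottest cases end up behind the fewest conditional branches. A standalone sketch follows, with made-up names (HuffNode, buildHuffmanTree); it is not the existing SelectionDAGBuilder lowering code.

#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Leaf = one case (or one pre-formed cluster); internal node = a branch in
// the emitted compare tree.
struct HuffNode {
  uint64_t Weight;
  int64_t CaseValue;   // meaningful for leaves only
  int Left, Right;     // indices into the node pool; -1 for leaves
};

// Builds the tree over (case value, weight) pairs and returns the node
// pool; the last element is the root.
std::vector<HuffNode>
buildHuffmanTree(const std::vector<std::pair<int64_t, uint64_t>> &Cases) {
  std::vector<HuffNode> Pool;
  // Min-heap of (weight, pool index).
  using Entry = std::pair<uint64_t, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> Q;
  for (const auto &C : Cases) {
    Pool.push_back({C.second, C.first, -1, -1});
    Q.push({C.second, int(Pool.size()) - 1});
  }
  // Repeatedly merge the two lightest subtrees into a new internal node.
  while (Q.size() > 1) {
    Entry A = Q.top(); Q.pop();
    Entry B = Q.top(); Q.pop();
    Pool.push_back({A.first + B.first, 0, A.second, B.second});
    Q.push({A.first + B.first, int(Pool.size()) - 1});
  }
  return Pool;
}

In a real lowering the leaves would more likely be whole clusters (jump tables, bit tests, ranges) rather than raw cases, which is essentially the hybrid approach being discussed above.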
> After messing around with this a bit recently, I've come to the following conclusions:
>
> 1. One issue is at what granularity to consider the PGO weightings. For example, should a set of 50 consecutive cases (holding 65% of the total weight distributed approximately evenly, and which can be lowered to a table lookup) be handled before the 3 remaining cases that have 5%, 10%, and 20% probability, respectively?

It seems reasonable to consider the lookup table as the 'hottest' case (per your next comment).

> 2. Rather than saying "hot case" to mean the case with the most weight, we might want to talk about a different "unit" in which to aggregate the weights (such as in 1., considering the whole lookup table as one).

I agree with this point.

> 3. The primary advantages of the tree approach I suggested are likely to come from having a more "global" picture of the set of cases at each step of the lowering, and from doing all the computation up front.
>
> In a larger scope, I'm convinced that in order to make appropriate tradeoffs, we probably need to analyze lots of real-world code.

Agreed.

> Those are some ways that switches can vary, off the top of my head. I think it would be really valuable if somebody who has easy access to compiling a large amount of real-world code with a custom compiler could collect some data (my understanding is that Google's internal codebase is a good candidate).

(Sadly) My world consists of SPEC2K and SPEC2K6, so I don't have much to offer here. :/
On Tue, Dec 23, 2014 at 8:26 AM, Chad Rosier <mcrosier at codeaurora.org> wrote:

> > Those are some ways that switches can vary, off the top of my head. I think it would be really valuable if somebody who has easy access to compiling a large amount of real-world code with a custom compiler could collect some data (my understanding is that Google's internal codebase is a good candidate).
>
> (Sadly) My world consists of SPEC2K and SPEC2K6, so I don't have much to offer here. :/

SPEC might not be too bad for this particular investigation (or at least one part of it), since it hits a lot of the main "hot" situations that we care about (lexers, compression, interpreters, etc.). It will likely underrepresent the importance of many of the "cold" switches, where we likely want to focus on code size and on not polluting the i-cache. Regardless, I would say: the more data the better!
-- Sean Silva