Witold Waligora via llvm-dev
2017-Feb-14 13:28 UTC
[llvm-dev] (RFC) JumpMaps: switch statement optimization
JumpMap lowering is nearly identical to that of JumpTables with the exception of lack of range-check basic-block. We introduce JumpMapInfo structure which follows the same flow as JumpTableInfo and is finally emitted by AsmPrinter. There are many ways a Target may want to encode jumpmaps (deltas, compression, relative vs absolute), so we plan to keep this flexible and target-driven when upstreaming. Witold W dniu 2017-02-14 o 10:53, Nema, Ashutosh pisze:> Hi Witold, > > Can you explain what you meant by adding support in compiler_rt & changes in lowering. > > Did you tried any benchmarks, please share if you have any numbers with performance improvements & code size reduction ? > > Regards, > Ashutosh > > -----Original Message----- > From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Witold Waligora via llvm-dev > Sent: Monday, February 13, 2017 6:41 PM > To: llvm-dev at lists.llvm.org > Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization > > Hi All, > > This is our first appearance on llvm-dev so a small introduction is in order. > We are a team of two working for good few years now on an out-of-tree embedded back-end. Our primary goal is code size, secondary performance. > We've recently implemented an optimization that we believe others could use. It's about time to contribute back. > > ------------------------------------------------------- > JumpMaps: a generalization of JumpTables JumpTables produce fast and small code but have a limitation - they only work if case values are easy to compute from the variable being switch()ed. To overcome this limitation we introduce a {key,value} structure - a JumpMap. > > Simplifying somewhat, LLVM would currently generate a sequential > if-elseif- for small switches and an inline binary-search for large switches, if it can't produce a JumpTable. JumpMaps would instead generate a call followed by a jump-through-register. > Pseudo-code example: > > switch(x) { > case 0x6851: {BB1} > case 0x1383: {BB2} > case 0x2224: {BB3} > default: {defaultBB} > } > > Without Jump Maps: > if(x==0x6851) { > goto BB1 > } > else if(x==0x1383) { > goto BB2 > } > else if(x==0x2224){ > goto BB3 > } > else{ > goto defaultBB > } > > With Jump Maps: > jumpmap_0 = { > keys = {3, 0x6851, 0x1383, 0x2224} > vals = {defaultBB, BB1, BB2, BB3} > } > addr dst = __jumpmap_find(&jumpmap_0, x) > goto dst; > > On our target Jump Maps produce both smaller and faster code even for quite small switch statements. We believe other architectures would benefit as well, X86 and ARM included: > - jumpmap struct has a good chance of being cached more efficiently than a large chunk of inline binary-search code > - for large switches branch prediction should have an easier time compared to inline binary-search > - embedded architectures (ARM, MSP430) would benefit from smaller code (trading off .text for .rodata) > > In terms of lowering, JumpMap flow is very similar to that of JumpTables. The lowering is even slightly easier as JumpMaps do not require an extra range-check basic-block - the default case is also handled by __jumpmap_find. > Targets need to provide lowering for the map structure and __jumpmap_find function, which would typically become a part of compier-rt library and could be written in hand-crafted asm for extra speed. > > > Questions or comments? > Please let me know. > > > Best Regards, > Witold Waligóra > Myre Laboratories > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Witold Waligora via llvm-dev
2017-Feb-14 13:47 UTC
[llvm-dev] (RFC) JumpMaps: switch statement optimization
I didn't answer compiler-rt and benchmarks question. compiler-rt would have to implement the decoding function(s) to undo whatever the compiler did. This would be target-dependent. Our target implements four of them, one for each basic type: jumpmap_find_i8, jumpmap_find_i16, jumpmap_find_i32, jumpmap_find_i64. We don't have any benchmarks for any of the in-tree targets yet. Witold W dniu 2017-02-14 o 14:28, Witold Waligora via llvm-dev pisze:> JumpMap lowering is nearly identical to that of JumpTables with the > exception of lack of range-check basic-block. > We introduce JumpMapInfo structure which follows the same flow as > JumpTableInfo and is finally emitted by AsmPrinter. > > There are many ways a Target may want to encode jumpmaps (deltas, > compression, relative vs absolute), so we plan to keep this flexible and > target-driven when upstreaming. > > Witold > > W dniu 2017-02-14 o 10:53, Nema, Ashutosh pisze: >> Hi Witold, >> >> Can you explain what you meant by adding support in compiler_rt & changes in lowering. >> >> Did you tried any benchmarks, please share if you have any numbers with performance improvements & code size reduction ? >> >> Regards, >> Ashutosh >> >> -----Original Message----- >> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Witold Waligora via llvm-dev >> Sent: Monday, February 13, 2017 6:41 PM >> To: llvm-dev at lists.llvm.org >> Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization >> >> Hi All, >> >> This is our first appearance on llvm-dev so a small introduction is in order. >> We are a team of two working for good few years now on an out-of-tree embedded back-end. Our primary goal is code size, secondary performance. >> We've recently implemented an optimization that we believe others could use. It's about time to contribute back. >> >> ------------------------------------------------------- >> JumpMaps: a generalization of JumpTables JumpTables produce fast and small code but have a limitation - they only work if case values are easy to compute from the variable being switch()ed. To overcome this limitation we introduce a {key,value} structure - a JumpMap. >> >> Simplifying somewhat, LLVM would currently generate a sequential >> if-elseif- for small switches and an inline binary-search for large switches, if it can't produce a JumpTable. JumpMaps would instead generate a call followed by a jump-through-register. >> Pseudo-code example: >> >> switch(x) { >> case 0x6851: {BB1} >> case 0x1383: {BB2} >> case 0x2224: {BB3} >> default: {defaultBB} >> } >> >> Without Jump Maps: >> if(x==0x6851) { >> goto BB1 >> } >> else if(x==0x1383) { >> goto BB2 >> } >> else if(x==0x2224){ >> goto BB3 >> } >> else{ >> goto defaultBB >> } >> >> With Jump Maps: >> jumpmap_0 = { >> keys = {3, 0x6851, 0x1383, 0x2224} >> vals = {defaultBB, BB1, BB2, BB3} >> } >> addr dst = __jumpmap_find(&jumpmap_0, x) >> goto dst; >> >> On our target Jump Maps produce both smaller and faster code even for quite small switch statements. We believe other architectures would benefit as well, X86 and ARM included: >> - jumpmap struct has a good chance of being cached more efficiently than a large chunk of inline binary-search code >> - for large switches branch prediction should have an easier time compared to inline binary-search >> - embedded architectures (ARM, MSP430) would benefit from smaller code (trading off .text for .rodata) >> >> In terms of lowering, JumpMap flow is very similar to that of JumpTables. The lowering is even slightly easier as JumpMaps do not require an extra range-check basic-block - the default case is also handled by __jumpmap_find. >> Targets need to provide lowering for the map structure and __jumpmap_find function, which would typically become a part of compier-rt library and could be written in hand-crafted asm for extra speed. >> >> >> Questions or comments? >> Please let me know. >> >> >> Best Regards, >> Witold Waligóra >> Myre Laboratories >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Hans Wennborg via llvm-dev
2017-Feb-14 18:38 UTC
[llvm-dev] (RFC) JumpMaps: switch statement optimization
I wonder if it would make sense to emit the jumpmap_find_ functions in IR rather than in compiler-rt. Many targets don't link against compiler-rt, e.g. x86 Linux typically uses libgcc. If they're emitted in IR, the functions could potentially be inlined. For example if the size of the switch is known to be small so no binary search is done, could inlining the find_ function be smaller than setting up the call? On Tue, Feb 14, 2017 at 5:47 AM, Witold Waligora via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I didn't answer compiler-rt and benchmarks question. > > compiler-rt would have to implement the decoding function(s) to undo > whatever the compiler did. This would be target-dependent. > Our target implements four of them, one for each basic type: > jumpmap_find_i8, jumpmap_find_i16, jumpmap_find_i32, jumpmap_find_i64. > > We don't have any benchmarks for any of the in-tree targets yet. > > Witold > > > W dniu 2017-02-14 o 14:28, Witold Waligora via llvm-dev pisze: >> JumpMap lowering is nearly identical to that of JumpTables with the >> exception of lack of range-check basic-block. >> We introduce JumpMapInfo structure which follows the same flow as >> JumpTableInfo and is finally emitted by AsmPrinter. >> >> There are many ways a Target may want to encode jumpmaps (deltas, >> compression, relative vs absolute), so we plan to keep this flexible and >> target-driven when upstreaming. >> >> Witold >> >> W dniu 2017-02-14 o 10:53, Nema, Ashutosh pisze: >>> Hi Witold, >>> >>> Can you explain what you meant by adding support in compiler_rt & changes in lowering. >>> >>> Did you tried any benchmarks, please share if you have any numbers with performance improvements & code size reduction ? >>> >>> Regards, >>> Ashutosh >>> >>> -----Original Message----- >>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Witold Waligora via llvm-dev >>> Sent: Monday, February 13, 2017 6:41 PM >>> To: llvm-dev at lists.llvm.org >>> Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization >>> >>> Hi All, >>> >>> This is our first appearance on llvm-dev so a small introduction is in order. >>> We are a team of two working for good few years now on an out-of-tree embedded back-end. Our primary goal is code size, secondary performance. >>> We've recently implemented an optimization that we believe others could use. It's about time to contribute back. >>> >>> ------------------------------------------------------- >>> JumpMaps: a generalization of JumpTables JumpTables produce fast and small code but have a limitation - they only work if case values are easy to compute from the variable being switch()ed. To overcome this limitation we introduce a {key,value} structure - a JumpMap. >>> >>> Simplifying somewhat, LLVM would currently generate a sequential >>> if-elseif- for small switches and an inline binary-search for large switches, if it can't produce a JumpTable. JumpMaps would instead generate a call followed by a jump-through-register. >>> Pseudo-code example: >>> >>> switch(x) { >>> case 0x6851: {BB1} >>> case 0x1383: {BB2} >>> case 0x2224: {BB3} >>> default: {defaultBB} >>> } >>> >>> Without Jump Maps: >>> if(x==0x6851) { >>> goto BB1 >>> } >>> else if(x==0x1383) { >>> goto BB2 >>> } >>> else if(x==0x2224){ >>> goto BB3 >>> } >>> else{ >>> goto defaultBB >>> } >>> >>> With Jump Maps: >>> jumpmap_0 = { >>> keys = {3, 0x6851, 0x1383, 0x2224} >>> vals = {defaultBB, BB1, BB2, BB3} >>> } >>> addr dst = __jumpmap_find(&jumpmap_0, x) >>> goto dst; >>> >>> On our target Jump Maps produce both smaller and faster code even for quite small switch statements. We believe other architectures would benefit as well, X86 and ARM included: >>> - jumpmap struct has a good chance of being cached more efficiently than a large chunk of inline binary-search code >>> - for large switches branch prediction should have an easier time compared to inline binary-search >>> - embedded architectures (ARM, MSP430) would benefit from smaller code (trading off .text for .rodata) >>> >>> In terms of lowering, JumpMap flow is very similar to that of JumpTables. The lowering is even slightly easier as JumpMaps do not require an extra range-check basic-block - the default case is also handled by __jumpmap_find. >>> Targets need to provide lowering for the map structure and __jumpmap_find function, which would typically become a part of compier-rt library and could be written in hand-crafted asm for extra speed. >>> >>> >>> Questions or comments? >>> Please let me know. >>> >>> >>> Best Regards, >>> Witold Waligóra >>> Myre Laboratories >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Possibly Parallel Threads
- (RFC) JumpMaps: switch statement optimization
- (RFC) JumpMaps: switch statement optimization
- [LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
- [LLVMdev] patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation
- error in plot.dendrogram (PR#7300)