thr3ads.net - llvm dev - [llvm-dev] (RFC) JumpMaps: switch statement optimization [Feb 2017]

If this information is useful, please help other people find it:
Share via:

Witold Waligora via llvm-dev

2017-Feb-13 13:11 UTC

[llvm-dev] (RFC) JumpMaps: switch statement optimization

Hi All,

This is our first appearance on llvm-dev so a small introduction is in
order.
We are a team of two working for good few years now on an out-of-tree
embedded back-end. Our primary goal is code size, secondary performance.
We've recently implemented an optimization that we believe others could
use. It's about time to contribute back.

-------------------------------------------------------
JumpMaps: a generalization of JumpTables
JumpTables produce fast and small code but have a limitation - they only
work if case values are easy to compute from the variable being
switch()ed. To overcome this limitation we introduce a {key,value}
structure - a JumpMap.

Simplifying somewhat, LLVM would currently generate a sequential
if-elseif- for small switches and an inline binary-search for large
switches, if it can't produce a JumpTable. JumpMaps would instead
generate a call followed by a jump-through-register.
Pseudo-code example:

switch(x) {
  case 0x6851: {BB1}
  case 0x1383: {BB2}
  case 0x2224: {BB3}
  default:     {defaultBB}
}

Without Jump Maps:
  if(x==0x6851) {
    goto BB1
  }
  else if(x==0x1383) {
    goto BB2
  }
  else if(x==0x2224){
    goto BB3
  }
  else{
    goto defaultBB
  }

With Jump Maps:
  jumpmap_0 = {
    keys = {3, 0x6851, 0x1383, 0x2224}
    vals = {defaultBB, BB1, BB2, BB3}
  }
  addr dst = __jumpmap_find(&jumpmap_0, x)
  goto dst;

On our target Jump Maps produce both smaller and faster code even for
quite small switch statements. We believe other architectures would
benefit as well, X86 and ARM included:
- jumpmap struct has a good chance of being cached more efficiently than
a large chunk of inline binary-search code
- for large switches branch prediction should have an easier time
compared to inline binary-search
- embedded architectures (ARM, MSP430) would benefit from smaller code
(trading off .text for .rodata)

In terms of lowering, JumpMap flow is very similar to that of
JumpTables. The lowering is even slightly easier as JumpMaps do not
require an extra range-check basic-block - the default case is also
handled by __jumpmap_find.
Targets need to provide lowering for the map structure and
__jumpmap_find function, which would typically become a part of
compier-rt library and could be written in hand-crafted asm for extra speed.


Questions or comments?
Please let me know.


Best Regards,
Witold Waligóra
Myre Laboratories

Friedman, Eli via llvm-dev

2017-Feb-14 00:23 UTC

head link

[llvm-dev] (RFC) JumpMaps: switch statement optimization

On 2/13/2017 5:11 AM, Witold Waligora via llvm-dev
wrote:> Hi All,
>
> This is our first appearance on llvm-dev so a small introduction is in
> order.
> We are a team of two working for good few years now on an out-of-tree
> embedded back-end. Our primary goal is code size, secondary performance.
> We've recently implemented an optimization that we believe others could
> use. It's about time to contribute back.
>
> -------------------------------------------------------
> JumpMaps: a generalization of JumpTables
> JumpTables produce fast and small code but have a limitation - they only
> work if case values are easy to compute from the variable being
> switch()ed. To overcome this limitation we introduce a {key,value}
> structure - a JumpMap.
>
> Simplifying somewhat, LLVM would currently generate a sequential
> if-elseif- for small switches and an inline binary-search for large
> switches, if it can't produce a JumpTable. JumpMaps would instead
> generate a call followed by a jump-through-register.
> Pseudo-code example:
>
> switch(x) {
>    case 0x6851: {BB1}
>    case 0x1383: {BB2}
>    case 0x2224: {BB3}
>    default:     {defaultBB}
> }
>
> Without Jump Maps:
>    if(x==0x6851) {
>      goto BB1
>    }
>    else if(x==0x1383) {
>      goto BB2
>    }
>    else if(x==0x2224){
>      goto BB3
>    }
>    else{
>      goto defaultBB
>    }
>
> With Jump Maps:
>    jumpmap_0 = {
>      keys = {3, 0x6851, 0x1383, 0x2224}
>      vals = {defaultBB, BB1, BB2, BB3}
>    }
>    addr dst = __jumpmap_find(&jumpmap_0, x)
>    goto dst;
>
> On our target Jump Maps produce both smaller and faster code even for
> quite small switch statements. We believe other architectures would
> benefit as well, X86 and ARM included:
> - jumpmap struct has a good chance of being cached more efficiently than
> a large chunk of inline binary-search code
> - for large switches branch prediction should have an easier time
> compared to inline binary-search
> - embedded architectures (ARM, MSP430) would benefit from smaller code
> (trading off .text for .rodata)
>
> In terms of lowering, JumpMap flow is very similar to that of
> JumpTables. The lowering is even slightly easier as JumpMaps do not
> require an extra range-check basic-block - the default case is also
> handled by __jumpmap_find.
> Targets need to provide lowering for the map structure and
> __jumpmap_find function, which would typically become a part of
> compier-rt library and could be written in hand-crafted asm for extra
speed.
>
>
> Questions or comments?
> Please let me know.
>
Smaller codesize for switches could be interesting for Thumb targets.  
Some thoughts.

1. For Thumb, the overall size is more important that constants vs. 
code; the tables will end up in the text segment anyway.

2. How many versions of __jumpmap_find do you need?  If you're going for 
size, you probably want to put 1 or 2 byte relative jump offsets into 
the table, if possible.  You'd also want to shrink the keys if possible: 
for a small range, we could use 2-byte, or even 1-byte keys.  You could 
also encode deltas into the table, rather than absolute values; the 
deltas are likely to be smaller than the keys, and you're iterating over 
the table anyway.

3. Is the linear search a problem?  I mean, the linear nature of it 
doesn't matter for small tables, but eventually you get to a point where 
it just doesn't make sense.  Linearly iterating over 1000 entries in a 
table is clearly going to be slower than a binary search.  Or is the 
idea that you'd do a binary search down to say, 20 entries, then call 
__jumpmap_find?

4. How often does this pattern come up, in your experience?

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Joerg Sonnenberger via llvm-dev

2017-Feb-14 00:40 UTC

head link

[llvm-dev] (RFC) JumpMaps: switch statement optimization

On Mon, Feb 13, 2017 at 02:11:24PM +0100, Witold Waligora via llvm-dev
wrote:> JumpMaps: a generalization of JumpTables
> JumpTables produce fast and small code but have a limitation - they only
> work if case values are easy to compute from the variable being
> switch()ed. To overcome this limitation we introduce a {key,value}
> structure - a JumpMap.
We've been discussing supporting either perfect hashing or just a
pre-transformation a while ago. That would have a very similar use case,
the primary issue is the additional code for checking that the hashed
value is actually the expected value.

Joerg

Nema, Ashutosh via llvm-dev

2017-Feb-14 09:53 UTC

head link

[llvm-dev] (RFC) JumpMaps: switch statement optimization

Hi Witold,

Can you explain what you meant by adding support in compiler_rt & changes in
lowering.

Did you tried any benchmarks, please share if you have any numbers with
performance improvements & code size reduction ?

Regards,
Ashutosh

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Witold
Waligora via llvm-dev
Sent: Monday, February 13, 2017 6:41 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization

Hi All,

This is our first appearance on llvm-dev so a small introduction is in order.
We are a team of two working for good few years now on an out-of-tree embedded
back-end. Our primary goal is code size, secondary performance.
We've recently implemented an optimization that we believe others could use.
It's about time to contribute back.

-------------------------------------------------------
JumpMaps: a generalization of JumpTables JumpTables produce fast and small code
but have a limitation - they only work if case values are easy to compute from
the variable being switch()ed. To overcome this limitation we introduce a
{key,value} structure - a JumpMap.

Simplifying somewhat, LLVM would currently generate a sequential
if-elseif- for small switches and an inline binary-search for large switches, if
it can't produce a JumpTable. JumpMaps would instead generate a call
followed by a jump-through-register.
Pseudo-code example:

switch(x) {
  case 0x6851: {BB1}
  case 0x1383: {BB2}
  case 0x2224: {BB3}
  default:     {defaultBB}
}

Without Jump Maps:
  if(x==0x6851) {
    goto BB1
  }
  else if(x==0x1383) {
    goto BB2
  }
  else if(x==0x2224){
    goto BB3
  }
  else{
    goto defaultBB
  }

With Jump Maps:
  jumpmap_0 = {
    keys = {3, 0x6851, 0x1383, 0x2224}
    vals = {defaultBB, BB1, BB2, BB3}
  }
  addr dst = __jumpmap_find(&jumpmap_0, x)
  goto dst;

On our target Jump Maps produce both smaller and faster code even for quite
small switch statements. We believe other architectures would benefit as well,
X86 and ARM included:
- jumpmap struct has a good chance of being cached more efficiently than a large
chunk of inline binary-search code
- for large switches branch prediction should have an easier time compared to
inline binary-search
- embedded architectures (ARM, MSP430) would benefit from smaller code (trading
off .text for .rodata)

In terms of lowering, JumpMap flow is very similar to that of JumpTables. The
lowering is even slightly easier as JumpMaps do not require an extra range-check
basic-block - the default case is also handled by __jumpmap_find.
Targets need to provide lowering for the map structure and __jumpmap_find
function, which would typically become a part of compier-rt library and could be
written in hand-crafted asm for extra speed.


Questions or comments?
Please let me know.


Best Regards,
Witold Waligóra
Myre Laboratories
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Witold Waligora via llvm-dev

2017-Feb-14 13:17 UTC

head link

[llvm-dev] (RFC) JumpMaps: switch statement optimization

Eli,
> 2. How many versions of __jumpmap_find do you need? In our out-of-tree implementation we currently encode case values as-is,
thus we need 4 versions of __jumpmap_find function (i8,i16,i32,i64).
We have an analysis pass that helps us figure out which kinds of maps
are actually profitable.
For example, if we discover that we only have one i8 jumpmap but many
i16 jumpmaps, then we would zero-extend the i8 map to i16 so that we
don't use __jumpmap_find_i8 at all and it can be removed.
> If you're going for
> size, you probably want to put 1 or 2 byte relative jump offsets into
> the table, if possible.  You'd also want to shrink the keys if
possible:
> for a small range, we could use 2-byte, or even 1-byte keys.  You could
> also encode deltas into the table, rather than absolute values; the
> deltas are likely to be smaller than the keys, and you're iterating
over
> the table anyway.Yes, many options are possible. In general, as long as jumpmap_find()
function can undo what the compiler did to the map you'd be fine. We
plan to keep this opt flexible when upstreaming.
> 3. Is the linear search a problem?  I mean, the linear nature of it
> doesn't matter for small tables, but eventually you get to a point
where
> it just doesn't make sense.  Linearly iterating over 1000 entries in a
> table is clearly going to be slower than a binary search.  Or is the
> idea that you'd do a binary search down to say, 20 entries, then call
> __jumpmap_find?Our __jumpmap_find does binary-search internally and it turned out to be
both faster and smaller than what compiler would generate otherwise
(bittest clustering).
There is some concern about how well jumpmaps could integrate with
existing switch clustering code for a pure speed Target.
> 4. How often does this pattern come up, in your experience?This is highly dependent on how large switches will be profitable to
fold as a jumpmap on a particular target.
For us the break-even point is at 5 cases which is very small and fires
quite often.
I can't say for X86/speed, but as code size opt, this fires on any
switch statement >= 5 that fails to fold as a JumpTable.


Witold

Witold Waligora via llvm-dev

2017-Feb-14 13:28 UTC

head link

[llvm-dev] (RFC) JumpMaps: switch statement optimization

JumpMap lowering is nearly identical to that of JumpTables with the
exception of lack of range-check basic-block.
We introduce JumpMapInfo structure which follows the same flow as
JumpTableInfo and is finally emitted by AsmPrinter.

There are many ways a Target may want to encode jumpmaps (deltas,
compression, relative vs absolute), so we plan to keep this flexible and
target-driven when upstreaming.

Witold

W dniu 2017-02-14 o 10:53, Nema, Ashutosh pisze:> Hi Witold,
> 
> Can you explain what you meant by adding support in compiler_rt &
changes in lowering.
> 
> Did you tried any benchmarks, please share if you have any numbers with
performance improvements & code size reduction ?
> 
> Regards,
> Ashutosh
> 
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
Witold Waligora via llvm-dev
> Sent: Monday, February 13, 2017 6:41 PM
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization
> 
> Hi All,
> 
> This is our first appearance on llvm-dev so a small introduction is in
order.
> We are a team of two working for good few years now on an out-of-tree
embedded back-end. Our primary goal is code size, secondary performance.
> We've recently implemented an optimization that we believe others could
use. It's about time to contribute back.
> 
> -------------------------------------------------------
> JumpMaps: a generalization of JumpTables JumpTables produce fast and small
code but have a limitation - they only work if case values are easy to compute
from the variable being switch()ed. To overcome this limitation we introduce a
{key,value} structure - a JumpMap.
> 
> Simplifying somewhat, LLVM would currently generate a sequential
> if-elseif- for small switches and an inline binary-search for large
switches, if it can't produce a JumpTable. JumpMaps would instead generate a
call followed by a jump-through-register.
> Pseudo-code example:
> 
> switch(x) {
>   case 0x6851: {BB1}
>   case 0x1383: {BB2}
>   case 0x2224: {BB3}
>   default:     {defaultBB}
> }
> 
> Without Jump Maps:
>   if(x==0x6851) {
>     goto BB1
>   }
>   else if(x==0x1383) {
>     goto BB2
>   }
>   else if(x==0x2224){
>     goto BB3
>   }
>   else{
>     goto defaultBB
>   }
> 
> With Jump Maps:
>   jumpmap_0 = {
>     keys = {3, 0x6851, 0x1383, 0x2224}
>     vals = {defaultBB, BB1, BB2, BB3}
>   }
>   addr dst = __jumpmap_find(&jumpmap_0, x)
>   goto dst;
> 
> On our target Jump Maps produce both smaller and faster code even for quite
small switch statements. We believe other architectures would benefit as well,
X86 and ARM included:
> - jumpmap struct has a good chance of being cached more efficiently than a
large chunk of inline binary-search code
> - for large switches branch prediction should have an easier time compared
to inline binary-search
> - embedded architectures (ARM, MSP430) would benefit from smaller code
(trading off .text for .rodata)
> 
> In terms of lowering, JumpMap flow is very similar to that of JumpTables.
The lowering is even slightly easier as JumpMaps do not require an extra
range-check basic-block - the default case is also handled by __jumpmap_find.
> Targets need to provide lowering for the map structure and __jumpmap_find
function, which would typically become a part of compier-rt library and could be
written in hand-crafted asm for extra speed.
> 
> 
> Questions or comments?
> Please let me know.
> 
> 
> Best Regards,
> Witold Waligóra
> Myre Laboratories
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

llvm dev - Feb 2017 - (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization

[llvm-dev] (RFC) JumpMaps: switch statement optimization