thr3ads.net - llvm dev - [LLVMdev] some superoptimizer results [Jul 2015]

If this information is useful, please help other people find it:
Share via:

John Regehr

2015-Jul-23 06:33 UTC

[LLVMdev] some superoptimizer results

> Interesting. Do you happen to have some examples laying around?
Sorry, no, I'll have to re-run. I felt that the select-heavy results were 
sort of humorous and clever at first, but then just annoying.
> Except for decode-limited situations, in general decreasing the critical
path length is more important
> than eliminating instructions. The critical path length is a
target-independent lower bound on the
> maximum achievable execution speed; the number of instructions is not.
Sounds easy enough. I'll give it a try. I've observed many situations 
where Souper can shorten the critical path but I hadn't really thought 
about making that into a top-level goal.

Thanks,

John

Sean Silva

2015-Jul-23 10:28 UTC

head link

[LLVMdev] some superoptimizer results

On Wed, Jul 22, 2015 at 11:33 PM, John Regehr <regehr at cs.utah.edu>
wrote:
> Interesting. Do you happen to have some examples laying around?
>>
>
> Sorry, no, I'll have to re-run. I felt that the select-heavy results
were
> sort of humorous and clever at first, but then just annoying.
>
>  Except for decode-limited situations, in general decreasing the critical
>> path length is more important
>> than eliminating instructions. The critical path length is a
>> target-independent lower bound on the
>> maximum achievable execution speed; the number of instructions is not.
>>
>
> Sounds easy enough. I'll give it a try. I've observed many
situations
> where Souper can shorten the critical path but I hadn't really thought
> about making that into a top-level goal.
>
What I said is sort of misleading in that the targets have a variety of
"exotic" instructions that can effectively collapse multiple nodes in
the
critical path e.g. x86 BMI, PPC rlwinm, etc. Even stuff like whether the
target has ANDN will affect the critical path (or an instruction taking a
general truth table). Also the precise cost in a critical path of the
individual instructions can vary a fair amount.to the extent that except
for getting rid of mul's or div's it's not a universal win (even
with mul,
there are often situations where the mul latency doesn't cost you anything,
so the mul is effectively "cheap as an add").

The above notwithstanding, the critical path is still likely to be a better
indicator than number than instructions. Ultimately the hardware
implementation of these instructions is critical-path-limited, so besides
actually reducing the critical path of instructions, optimizing for
critical path is more likely to land you near something which might
actually be a single cycle instruction in the hardware. That's my
intuition, at least.


I guess another way to select interesting transformations could be to look
for sequences where the result uses a "subset" of the input
instruction
sequence. E.g. if the input instruction sequence involves a CMP, then
reducing the input sequence to a single CMP is a win. If the input contain
a shift-imm, then reducing the entire sequence to a shift-imm
("absorbing"
into it) is a win. Maybe there is a generalization of this intuition to
multiple-instruction result sequences?

%0:i32 = var
%1:i32 = addnw 1:i32, %0
%2:i32 = shl 1:i32, %1
infer %2
%3:i32 = shl 2:i32, %0
result %3

%0:i8 = var
%1:i1 = ult 191:i8, %0
%2:i1 = slt 255:i8, %0
%3:i1 = or %1, %2
infer %3
%4:i1 = sle 192:i8, %0
result %4

%0:i32 = var
%1:i32 = shlnsw %0, 1:i32
%2:i32 = or 1:i32, %1
%3:i1 = slt %1, %2
infer %3
result 1:i1

"or" and "and" are basically interchangeable as far as cost
is concerned,
so this one also seems like a "subset":

%0:i32 = var
%1:i32 = addnsw 4294967295:i32, %0
%2:i32 = var
%3:i32 = or 1:i32, %2
%4:i32 = addnsw %1, %3
infer %4
%5:i32 = and 4294967294:i32, %2
%6:i32 = add %0, %5
result %6

-- Sean Silva

>
> Thanks,
>
> John
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150723/6b9f136d/attachment.html>

John Regehr

2015-Jul-23 14:24 UTC

head link

[LLVMdev] some superoptimizer results

> I guess another way to select interesting transformations could be to look
for sequences where the
> result uses a "subset" of the input instruction sequence.
Yeah, I had been noticing those subsets too.  It sounds like it's worth a 
try doing a run that looks only for those.

One nice thing is that if we're just looking for subsets, we will not even 
give the syntehsizer the option to use instructions that aren't in the 
subset -- this should give a nice speedup, and probably will even make 
synthesis more likely to succeed (assuming that an optimization exists 
within the subset).

John

John Regehr

2015-Jul-23 14:55 UTC

head link

[LLVMdev] some superoptimizer results

One last cost function idea: the cost is the number of inputs. So here the 
LHS has cost 1 and the RHS has cost 0:

%0:i32 = var
%1:i32 = shlnsw %0, 1:i32
%2:i32 = or 1:i32, %1
%3:i1 = slt %1, %2
infer %3
result 1:i1

The rationale, of course, is that these are likely to enable subsequent 
optimizations.  There aren't that many of these so perhaps a separate run 
would be reasonable?

John

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Jul 2015 - [LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

Apparently Analagous Threads