thr3ads.net - llvm dev - [LLVMdev] Missed optimization with indirectbr terminator [Jul 2011]

If this information is useful, please help other people find it:
Share via:

Carlo Alberto Ferraris

2011-Jul-08 05:15 UTC

[LLVMdev] Missed optimization with indirectbr terminator

Nella citazione giovedì 7 luglio 2011 19:41:16, John McCall ha
scritto:> On Jul 7, 2011, at 4:33 AM, Carlo Alberto Ferraris wrote:
>> Il 07/07/2011 11:14, Cameron Zwarich ha scritto:
>>> I haven't read the code in detail, but it looks like
JumpThreading at
>>> least attempts to thread across indirect branches. You can either
try
>>> to fix it or file a bug with your test case.
>> In the source it says "If the predecessor is an indirect goto, we 
>> can't split the edge. 
>>
<http://llvm.org/docs/doxygen/html/JumpThreading_8cpp_source.html#l00914>"
>> Why would that be the case?
> 
> Splitting an edge creates a block which executes when leaving a
> specific block to go to a specific successor. The only way to split an
> indirect goto is to insert code before the jump which checks for a
> specific destination address. This is very likely to be a pessimization.Do you mean like turning a 
indirectbr %addr, [%a, %b, ...]
into a 
switch %addr, undef, [blockaddr(%a), %a], [blo
ckaddr(%b), %b], ...
? (I know that the switch doesn't accept a pointer as first argument, 
it's just an example)
> To answer your original question, the current implementation design
> for indirect goto is intentionally based around having a single block
> that terminates in an indirectbr. Otherwise the CFG of a
> context-threaded interpreter loop becomes inherently O(N^2): there
> are N instruction labels, each of which has approximately N
> predecessors. We rely on a code-generation optimization, tail
> duplication, to actually restore the context-threading behavior.
> 
> In short, don't look at the IR for whether context-threading is
> happening; look at the generated code.I'll try to inspect the assembler. Just a quick thought in the mean 
time, in the snippet I posted, if the backedge pointed directly to %19, 
other optimizations would likely notice that the loop could be removed 
entirely and replaced with a single addition. Do you think the code 
generator is able t
o do this?

-- 
Carlo Alberto Ferraris <cafxx at strayorange.com 
<mailto:cafxx at strayorange.com>>
website/blog <http://cafxx.strayorange.com> - +39 333 7643 235

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cafxx.vcf
Type: text/x-vcard
Size: 233 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110708/88dfa186/attachment.vcf>

John McCall

2011-Jul-08 07:05 UTC

head link

[LLVMdev] Missed optimization with indirectbr terminator

On Jul 7, 2011, at 10:15 PM, Carlo Alberto Ferraris
wrote:> Nella citazione giovedì 7 luglio 2011 19:41:16, John McCall ha scritto:
>> On Jul 7, 2011, at 4:33 AM, Carlo Alberto Ferraris wrote:
>>> Il 07/07/2011 11:14, Cameron Zwarich ha scritto:
>>>> I haven't read the code in detail, but it looks like
JumpThreading at least attempts to thread across indirect branches. You can
either try to fix it or file a bug with your test case.
>>> In the source it says "If the predecessor is an indirect goto,
we can't split the edge.
<http://llvm.org/docs/doxygen/html/JumpThreading_8cpp_source.html#l00914>"
Why would that be the case?
>> Splitting an edge creates a block which executes when leaving a
>> specific block to go to a specific successor. The only way to split an
>> indirect goto is to insert code before the jump which checks for a
>> specific destination address. This is very likely to be a
pessimization.
> Do you mean like turning a indirectbr %addr, [%a, %b, ...]
> into a switch %addr, undef, [blockaddr(%a), %a], [blo
> ckaddr(%b), %b], ...
> ? (I know that the switch doesn't accept a pointer as first argument,
it's just an example)
That difference is relevant, though, because there's no way to
implement such a switch except as a series of pointer comparisons
(or at least offset comparisons).  But I was approaching it from
the perspective of trying to split a single edge, rather than changing
the entire terminator;  there's no natural way to do that with an indirect
branch, so you have to do explicit pointer comparisons before the goto.
Doing even one or two of those could easily kill a lot of the benefit of
the indirect branch.

Basically, generic optimizations should not be in the business of
trying to split indirect branch edges, which is why the generic
splitting routines fail on them.
> I'll try to inspect the assembler. Just a quick thought in the mean
time, in the snippet I posted, if the backedge pointed directly to %19, other
optimizations would likely notice that the loop could be removed entirely and
replaced with a single addition. Do you think the code generator is able to do
this?
I don't think anyone's taught the loop optimizations to work
over indirect branches, but it's not impossible.  This specific
case would be better fixed by teaching some existing pass,
maybe JumpThreading, that an indirectbr on a phi with constant
blockaddr operands can be usefully optimized.

John.

Cameron Zwarich

2011-Jul-08 07:21 UTC

head link

[LLVMdev] Missed optimization with indirectbr terminator

On Jul 7, 2011, at 10:15 PM, Carlo Alberto Ferraris wrote:
> I'll try to inspect the assembler. Just a quick thought in the mean
time, in the snippet I posted, if the backedge pointed directly to %19, other
optimizations would likely notice that the loop could be removed entirely and
replaced with a single addition. Do you think the code generator is able t
> o do this?
Why would you write a loop with an indirect branch where the loop can be
deleted?

Cameron

Frits van Bommel

2011-Jul-08 08:09 UTC

head link

[LLVMdev] Missed optimization with indirectbr terminator

On 8 July 2011 09:05, John McCall <rjmccall at apple.com>
wrote:> On Jul 7, 2011, at 10:15 PM, Carlo Alberto Ferraris wrote:
>> I'll try to inspect the assembler. Just a quick thought in the mean
time, in the snippet I posted, if the backedge pointed directly to %19, other
optimizations would likely notice that the loop could be removed entirely and
replaced with a single addition. Do you think the code generator is able to do
this?
>
> I don't think anyone's taught the loop optimizations to work
> over indirect branches, but it's not impossible.  This specific
> case would be better fixed by teaching some existing pass,
> maybe JumpThreading, that an indirectbr on a phi with constant
> blockaddr operands can be usefully optimized.
Actually, jumpthreading already knows about indirectbr.

The problem is that JumpThreading refuses to thread across loop
headers (that is, the target blocks of backedges). It should have the
same problem with a phi of booleans and a conditional branch instead
of a phi of pointers and an indirectbr.

The comment above JumpThreading::FindLoopHeaders() explains the
current situation:

/// FindLoopHeaders - We do not want jump threading to turn proper loop
/// structures into irreducible loops.  Doing this breaks up the loop nesting
/// hierarchy and pessimizes later transformations.  To prevent this from
/// happening, we first have to find the loop headers.  Here we approximate this
/// by finding targets of backedges in the CFG.
///
/// Note that there definitely are cases when we want to allow threading of
/// edges across a loop header.  For example, threading a jump from outside the
/// loop (the preheader) to an exit block of the loop is definitely profitable.
/// It is also almost always profitable to thread backedges from within the loop
/// to exit blocks, and is often profitable to thread backedges to other blocks
/// within the loop (forming a nested loop).  This simple analysis is not rich
/// enough to track all of these properties and keep it up-to-date as the CFG
/// mutates, so we don't allow any of these transformations.

Note that the case discussed in this thread (a backedge to a block
within the loop) is mentioned as "often profitable", but that
JumpThreading simply doesn't know enough about loops to recognize it.

There are several ways to solve this, as I see it:
1) teach jumpthreading about loops,
2) teach some other pass that knows about loops how to thread backedges, or
3) introduce a whole new pass just for this purpose.

Carlo Alberto Ferraris

2011-Jul-08 14:38 UTC

head link

[LLVMdev] Missed optimization with indirectbr terminator

Il 08/07/2011 09:21, Cameron Zwarich ha scritto:> On Jul 7, 2011, at 10:15 PM, Carlo Alberto Ferraris wrote:
>
>> I'll try to inspect the assembler. Just a quick thought in the mean
time, in the snippet I posted, if the backedge pointed directly to %19, other
optimizations would likely notice that the loop could be removed entirely and
replaced with a single addition. Do you think the code generator is able t
>> o do this?
> Why would you write a loop with an indirect branch where the loop can be
deleted?I'm working on an IPO that uses indirectbrs. The loop was part of the 
code being compiled.

-- 
Carlo Alberto Ferraris <cafxx at strayorange.com 
<mailto:cafxx at strayorange.com>>
website/blog <http://cafxx.strayorange.com> - +39 333 7643 235
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110708/74b37c7c/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cafxx.vcf
Type: text/x-vcard
Size: 233 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20110708/74b37c7c/attachment.vcf>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Jul 2011 - [LLVMdev] Missed optimization with indirectbr terminator

[LLVMdev] Missed optimization with indirectbr terminator

[LLVMdev] Missed optimization with indirectbr terminator

[LLVMdev] Missed optimization with indirectbr terminator

[LLVMdev] Missed optimization with indirectbr terminator

[LLVMdev] Missed optimization with indirectbr terminator

Reasonably Related Threads