thr3ads.net - llvm dev - [LLVMdev] LLVM Inliner [Nov 2010]

If this information is useful, please help other people find it:
Share via:

Xinliang David Li

2010-Nov-24 01:07 UTC

[LLVMdev] LLVM Inliner

Hi, I browsed the LLVM inliner implementation, and it seems there is room
for improvement.  (I have not read it too carefully, so correct me if what I
observed is wrong).

First the good side of the inliner -- the function level summary and inline
cost estimation is more elaborate and complete than gcc. For instance, it
considers callsite arguments and the effects of optimization enabled by
inlining.

Now more to the weakness of the inliner:

1) It is bottom up.  The inlining is not done in the order based on the
priority of the callsites.  It may leave important callsites (at top of the
cg) unlined due to higher cost after inline cost update. It also eliminates
the possibility of inline specialization. To change this, the inliner pass
may not use the pass manager infrastructure .  (I noticed a hack in the
inliner to workaround the problem -- for static functions avoid inlining its
callees if it causes it to become too big ..)

2) There seems to be only one inliner pass.  For calls to small functions,
it is better to perform early inlining as one of the local (per function)
optimizations followed by scalar opt clean up. This will sharpen the summary
information.  (Note the inline summary update does not consider the possible
cleanup)

3)  recursive inlining is not supported

4) function with indirect branch is not inlined. What source construct does
indirect branch instr correspond to ? variable jump?

5) fudge factor prefers functions with vector instructions -- why is that?

6) There is one heuristc used in inline-cost computation seems wrong:


  // Calls usually take a long time, so they make the inlining gain smaller.
  InlineCost += CalleeFI->Metrics.NumCalls * InlineConstants::CallPenalty;

Does it try to block inlining of callees with lots of calls? Note inlining
such a function only increase static call counts.


Thanks,

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101123/07e0f81b/attachment.html>

Reid Kleckner

2010-Nov-24 03:40 UTC

head link

[LLVMdev] LLVM Inliner

On Tue, Nov 23, 2010 at 8:07 PM, Xinliang David Li <xinliangli at
gmail.com> wrote:> Hi, I browsed the LLVM inliner implementation, and it seems there is room
> for improvement.  (I have not read it too carefully, so correct me if what
I
> observed is wrong).
> First the good side of the inliner -- the function level summary and inline
> cost estimation is more elaborate and complete than gcc. For instance, it
> considers callsite arguments and the effects of optimization enabled by
> inlining.
> Now more to the weakness of the inliner:
> 1) It is bottom up.  The inlining is not done in the order based on the
> priority of the callsites.  It may leave important callsites (at top of the
> cg) unlined due to higher cost after inline cost update. It also eliminates
> the possibility of inline specialization. To change this, the inliner pass
> may not use the pass manager infrastructure .  (I noticed a hack in the
> inliner to workaround the problem -- for static functions avoid inlining
its
> callees if it causes it to become too big ..)
> 2) There seems to be only one inliner pass.  For calls to small functions,
> it is better to perform early inlining as one of the local (per function)
> optimizations followed by scalar opt clean up. This will sharpen the
summary
> information.  (Note the inline summary update does not consider the
possible
> cleanup)
> 3)  recursive inlining is not supported
> 4) function with indirect branch is not inlined. What source construct does
> indirect branch instr correspond to ? variable jump?
This corresponds to functions that use GCC's labels-as-values
extension, not things like switch statements or virtual calls, so this
really only applies to things like interpreter loops, which you don't
usually want to inline.
> 5) fudge factor prefers functions with vector instructions -- why is that?
I'm guessing that vectorized instructions are an indicator that the
function is a hotspot, and should therefore be inlined.
> 6) There is one heuristc used in inline-cost computation seems wrong:
>
>   // Calls usually take a long time, so they make the inlining gain
smaller.
>   InlineCost += CalleeFI->Metrics.NumCalls *
InlineConstants::CallPenalty;
> Does it try to block inlining of callees with lots of calls? Note inlining
> such a function only increase static call counts.
I'm guessing the heuristic is that functions with lots of calls are
heavy and spend a lot of time in their callees, so eliminating the
overhead of one call isn't worth it.

Reid

Xinliang David Li

2010-Nov-24 04:29 UTC

head link

[LLVMdev] LLVM Inliner

On Tue, Nov 23, 2010 at 7:40 PM, Reid Kleckner <reid.kleckner at
gmail.com>wrote:
> On Tue, Nov 23, 2010 at 8:07 PM, Xinliang David Li <xinliangli at
gmail.com>
> wrote:
> > Hi, I browsed the LLVM inliner implementation, and it seems there is
room
> > for improvement.  (I have not read it too carefully, so correct me if
> what I
> > observed is wrong).
> > First the good side of the inliner -- the function level summary and
> inline
> > cost estimation is more elaborate and complete than gcc. For instance,
it
> > considers callsite arguments and the effects of optimization enabled
by
> > inlining.
> > Now more to the weakness of the inliner:
> > 1) It is bottom up.  The inlining is not done in the order based on
the
> > priority of the callsites.  It may leave important callsites (at top
of
> the
> > cg) unlined due to higher cost after inline cost update. It also
> eliminates
> > the possibility of inline specialization. To change this, the inliner
> pass
> > may not use the pass manager infrastructure .  (I noticed a hack in
the
> > inliner to workaround the problem -- for static functions avoid
inlining
> its
> > callees if it causes it to become too big ..)
> > 2) There seems to be only one inliner pass.  For calls to small
> functions,
> > it is better to perform early inlining as one of the local (per
function)
> > optimizations followed by scalar opt clean up. This will sharpen the
> summary
> > information.  (Note the inline summary update does not consider the
> possible
> > cleanup)
> > 3)  recursive inlining is not supported
> > 4) function with indirect branch is not inlined. What source construct
> does
> > indirect branch instr correspond to ? variable jump?
>
> This corresponds to functions that use GCC's labels-as-values
> extension, not things like switch statements or virtual calls, so this
> really only applies to things like interpreter loops, which you don't
> usually want to inline.
>
> > 5) fudge factor prefers functions with vector instructions -- why is
> that?
>
> I'm guessing that vectorized instructions are an indicator that the
> function is a hotspot, and should therefore be inlined.
>
This is a reasonable heuristic.

>
> > 6) There is one heuristc used in inline-cost computation seems wrong:
> >
> >   // Calls usually take a long time, so they make the inlining gain
> smaller.
> >   InlineCost += CalleeFI->Metrics.NumCalls *
> InlineConstants::CallPenalty;
> > Does it try to block inlining of callees with lots of calls? Note
> inlining
> > such a function only increase static call counts.
>
> I'm guessing the heuristic is that functions with lots of calls are
> heavy and spend a lot of time in their callees, so eliminating the
> overhead of one call isn't worth it.
>
Those calls may sit in a cold paths -- penalizing inlining to those
functions can miss opportunities.

David

>
> Reid
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101123/3dc59a33/attachment.html>

Nick Lewycky

2010-Nov-24 20:37 UTC

head link

[LLVMdev] LLVM Inliner

Xinliang David Li wrote:> Hi, I browsed the LLVM inliner implementation, and it seems there is
> room for improvement.  (I have not read it too carefully, so correct me
> if what I observed is wrong).
>
> First the good side of the inliner -- the function level summary and
> inline cost estimation is more elaborate and complete than gcc. For
> instance, it considers callsite arguments and the effects of
> optimization enabled by inlining.
>
> Now more to the weakness of the inliner:
>
> 1) It is bottom up.  The inlining is not done in the order based on the
> priority of the callsites.  It may leave important callsites (at top of
> the cg) unlined due to higher cost after inline cost update. It also
> eliminates the possibility of inline specialization. To change this, the
> inliner pass may not use the pass manager infrastructure .  (I noticed a
> hack in the inliner to workaround the problem -- for static functions
> avoid inlining its callees if it causes it to become too big ..)
>
> 2) There seems to be only one inliner pass.  For calls to small
> functions, it is better to perform early inlining as one of the local
> (per function) optimizations followed by scalar opt clean up. This will
> sharpen the summary information.  (Note the inline summary update does
> not consider the possible cleanup)
Hi David,

You're correct that there's only one inliner pass, but I don't 
understand how an early inliner would help. In LLVM's case, after we 
inline one function into another, we run the entire stack of 
per-function optimizations on it before deciding whether to inline it 
into its caller. This sharpens the summary information before the 
inliner looks at the next SCC, as we go bottom up. Is there something 
more that early inlining would do for us that isn't achieved with this 
model?

(Actually, there is another inliner pass, but it's for 
__attribute__((always_inline)) so it's rather uninteresting.)
> 3)  recursive inlining is not supported
>
> 4) function with indirect branch is not inlined. What source construct
> does indirect branch instr correspond to ? variable jump?
Right. It's for "goto *ptr;".

Nick
> 5) fudge factor prefers functions with vector instructions -- why is that?
>
> 6) There is one heuristc used in inline-cost computation seems wrong:
>
>
>    // Calls usually take a long time, so they make the inlining gain
> smaller.
>    InlineCost += CalleeFI->Metrics.NumCalls *
InlineConstants::CallPenalty;
>
> Does it try to block inlining of callees with lots of calls? Note
> inlining such a function only increase static call counts.
>
>
> Thanks,
>
> David
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Xinliang David Li

2010-Nov-24 22:17 UTC

head link

[LLVMdev] LLVM Inliner

On Wed, Nov 24, 2010 at 12:37 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Xinliang David Li wrote:
>
>> Hi, I browsed the LLVM inliner implementation, and it seems there is
>> room for improvement.  (I have not read it too carefully, so correct me
>> if what I observed is wrong).
>>
>> First the good side of the inliner -- the function level summary and
>> inline cost estimation is more elaborate and complete than gcc. For
>> instance, it considers callsite arguments and the effects of
>> optimization enabled by inlining.
>>
>> Now more to the weakness of the inliner:
>>
>> 1) It is bottom up.  The inlining is not done in the order based on the
>> priority of the callsites.  It may leave important callsites (at top of
>> the cg) unlined due to higher cost after inline cost update. It also
>> eliminates the possibility of inline specialization. To change this,
the
>> inliner pass may not use the pass manager infrastructure .  (I noticed
a
>> hack in the inliner to workaround the problem -- for static functions
>> avoid inlining its callees if it causes it to become too big ..)
>>
>> 2) There seems to be only one inliner pass.  For calls to small
>> functions, it is better to perform early inlining as one of the local
>> (per function) optimizations followed by scalar opt clean up. This will
>> sharpen the summary information.  (Note the inline summary update does
>> not consider the possible cleanup)
>>
>
> Hi David,
>
> You're correct that there's only one inliner pass, but I don't
understand
> how an early inliner would help. In LLVM's case, after we inline one
> function into another, we run the entire stack of per-function
optimizations
> on it before deciding whether to inline it into its caller. This sharpens
> the summary information before the inliner looks at the next SCC, as we go
> bottom up. Is there something more that early inlining would do for us that
> isn't achieved with this model?
>
Interesting -- LLVM does perform on the fly cleanups during inlining
transformation -- this will make summary update precise. One thing I notice
from the debug pass dump is that the 'deduce function attribute' pass
happens before the clean up -- Is it intended?

Thanks,

David

>
> (Actually, there is another inliner pass, but it's for
> __attribute__((always_inline)) so it's rather uninteresting.)
>
>
>  3)  recursive inlining is not supported
>>
>> 4) function with indirect branch is not inlined. What source construct
>> does indirect branch instr correspond to ? variable jump?
>>
>
> Right. It's for "goto *ptr;".
>
> Nick
>
>  5) fudge factor prefers functions with vector instructions -- why is that?
>>
>> 6) There is one heuristc used in inline-cost computation seems wrong:
>>
>>
>>   // Calls usually take a long time, so they make the inlining gain
>> smaller.
>>   InlineCost += CalleeFI->Metrics.NumCalls *
InlineConstants::CallPenalty;
>>
>> Does it try to block inlining of callees with lots of calls? Note
>> inlining such a function only increase static call counts.
>>
>>
>> Thanks,
>>
>> David
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101124/f3b5cfab/attachment.html>

Chris Lattner

2010-Nov-28 22:37 UTC

head link

[LLVMdev] LLVM Inliner

On Nov 23, 2010, at 5:07 PM, Xinliang David Li wrote:> Hi, I browsed the LLVM inliner implementation, and it seems there is room
for improvement.  (I have not read it too carefully, so correct me if what I
observed is wrong).
> 
> First the good side of the inliner -- the function level summary and inline
cost estimation is more elaborate and complete than gcc. For instance, it
considers callsite arguments and the effects of optimization enabled by
inlining.
Yep, as others pointed out, this is intended to interact closely with the
per-function optimizations that get mixed in due to the inliner being a
callgraphscc pass.  This is actually a really important property of the inliner.
If you have a function foo that calls a leaf function bar, the sequence of
optimization is:

1. Run the inliner on bar (noop, since it has no call sites)
2. Run the per-function passes on bar.  This generally shrinks it, and prevents
"abstraction penalty" from making bar look too big to inline.
3. Run the inliner on foo.  Since foo calls bar, we consider inlining bar into
foo and do so if profitable.
4. Run the per-function passes on foo.  If bar got inlined, this means that
we're running the per-function passes over the inlined contents of bar
again.

In a traditional optimizer like GCC's, you end up with problems where you
have to set a high inline threshold due to inlining-before-optimizing causing
"abstraction penalty problems".  An early inliner is a hack that tries
to address this.  Another problem with this approach from the compile time
perspective is that you end up repeating work multiple times.  For example, if
there is a common subexpression in a small function, you end up inlining it into
many places, then having to eliminate the common subexpression in each copy.

The LLVM inliner avoids these problems, but (as you point out) this really does
force it to being a bottom-up inliner.  This means that the bottom-up inliner
needs to make decisions in strange ways in some cases: for example if qux called
foo, and foo were static, then (when processing foo) we may decide not to inline
bar into foo because it would be more profitable to inline foo into qux.
> Now more to the weakness of the inliner:
> 
> 1) It is bottom up.  The inlining is not done in the order based on the
priority of the callsites.  It may leave important callsites (at top of the cg)
unlined due to higher cost after inline cost update. It also eliminates the
possibility of inline specialization. To change this, the inliner pass may not
use the pass manager infrastructure .  (I noticed a hack in the inliner to
workaround the problem -- for static functions avoid inlining its callees if it
causes it to become too big ..)
This is true, but I don't think it's a really large problem in practice.
We don't have a "global inline threshold limit" (which I've
never understood, except as a hack to prevent run-away inlining) so not visiting
in priority order shouldn't prevent high-priority-but-processed-late
candidates from being inlined.

The only potential issue I'm aware of is if we have A->B->C and we
decide to inline C into B when it would be more profitable to inline B into A
and leave C out of line.  This can be handled with a heuristic like the one
above.
> 2) There seems to be only one inliner pass.  For calls to small functions,
it is better to perform early inlining as one of the local (per function)
optimizations followed by scalar opt clean up. This will sharpen the summary
information.  (Note the inline summary update does not consider the possible
cleanup)
Yep.  This is a feature :)
> 3)  recursive inlining is not supported
This is a policy decision.  It's not clear whether it is really a good idea,
though I have seen some bugzilla or something about it.  I agree that it should
be revisited.
> 4) function with indirect branch is not inlined. What source construct does
indirect branch instr correspond to ? variable jump?
See:
http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html

for more details.
> 6) There is one heuristc used in inline-cost computation seems wrong:
> 
>   // Calls usually take a long time, so they make the inlining gain
smaller.
>   InlineCost += CalleeFI->Metrics.NumCalls *
InlineConstants::CallPenalty;
> 
> Does it try to block inlining of callees with lots of calls? Note inlining
such a function only increase static call counts.
I think that this is a heuristic that Jakob came up with, but I think it's a
good one, also discussed elsewhere on the thread.

When talking about inlining and tuning thresholds and heuristics, it is a good
idea to quantify what the expected or possible wins of inlining a function are. 
Some of the ones I'm aware of:

1. In some cases, inlining shrinks code.

2. Inlining a function exposes optimization opportunities on the inlined code,
because constant propagation and other simplifications can take place.

3. Inlining a function exposes optimizations in the caller because address-taken
values can be promoted to registers.

4. Inlining a function can improve optimization in a caller because
interprocedural side-effect analysis isn't needed.  For example, load/call
dependence may not be precise.  This is something we should continue to improve
in the optimizer though.

5. Inlining code with indirect call sites and switches can improve branch
prediction if some callers of the function are biased differently than other
callers.  This is pretty hard to predict without profile info though.

The "punish functions containing lots of calls" is based on the
assumption that functions which are mostly calls (again, this decision happens
after the callee has been inlined and simplified) aren't themselves doing
much work.

-Chris

Xinliang David Li

2010-Nov-29 07:39 UTC

head link

[LLVMdev] LLVM Inliner

On Sun, Nov 28, 2010 at 2:37 PM, Chris Lattner <clattner at apple.com>
wrote:
> On Nov 23, 2010, at 5:07 PM, Xinliang David Li wrote:
> > Hi, I browsed the LLVM inliner implementation, and it seems there is
room
> for improvement.  (I have not read it too carefully, so correct me if what
I
> observed is wrong).
> >
> > First the good side of the inliner -- the function level summary and
> inline cost estimation is more elaborate and complete than gcc. For
> instance, it considers callsite arguments and the effects of optimization
> enabled by inlining.
>
> Yep, as others pointed out, this is intended to interact closely with the
> per-function optimizations that get mixed in due to the inliner being a
> callgraphscc pass.  This is actually a really important property of the
> inliner.  If you have a function foo that calls a leaf function bar, the
> sequence of optimization is:
>
> 1. Run the inliner on bar (noop, since it has no call sites)
> 2. Run the per-function passes on bar.  This generally shrinks it, and
> prevents "abstraction penalty" from making bar look too big to
inline.
> 3. Run the inliner on foo.  Since foo calls bar, we consider inlining bar
> into foo and do so if profitable.
> 4. Run the per-function passes on foo.  If bar got inlined, this means that
> we're running the per-function passes over the inlined contents of bar
> again.
>
On-the-fly clean up (optimization) while doing bottom up inlining is nice as
you described. Many other compilers chose not to do this way due to
scalability concerns (with IPO) -- this can make the IPO the biggest bottom
neck in terms of compile time (as it is serialized).  Memory many not be a
big issue for LLVM as I can see the good locality in pass manager. (Just
curious, what is biggest application LLVM can build with IPO?)

>
> In a traditional optimizer like GCC's, you end up with problems where
you
> have to set a high inline threshold due to inlining-before-optimizing
> causing "abstraction penalty problems".  An early inliner is a
hack that
> tries to address this.

It is a hack in some sense (but a common practice) -- but enables other
flexibilities.

>  Another problem with this approach from the compile time perspective is
> that you end up repeating work multiple times.  For example, if there is a
> common subexpression in a small function, you end up inlining it into many
> places, then having to eliminate the common subexpression in each copy.
>
Early inlining + scalar opt can do the same, right?

>
> The LLVM inliner avoids these problems, but (as you point out) this really
> does force it to being a bottom-up inliner.  This means that the bottom-up
> inliner needs to make decisions in strange ways in some cases: for example
> if qux called foo, and foo were static, then (when processing foo) we may
> decide not to inline bar into foo because it would be more profitable to
> inline foo into qux.
>
> > Now more to the weakness of the inliner:
> >
> > 1) It is bottom up.  The inlining is not done in the order based on
the
> priority of the callsites.  It may leave important callsites (at top of the
> cg) unlined due to higher cost after inline cost update. It also eliminates
> the possibility of inline specialization. To change this, the inliner pass
> may not use the pass manager infrastructure .  (I noticed a hack in the
> inliner to workaround the problem -- for static functions avoid inlining
its
> callees if it causes it to become too big ..)
>
> This is true, but I don't think it's a really large problem in
practice.
>  We don't have a "global inline threshold limit" (which
I've never
> understood, except as a hack to prevent run-away inlining) so not visiting
> in priority order shouldn't prevent high-priority-but-processed-late
> candidates from being inlined.
>
global threshold can be used to control the unnecessary size growth. In some
cases, the size increase may also cause increase in icache footprint leading
to poor performance. In fact, with IPO/CMO, icache footprint can be modeled
in some way and be used as one kind of global limit.

>
> The only potential issue I'm aware of is if we have A->B->C and
we decide
> to inline C into B when it would be more profitable to inline B into A and
> leave C out of line.  This can be handled with a heuristic like the one
> above.
>
> > 2) There seems to be only one inliner pass.  For calls to small
> functions, it is better to perform early inlining as one of the local (per
> function) optimizations followed by scalar opt clean up. This will sharpen
> the summary information.  (Note the inline summary update does not consider
> the possible cleanup)
>
> Yep.  This is a feature :)
>
> > 3)  recursive inlining is not supported
>
> This is a policy decision.  It's not clear whether it is really a good
> idea, though I have seen some bugzilla or something about it.  I agree that
> it should be revisited.
>
> > 4) function with indirect branch is not inlined. What source construct
> does indirect branch instr correspond to ? variable jump?
>
> See:
> http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
>
> for more details.
>
> > 6) There is one heuristc used in inline-cost computation seems wrong:
> >
> >   // Calls usually take a long time, so they make the inlining gain
> smaller.
> >   InlineCost += CalleeFI->Metrics.NumCalls *
> InlineConstants::CallPenalty;
> >
> > Does it try to block inlining of callees with lots of calls? Note
> inlining such a function only increase static call counts.
>
> I think that this is a heuristic that Jakob came up with, but I think
it's
> a good one, also discussed elsewhere on the thread.
>
> When talking about inlining and tuning thresholds and heuristics, it is a
> good idea to quantify what the expected or possible wins of inlining a
> function are.  Some of the ones I'm aware of:
>
> 1. In some cases, inlining shrinks code.
>
> 2. Inlining a function exposes optimization opportunities on the inlined
> code, because constant propagation and other simplifications can take
place.
>
> 3. Inlining a function exposes optimizations in the caller because
> address-taken values can be promoted to registers.
>
> 4. Inlining a function can improve optimization in a caller because
> interprocedural side-effect analysis isn't needed.  For example,
load/call
> dependence may not be precise.  This is something we should continue to
> improve in the optimizer though.
>
> 5. Inlining code with indirect call sites and switches can improve branch
> prediction if some callers of the function are biased differently than
other
> callers.  This is pretty hard to predict without profile info though.
>
>Besides -- 1) reducing call overhead; 2) scheduling freedom; 3) enabling
optimizations across inline instances of callee(s); 4) sharpening local
analysis (mainly aliasing) results -- such as points to, malloc etc.

It may also lose aliasing assertion (such as restrict aliasing) if not done
properly.

>
> The "punish functions containing lots of calls" is based on the
assumption
> that functions which are mostly calls (again, this decision happens after
> the callee has been inlined and simplified) aren't themselves doing
much
> work.
>
My point is that using static count of callsites as a indicator for this can
be misleading. All the calls may be calls to cold external functions for
instance.

Thanks,

David


>
> -Chris
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20101128/0dff7413/attachment.html>

Nuno Lopes

2010-Nov-29 23:05 UTC

head link

[LLVMdev] LLVM Inliner

BTW, why not learn the heuristic automatically?..  It's actually not very
difficult. Last semester, for a class project, I did some experiments with
using statistical classifier boosting to automatically derive new heuristic
functions for LLVM's inliner (for -O2 and -Os).
The results are a bit biased since I've used a very small training program 
set (just SPEC 2006). I just made the report available on-line in case 
someone is interested: 
http://web.ist.utl.pt/nuno.lopes/pubs/inline-stats10.pdf

Nuno

----- Original Message -----> On Nov 23, 2010, at 5:07 PM, Xinliang David Li wrote:
>> Hi, I browsed the LLVM inliner implementation, and it seems there is
room
>> for improvement.  (I have not read it too carefully, so correct me if 
>> what I observed is wrong).
>>
>> First the good side of the inliner -- the function level summary and 
>> inline cost estimation is more elaborate and complete than gcc. For 
>> instance, it considers callsite arguments and the effects of
optimization
>> enabled by inlining.
>
> Yep, as others pointed out, this is intended to interact closely with the 
> per-function optimizations that get mixed in due to the inliner being a 
> callgraphscc pass.  This is actually a really important property of the 
> inliner.  If you have a function foo that calls a leaf function bar, the 
> sequence of optimization is:
>
> 1. Run the inliner on bar (noop, since it has no call sites)
> 2. Run the per-function passes on bar.  This generally shrinks it, and 
> prevents "abstraction penalty" from making bar look too big to
inline.
> 3. Run the inliner on foo.  Since foo calls bar, we consider inlining bar 
> into foo and do so if profitable.
> 4. Run the per-function passes on foo.  If bar got inlined, this means 
> that we're running the per-function passes over the inlined contents of
> bar again.
>
> In a traditional optimizer like GCC's, you end up with problems where
you
> have to set a high inline threshold due to inlining-before-optimizing 
> causing "abstraction penalty problems".  An early inliner is a
hack that
> tries to address this.  Another problem with this approach from the 
> compile time perspective is that you end up repeating work multiple times. 
> For example, if there is a common subexpression in a small function, you 
> end up inlining it into many places, then having to eliminate the common 
> subexpression in each copy.
>
> The LLVM inliner avoids these problems, but (as you point out) this really 
> does force it to being a bottom-up inliner.  This means that the bottom-up 
> inliner needs to make decisions in strange ways in some cases: for example 
> if qux called foo, and foo were static, then (when processing foo) we may 
> decide not to inline bar into foo because it would be more profitable to 
> inline foo into qux.
>
>> Now more to the weakness of the inliner:
>>
>> 1) It is bottom up.  The inlining is not done in the order based on the
>> priority of the callsites.  It may leave important callsites (at top of
>> the cg) unlined due to higher cost after inline cost update. It also 
>> eliminates the possibility of inline specialization. To change this,
the
>> inliner pass may not use the pass manager infrastructure .  (I noticed
a
>> hack in the inliner to workaround the problem -- for static functions 
>> avoid inlining its callees if it causes it to become too big ..)
>
> This is true, but I don't think it's a really large problem in
practice.
> We don't have a "global inline threshold limit" (which
I've never
> understood, except as a hack to prevent run-away inlining) so not visiting 
> in priority order shouldn't prevent high-priority-but-processed-late 
> candidates from being inlined.
>
> The only potential issue I'm aware of is if we have A->B->C and
we decide
> to inline C into B when it would be more profitable to inline B into A and 
> leave C out of line.  This can be handled with a heuristic like the one 
> above.
>
>> 2) There seems to be only one inliner pass.  For calls to small 
>> functions, it is better to perform early inlining as one of the local 
>> (per function) optimizations followed by scalar opt clean up. This will
>> sharpen the summary information.  (Note the inline summary update does 
>> not consider the possible cleanup)
>
> Yep.  This is a feature :)
>
>> 3)  recursive inlining is not supported
>
> This is a policy decision.  It's not clear whether it is really a good 
> idea, though I have seen some bugzilla or something about it.  I agree 
> that it should be revisited.
>
>> 4) function with indirect branch is not inlined. What source construct 
>> does indirect branch instr correspond to ? variable jump?
>
> See:
> http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
>
> for more details.
>
>> 6) There is one heuristc used in inline-cost computation seems wrong:
>>
>>   // Calls usually take a long time, so they make the inlining gain 
>> smaller.
>>   InlineCost += CalleeFI->Metrics.NumCalls * 
>> InlineConstants::CallPenalty;
>>
>> Does it try to block inlining of callees with lots of calls? Note 
>> inlining such a function only increase static call counts.
>
> I think that this is a heuristic that Jakob came up with, but I think
it's
> a good one, also discussed elsewhere on the thread.
>
> When talking about inlining and tuning thresholds and heuristics, it is a 
> good idea to quantify what the expected or possible wins of inlining a 
> function are.  Some of the ones I'm aware of:
>
> 1. In some cases, inlining shrinks code.
>
> 2. Inlining a function exposes optimization opportunities on the inlined 
> code, because constant propagation and other simplifications can take 
> place.
>
> 3. Inlining a function exposes optimizations in the caller because 
> address-taken values can be promoted to registers.
>
> 4. Inlining a function can improve optimization in a caller because 
> interprocedural side-effect analysis isn't needed.  For example,
load/call
> dependence may not be precise.  This is something we should continue to 
> improve in the optimizer though.
>
> 5. Inlining code with indirect call sites and switches can improve branch 
> prediction if some callers of the function are biased differently than 
> other callers.  This is pretty hard to predict without profile info 
> though.
>
>
> The "punish functions containing lots of calls" is based on the
assumption
> that functions which are mostly calls (again, this decision happens after 
> the callee has been inlined and simplified) aren't themselves doing
much
> work.
>
> -Chris

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Nov 2010 - [LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

[LLVMdev] LLVM Inliner

Possibly Parallel Threads