thr3ads.net - llvm dev - [llvm-dev] [RFC] Target-specific parametrization of function inliner [Mar 2016]

Artem Belevich via llvm-dev

2016-Mar-02 00:31 UTC

[llvm-dev] [RFC] Target-specific parametrization of function inliner

Hi,

I propose to make function inliner parameters adjustable for a specific
target.

Currently the function inlining pass appears to be target-agnostic, with
various constants for calculating call cost hardcoded. While it works
reasonably well for general-purpose CPUs, some quirkier targets like NVPTX
would benefit from target-specific tuning.

It appears that two things need to be done:

* Add inliner preferences to TargetTransformInfo, similar to how we
customize loop unrolling, and use them to provide the inliner with
target-specific thresholds and other parameters.
* Augment the inliner pass to use the existing TargetTransformInfo API to
figure out the cost of a particular call on a given target.
TargetTransformInfo already has getCallCost(), though it does not look like
anything uses it.
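
A minimal sketch of what the first item could look like, modeled loosely on
the way a target fills in unrolling preferences. This is standalone C++, not
LLVM code: InliningPreferences, TargetInfoBase, GpuLikeTarget,
gatherInliningPreferences and every number below are hypothetical names and
placeholders chosen for illustration only.

```cpp
#include <cassert>

// Hypothetical knobs the inliner would read. The defaults are arbitrary
// placeholders, not the values LLVM actually uses.
struct InliningPreferences {
  int Threshold = 100;  // maximum cost the inliner accepts for a callee
  int CallPenalty = 10; // cost charged for each call left in the code
};

// Hypothetical stand-in for a target hook interface.
struct TargetInfoBase {
  virtual ~TargetInfoBase() = default;
  // The default implementation leaves the target-independent values alone.
  virtual void getInliningPreferences(InliningPreferences &) const {}
};

// A target where calls are assumed to be very expensive.
struct GpuLikeTarget : TargetInfoBase {
  void getInliningPreferences(InliningPreferences &IP) const override {
    IP.Threshold *= 5;   // tolerate much more code growth
    IP.CallPenalty *= 4; // treat a remaining call as more costly
  }
};

// What the inliner would do once per function: start from the defaults and
// let the target adjust them.
InliningPreferences gatherInliningPreferences(const TargetInfoBase &Target) {
  InliningPreferences IP;
  Target.getInliningPreferences(IP);
  assert(IP.Threshold >= 0 && IP.CallPenalty >= 0 && "negative preference");
  return IP;
}
```

The design choice mirrored here is that the target only adjusts defaults
owned by the pass, so targets that do not override the hook keep the
existing behavior.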

Comments? Concerns? Suggestions?

Thanks,
-- 
--Artem Belevich

Hal Finkel via llvm-dev

2016-Mar-10 14:42 UTC

Hi Art,

I've long thought that we should have a more principled way of doing inline
profitability. There is obviously some cost to executing a function body, some
call site overhead, and some cost reduction associated with any post-inlining
simplifications. If inlining reduces the overall call site cost by more than
some factor, say 1% (this should probably depend on the optimization level),
then we should inline. With profiling information, we might even use global
speedup instead of local speedup.
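
The rule described above can be sketched as follows. This is a standalone
illustration, not existing LLVM code: CallSiteEstimate, isProfitableToInline
and the cost units are hypothetical, and the sketch deliberately ignores
code size, which the paragraph above does not model either.

```cpp
#include <cassert>

// Hypothetical per-call-site estimates, all in the same abstract cost unit.
struct CallSiteEstimate {
  double BodyCost;              // cost of executing the callee body
  double CallOverhead;          // cost of the call itself at this site
  double SimplificationSavings; // cost removed by post-inlining cleanup
};

// Inline when the relative reduction in call site cost exceeds MinGain
// (0.01 corresponds to the "say 1%" factor; it could vary with the
// optimization level).
bool isProfitableToInline(const CallSiteEstimate &E, double MinGain = 0.01) {
  double Before = E.BodyCost + E.CallOverhead;
  double After = E.BodyCost - E.SimplificationSavings;
  assert(Before > 0.0 && "call site must have a positive cost");
  return (Before - After) / Before > MinGain;
}
```

With profile data, the same comparison could be weighted by the call site's
execution count to approximate a global rather than a local speedup.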

Whether we need a target customization of this threshold, or just a way for a
target to supplement the fine inlining decision, is unclear to me. It is also
true that the result of a bunch of locally-optimal decisions might be far from
the global optimum. Maybe the target has something to say about that?

In short, I'm fine with what you're proposing, but to the extent possible, I
want the numbers provided by the target to mean something. Replacing a global
set of somewhat-arbitrary magic numbers with target-specific sets of
somewhat-arbitrary magic numbers should be our last choice.

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth via llvm-dev

2016-Mar-10 14:49 UTC

IMO, the appropriate thing for TTI to inform the inliner about is how
costly the actual act of a "call" is likely to be. I would hope that this
would only be used on targets where there is some really dramatic overhead
of actually doing a function call such that the code size cost incurred by
inlining is completely dwarfed by the improvements. GPUs are one of the few
platforms that exhibit this kind of behavior, although I don't think
they're truly unique, just a common example.

This isn't quite the same thing as the cost of the call instruction, which
has much more to do with the size. Instead, it has to do with the expected
consequences of actually leaving a call edge in the program.

To me, this pretty accurately reflects the TTI hook we have for customizing
loop unrolling where the cost of having a cyclic CFG is modeled to help
indicate that on some targets (also GPUs) it is worth a very large amount
of code size growth to simplify the control flow in a particular way.

Does that make sense to you, Hal? Based on that, it would really just be a
scaling factor of the inline heuristics. I'm unsure how to express this
construct more scientifically.
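
One way to read the scaling-factor idea, as a standalone sketch: the
heuristics stay target-independent and the target only supplies a
multiplier. CallEdgeInfo, scaledInlineThreshold, shouldInlineWithScale and
the numbers used with them are hypothetical; nothing here is an existing
TTI hook.

```cpp
#include <cassert>

// Hypothetical target-provided factor: how much extra code growth is worth
// paying to remove a call edge. 1 means "no adjustment".
struct CallEdgeInfo {
  int ThresholdScale = 1;
};

// Scale the target-independent threshold by the target's factor.
int scaledInlineThreshold(int BaseThreshold, const CallEdgeInfo &Target) {
  assert(BaseThreshold >= 0 && Target.ThresholdScale >= 1);
  return BaseThreshold * Target.ThresholdScale;
}

// The usual cost-versus-threshold test, using the scaled threshold.
bool shouldInlineWithScale(int InlineCost, int BaseThreshold,
                           const CallEdgeInfo &Target) {
  return InlineCost < scaledInlineThreshold(BaseThreshold, Target);
}
```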

-Chandler

Xinliang David Li via llvm-dev

2016-Mar-10 17:00 UTC

IMO, a good inliner with a precise cost/benefit model will eventually need
what Art is proposing here.

Take the function call overhead as an example. It depends on a couple of
factors: 1) call/return instruction latency; 2) function prologue/epilogue;
3) calling convention (argument passing, using registers or not, what
register classes, etc.). All these factors depend on target information. If
we want to go deeper, we know certain microarchitectures use a stack of
call/return pairs to help branch prediction of ret instructions. Such a
stack has a target-specific limit which can be hit when a call site is
deep in the call chain. The register file size, and the register pressure
increase due to inlining, is another example.
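
The first three factors can be combined in a small standalone sketch.
CallOverheadModel, estimateCallOverhead and the field values are
hypothetical; a real model would also need the return-stack and register
pressure effects mentioned above, which are left out here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-target description of what a call costs, in abstract
// cost units.
struct CallOverheadModel {
  int CallReturnLatency;    // 1) call + return instruction latency
  int PrologueEpilogueCost; // 2) function prologue/epilogue
  int NumArgRegisters;      // 3) arguments the convention passes in registers
  int RegArgCost;           //    cost of each register argument
  int StackArgCost;         //    cost of each argument spilled to the stack
};

// Estimate the overhead of one call with NumArgs arguments.
int estimateCallOverhead(const CallOverheadModel &M, int NumArgs) {
  assert(NumArgs >= 0 && M.NumArgRegisters >= 0);
  int InRegs = std::min(NumArgs, M.NumArgRegisters);
  int OnStack = NumArgs - InRegs;
  return M.CallReturnLatency + M.PrologueEpilogueCost +
         InRegs * M.RegArgCost + OnStack * M.StackArgCost;
}
```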

Another relevant example is the icache/itlb sizes. A more precise analysis
of the cost to 'speed' due to increased icache/itlb pressure requires
target information, profile information, and some global analysis. Easwaran
has done some research in this area in the past and can share the analysis
design when other things are ready.

>
> Hi Art,
>
> I've long thought that we should have a more principled way of doing
> inline profitability. There is obviously some cost to executing a function
> body, some call site overhead, and some cost reduction associated with any
> post-inlining simplifications. If inlining reduces the overall call site
> cost by more than some factor, say 1% (this should probably depend on the
> optimization level), then we should inline. With profiling information, we
> might even use global speedup instead of local speedup.
>
Yes, with target-specific cost information, global speedup analysis can
be more precise :)

>
> Whether we need a target customization of this threshold, or just a way
> for a target to supplement the fine inlining decision, is unclear to me. It
> is also true that the result of a bunch of locally-optimal decisions
> might be far from the global optimum. Maybe the target has something to say
> about that?
>

The concept of a threshold can be a topic for another discussion. In the
current design, I think the threshold should remain target-independent. It
is the cost that is target-specific.

thanks,

David

