Mehdi AMINI via llvm-dev
2017-Jul-26 16:36 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
2017-07-26 9:31 GMT-07:00 Quentin Colombet <qcolombet at apple.com>:

>> On Jul 25, 2017, at 10:36 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>> 2017-07-24 16:14 GMT-07:00 Quentin Colombet via llvm-dev
>> <llvm-dev at lists.llvm.org>:
>>
>>> Hi River,
>>>
>>> On Jul 24, 2017, at 2:36 PM, River Riddle <riddleriver at gmail.com> wrote:
>>>
>>>> Hi Quentin,
>>>> I appreciate the feedback. When I reference the cost of target hooks,
>>>> it's mainly for maintainability and the cost to a target author. We want
>>>> to keep the intrusion into target information minimized. The heuristics
>>>> used for the outliner are the same used by any other IR-level pass
>>>> seeking target information, i.e. TTI for the most part. I can see where
>>>> you are coming from with "having heuristics solely focused on code size
>>>> do not seem realistic", but I don't agree with that statement.
>>>
>>> If you only want code size, I agree it makes sense, but I believe, even
>>> in Oz, we probably don't want to slow the code down by a big factor for a
>>> couple of bytes. That's what I wanted to say, and what I wanted to point
>>> out is that you need some kind of performance model to avoid those worst
>>> cases. Unless we don't care :).
>>
>> That's why we have thresholds though, don't we?
>
> When I see a threshold, I think magic number, and I don't like that.

Fair, but a heuristic is the best we have when we don't want to optimize for
a single metric or can't have a perfect model of the world.

>> Also, the IR makes it easy to connect to PGO, which allows focusing the
>> outlining on "cold" regions and preserving good performance.
>> River: did you consider this already? Good integration with PGO could
>> make this part of the default optimization pipeline (i.e., a mode where
>> we outline only the knowingly "cold" code).
>>
>>>> I think there is a disconnect on heuristics. The only user-tunable
>>>> parameters are the lower-bound parameters (to the cost model); the
>>>> actual analysis (heuristic calculation) is based upon TTI information.
>>>
>>> I don't see how you can get around adding more hooks to know how a
>>> specific function prototype is going to be lowered (e.g., an i64 needs
>>> to be split into two registers, the fourth and onward parameters need to
>>> be pushed on the stack, and so on). Those change the code-size benefit.
>>
>> How is the inliner doing? How are we handling Oz there?
>> If we are fine living with approximation for the inliner, why wouldn't
>> the same work for an outliner?
>
> Unlike inlining, outlining does not expose optimization opportunities.

I would expect that getting the cold code out of the way would help with
locality / caching of the hot path. I remember Amaury even working on putting
cold *basic blocks* in a different section without outlining them into a
function.

But I guess what you mean is that as long as we're focusing solely on getting
the smallest possible binary ever, you may be closer to "perfect modeling"
very late in the pipeline.

>>>> There are several comparison benchmarks given in the "More detailed
>>>> performance data" section of the original RFC. It includes comparisons
>>>> to the Machine Outliner when possible (I can't build clang on Linux
>>>> with the Machine Outliner). I welcome any and all discussion on the
>>>> placement of the outliner in LLVM.
>>>
>>> My fear with a new framework is that we are going to split the effort
>>> for pushing the outliner technology forward, and I'd like to avoid that
>>> if at all possible.
>>
>> It isn't clear to me that implementing it at the Machine level was the
>> right trade-off in the first place.
>
> Fair enough. It has the advantage of not relying on a heuristic for its
> cost model, though.
>
>> I'm not sure a full comparative study was performed and discussed
>> upstream at the time the MachineIR outliner was implemented? If so, it
>> wouldn't be fair to ask this of River now.
>
> I am not asking that :).

OK great :)

-- 
Mehdi
Quentin Colombet via llvm-dev
2017-Jul-26 17:10 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
> On Jul 26, 2017, at 9:36 AM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> [...]
>
> I would expect that getting the cold code out of the way would help with
> locality / caching of the hot path. I remember Amaury even working on
> putting cold *basic blocks* in a different section without outlining them
> into a function.
>
> But I guess what you mean is that as long as we're focusing solely on
> getting the smallest possible binary ever, you may be closer to "perfect
> modeling" very late in the pipeline.

No, I mean in terms of enabling other optimizations in the pipeline, like
the vectorizer. The outliner does not expose any of that.
Sanjoy Das via llvm-dev
2017-Jul-26 19:07 UTC
[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.
Hi,

On Wed, Jul 26, 2017 at 10:10 AM, Quentin Colombet via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> No, I mean in terms of enabling other optimizations in the pipeline like
> vectorizer. Outliner does not expose any of that.

I have not made a lot of effort to understand the full discussion here (so
what I say below may be off-base), but I think there are some cases where
outlining (especially working with function-attrs) can make optimization
easier.

It can help transforms that duplicate code (like loop unrolling and
inlining) be more profitable -- I'm thinking of cases where
unrolling/inlining would have to duplicate a lot of code, but after
outlining would require duplicating only a few call instructions.

It can help EarlyCSE do things that require GVN today:

  void foo() {
    ... complex computation that computes func()
    ... complex computation that computes func()
  }

outlining =>

  int func() { ... }

  void foo() {
    int x = func();
    int y = func();
  }

functionattrs =>

  int func() readonly { ... }

  void foo(int a, int b) {
    int x = func();
    int y = func();
  }

earlycse =>

  int func(int t) readnone { ... }

  void foo(int a, int b) {
    int x = func(a);
    int y = x;
  }

GVN will catch this, but EarlyCSE is (at least supposed to be!) cheaper.

Once we have an analysis that can prove that certain functions can't trap,
outlining can allow LICM etc. to speculate entire outlined regions out of
loops.

Generally, I think outlining exposes information that certain regions of the
program are doing identical things. We should expect to get some mileage out
of this information.

-- Sanjoy