thr3ads.net - llvm dev - [llvm-dev] [RFC] Add IR level interprocedural outliner for code size. [Jul 2017]

If this information is useful, please help other people find it:
Share via:

River Riddle via llvm-dev

2017-Jul-24 18:55 UTC

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi Jessica,
 The comparison to the inliner is an interesting one but we think it's
important to note the difference in the use of heuristics. The inliner is
juggling many different tasks at the same time, execution speed, code size,
etc. which can cause the parameters to be very sensitive depending on the
benchmark/platform/etc. The outliners heuristics are focused solely on the
potential code size savings from outlining, and is thus only sensitive to
the current platform. This only creates a problem when we are over
estimating the potential cost of a set of instructions for a particular
target. The cost model parameters are only minimums: instruction sequence
length, estimated benefit, occurrence amount. The heuristics themselves are
conservative and based upon all of the target information available at the
IR level, the parameters are just setting a lower bound to weed out any
outliers. You are correct in that being at the machine level, before or
after RA, will give the most accurate heuristics but we feel there's an
advantage to being at the IR level. At the IR level we can do so many more
things that are either too difficult/complex for the machine level(e.g
parameterization/outputs/etc). Not only can we do these things but they are
available on all targets immediately, without the need for target hooks.
The caution on the use of heuristics is understandable, but there comes a
point when trade offs need to be made. We made the trade off for a loss in
exact cost modeling to gain flexibility, coverage, and potential for
further features. This trade off is the same made for quite a few IR level
optimizations, including inlining. As for the worry about code size
regressions, so far the results seem to support our hypothesis.
 Thanks,
River Riddle

On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com>
wrote:
>
> Hi River,
>
> I’m working on the MachineOutliner pass at the MIR level. Working at the
> IR level sounds interesting! It also seems like our algorithms are similar.
> I was thinking of taking the suffix array route with the MachineOutliner in
> the future.
>
> Anyway, I’d like to ask about this:
>
> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> The downside to having this type of transformation be at the IR level is
> it means there will be less accuracy in the cost model -  we can somewhat
> accurately model the cost per instruction but we can’t get information on
> how a window of instructions may lower. This can cause regressions
> depending on the platform/codebase, therefore to help alleviate this there
> are several tunable parameters for the cost model.
>
>
> The inliner is threshold-based and it can be rather unpredictable how it
> will impact the code size of a program. Do you have any idea as to how
> heuristics/thresholds/parameters could be tuned to prevent this? In my
> experience, making good code size decisions with these sorts of passes
> requires a lot of knowledge about what instructions you’re dealing with
> exactly. I’ve seen the inliner cause some pretty serious code size
> regressions in projects due to small changes to the cost model/parameters
> which cause improvements in other projects. I’m a little worried that an
> IR-level outliner for code size would be doomed to a similar fate.
>
> Perhaps it would be interesting to run this sort of pass pre-register
> allocation? This would help pull you away from having to use heuristics,
> but give you some more opportunities for finding repeated instruction
> sequences. I’ve thought of doing something like this in the future with the
> MachineOutliner and seeing how it goes.
>
> - Jessica
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/73a8be56/attachment.html>

Quentin Colombet via llvm-dev

2017-Jul-24 20:42 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi River,
> On Jul 24, 2017, at 11:55 AM, River Riddle via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi Jessica,
>  The comparison to the inliner is an interesting one but we think it's
important to note the difference in the use of heuristics. The inliner is
juggling many different tasks at the same time, execution speed, code size, etc.
which can cause the parameters to be very sensitive depending on the
benchmark/platform/etc. The outliners heuristics are focused solely on the
potential code size savings from outlining, and is thus only sensitive to the
current platform. This only creates a problem when we are over estimating the
potential cost of a set of instructions for a particular target. The cost model
parameters are only minimums: instruction sequence length, estimated benefit,
occurrence amount. The heuristics themselves are conservative and based upon all
of the target information available at the IR level, the parameters are just
setting a lower bound to weed out any outliers. You are correct in that being at
the machine level, before or after RA, will give the most accurate heuristics
but we feel there's an advantage to being at the IR level. At the IR level
we can do so many more things that are either too difficult/complex for the
machine level(e.g parameterization/outputs/etc). Not only can we do these things
but they are available on all targets immediately, without the need for target
hooks. The caution on the use of heuristics is understandable, but there comes a
point when trade offs need to be made. We made the trade off for a loss in exact
cost modeling to gain flexibility, coverage, and potential for further features.
This trade off is the same made for quite a few IR level optimizations,
including inlining. As for the worry about code size regressions, so far the
results seem to support our hypothesis.
Would still be interesting to see how well this could perform on some exact
model (i.e., at the Machine level), IMO. Target hooks are cheap and choosing an
implementation because it is simpler might not be the right long term solution.
At the very least, to know what trade-off we are making, having prototypes with
the different approaches sounds sensible.
In particular, all the heuristics about cost for parameter passing (haven’t
checked how you did it) sounds already complex enough and would require target
hooks. Therefore, I am not seeing a clear win with an IR approach here.

Finally, having heuristics solely focused on code size do not seem realistic.
Indeed, I am guessing you have some thresholds to avoid outlining some piece of
code too small that would end up adding a whole lot of indirections and I don’t
like magic numbers in general :).

To summarize, I wanted to point out that an IR approach is not as a clear win as
you describe and would thus deserve more discussion.

Cheers,
-Quentin
>  Thanks,
> River Riddle
> 
> On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com <mailto:jpaquette at apple.com>> wrote:
> 
> Hi River,
> 
> I’m working on the MachineOutliner pass at the MIR level. Working at the IR
level sounds interesting! It also seems like our algorithms are similar. I was
thinking of taking the suffix array route with the MachineOutliner in the
future.
> 
> Anyway, I’d like to ask about this:
> 
>> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> The downside to having this type of transformation be at the IR level
is it means there will be less accuracy in the cost model -  we can somewhat
accurately model the cost per instruction but we can’t get information on how a
window of instructions may lower. This can cause regressions depending on the
platform/codebase, therefore to help alleviate this there are several tunable
parameters for the cost model.
> 
> The inliner is threshold-based and it can be rather unpredictable how it
will impact the code size of a program. Do you have any idea as to how
heuristics/thresholds/parameters could be tuned to prevent this? In my
experience, making good code size decisions with these sorts of passes requires
a lot of knowledge about what instructions you’re dealing with exactly. I’ve
seen the inliner cause some pretty serious code size regressions in projects due
to small changes to the cost model/parameters which cause improvements in other
projects. I’m a little worried that an IR-level outliner for code size would be
doomed to a similar fate.
> 
> Perhaps it would be interesting to run this sort of pass pre-register
allocation? This would help pull you away from having to use heuristics, but
give you some more opportunities for finding repeated instruction sequences.
I’ve thought of doing something like this in the future with the MachineOutliner
and seeing how it goes.
> 
> - Jessica
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/348f7f3b/attachment.html>

Jessica Paquette via llvm-dev

2017-Jul-24 20:59 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi River,

I think outlining at the IR level certainly has a lot of potential. I also think
that working at a more generic representation such as IR opens lots of doors for
interesting code size improvements; however, I’m still iffy on heuristics in
general.
> The inliner is juggling many different tasks at the same time, execution
speed, code size, etc. which can cause the parameters to be very sensitive
depending on the benchmark/platform/etc.Strongly agree here; it’s unclear *what* a heuristic should look like for an
inliner.
> The outliners heuristics are focused solely on the potential code size
savings from outlining, and is thus only sensitive to the current platform.This is where I get a little concerned.

Firstly, as previously mentioned, we don’t know how an instruction will be
lowered. So, if we say that each IR instruction has a cost of “1 somethings”, we
haven’t really solved a *code size problem*, but more of a *code reduction
problem*. That is, we’ve performed a reduction to the overall structure of the
program, but we don’t know that that overall structural reduction will produce
an equivalent code size reduction. I personally would be inclined to figure out
how far off such a cost model could be, given a known architecture.

Secondly, suppose we have a cost model like in the inliner, where we have some
heuristic constants which are employed by some cost model. If we change the
model, then how do we know our heuristic constants are still good? Even if we
tune these constants for some specific target to try and make good decisions, if
we change the outliner *at all*, then we have to retune the heuristics. Changes
in later passes could also introduce the need for retuning.

Finally, the big issue here is that with code size problems, or at least with
outlining, we’re trying to solve something which is inherently tied to the
architecture we’re working with. A good outlining decision for x86 will look
different from a good outlining decision for ARM64. The issue isn’t purely
structural; the actual instructions we’re dealing with matter.

Once again, I think this stuff is really interesting, and I think your results
are looking great so far. It’s really awesome to see this method being
successful at a high level. I’m also making a lot of assumptions about what your
cost model might look like. I’ll probably have to look at the code to make a
sound judgement. :)

- Jessica
> On Jul 24, 2017, at 11:55 AM, River Riddle <riddleriver at gmail.com>
wrote:
> 
> Hi Jessica,
>  The comparison to the inliner is an interesting one but we think it's
important to note the difference in the use of heuristics. The inliner is
juggling many different tasks at the same time, execution speed, code size, etc.
which can cause the parameters to be very sensitive depending on the
benchmark/platform/etc. The outliners heuristics are focused solely on the
potential code size savings from outlining, and is thus only sensitive to the
current platform. This only creates a problem when we are over estimating the
potential cost of a set of instructions for a particular target. The cost model
parameters are only minimums: instruction sequence length, estimated benefit,
occurrence amount. The heuristics themselves are conservative and based upon all
of the target information available at the IR level, the parameters are just
setting a lower bound to weed out any outliers. You are correct in that being at
the machine level, before or after RA, will give the most accurate heuristics
but we feel there's an advantage to being at the IR level. At the IR level
we can do so many more things that are either too difficult/complex for the
machine level(e.g parameterization/outputs/etc). Not only can we do these things
but they are available on all targets immediately, without the need for target
hooks. The caution on the use of heuristics is understandable, but there comes a
point when trade offs need to be made. We made the trade off for a loss in exact
cost modeling to gain flexibility, coverage, and potential for further features.
This trade off is the same made for quite a few IR level optimizations,
including inlining. As for the worry about code size regressions, so far the
results seem to support our hypothesis.
>  Thanks,
> River Riddle
> 
> On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com <mailto:jpaquette at apple.com>> wrote:
> 
> Hi River,
> 
> I’m working on the MachineOutliner pass at the MIR level. Working at the IR
level sounds interesting! It also seems like our algorithms are similar. I was
thinking of taking the suffix array route with the MachineOutliner in the
future.
> 
> Anyway, I’d like to ask about this:
> 
>> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> The downside to having this type of transformation be at the IR level
is it means there will be less accuracy in the cost model -  we can somewhat
accurately model the cost per instruction but we can’t get information on how a
window of instructions may lower. This can cause regressions depending on the
platform/codebase, therefore to help alleviate this there are several tunable
parameters for the cost model.
> 
> The inliner is threshold-based and it can be rather unpredictable how it
will impact the code size of a program. Do you have any idea as to how
heuristics/thresholds/parameters could be tuned to prevent this? In my
experience, making good code size decisions with these sorts of passes requires
a lot of knowledge about what instructions you’re dealing with exactly. I’ve
seen the inliner cause some pretty serious code size regressions in projects due
to small changes to the cost model/parameters which cause improvements in other
projects. I’m a little worried that an IR-level outliner for code size would be
doomed to a similar fate.
> 
> Perhaps it would be interesting to run this sort of pass pre-register
allocation? This would help pull you away from having to use heuristics, but
give you some more opportunities for finding repeated instruction sequences.
I’ve thought of doing something like this in the future with the MachineOutliner
and seeing how it goes.
> 
> - Jessica
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/73e33d0d/attachment-0001.html>

Meador Inge via llvm-dev

2017-Jul-24 21:16 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

> Firstly, as previously mentioned, we don’t know how an instruction will be
lowered. So, if we say that each IR instruction has a cost of “1 somethings”, we
haven’t really solved a *code size problem*, but more of a *code reduction
problem*. That is, we’ve performed a reduction to the overall structure of the
program, but we don’t know that that overall structural reduction will produce
an equivalent code size reduction. I personally would be inclined to figure out
how far off such a cost model could be, given a known architecture.
“code size” vs. “code reduction” seems like a really nice way to think about
this.  I don’t know a lot about the areas in discussion, but one thing that
occurred to me while reading this was: a code reduction couldn’t produce a worse
code size than without the reduction, right?

— Meador

River Riddle via llvm-dev

2017-Jul-24 21:36 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi Quentin,
 I appreciate the feedback. When I reference the cost of Target Hooks it's
mainly for maintainability and cost on a target author. We want to keep the
intrusion into target information minimized. The heuristics used for the
outliner are the same used by any other IR level pass seeking target
information, i.e TTI for the most part. I can see where you are coming from
with "having heuristics solely focused on code size do not seem
realistic",
but I don't agree with that statement. I think there is a disconnect on
heuristics. The only user tunable parameters are the lower bound
parameters(to the cost model), the actual analysis(heuristic calculation)
is based upon TTI information.
 When you say "Would still be interesting to see how well this could
perform on some exact model (i.e., at the Machine level), IMO." I am
slightly confused as to what you mean. I do not intend to try and implement
this algorithm at the MIR level given that it exists in Machine Outliner.
There are several comparison benchmarks given in the "More detailed
performance data" of the original RFC. It includes comparisons to the
Machine Outliner when possible(I can't build clang on Linux with Machine
Outliner). I welcome any and all discussion on the placement of the
outliner in LLVM.
 Thanks,
River Riddle

On Mon, Jul 24, 2017 at 1:42 PM, Quentin Colombet <qcolombet at apple.com>
wrote:
> Hi River,
>
> On Jul 24, 2017, at 11:55 AM, River Riddle via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Jessica,
>  The comparison to the inliner is an interesting one but we think it's
> important to note the difference in the use of heuristics. The inliner is
> juggling many different tasks at the same time, execution speed, code size,
> etc. which can cause the parameters to be very sensitive depending on the
> benchmark/platform/etc. The outliners heuristics are focused solely on the
> potential code size savings from outlining, and is thus only sensitive to
> the current platform. This only creates a problem when we are over
> estimating the potential cost of a set of instructions for a particular
> target. The cost model parameters are only minimums: instruction sequence
> length, estimated benefit, occurrence amount. The heuristics themselves are
> conservative and based upon all of the target information available at the
> IR level, the parameters are just setting a lower bound to weed out any
> outliers. You are correct in that being at the machine level, before or
> after RA, will give the most accurate heuristics but we feel there's an
> advantage to being at the IR level. At the IR level we can do so many more
> things that are either too difficult/complex for the machine level(e.g
> parameterization/outputs/etc). Not only can we do these things but they are
> available on all targets immediately, without the need for target hooks.
> The caution on the use of heuristics is understandable, but there comes a
> point when trade offs need to be made. We made the trade off for a loss in
> exact cost modeling to gain flexibility, coverage, and potential for
> further features. This trade off is the same made for quite a few IR level
> optimizations, including inlining. As for the worry about code size
> regressions, so far the results seem to support our hypothesis.
>
>
> Would still be interesting to see how well this could perform on some
> exact model (i.e., at the Machine level), IMO. Target hooks are cheap and
> choosing an implementation because it is simpler might not be the right
> long term solution.
> At the very least, to know what trade-off we are making, having prototypes
> with the different approaches sounds sensible.
> In particular, all the heuristics about cost for parameter passing
> (haven’t checked how you did it) sounds already complex enough and would
> require target hooks. Therefore, I am not seeing a clear win with an IR
> approach here.
>
> Finally, having heuristics solely focused on code size do not seem
> realistic. Indeed, I am guessing you have some thresholds to avoid
> outlining some piece of code too small that would end up adding a whole lot
> of indirections and I don’t like magic numbers in general :).
>
> To summarize, I wanted to point out that an IR approach is not as a clear
> win as you describe and would thus deserve more discussion.
>
> Cheers,
> -Quentin
>
>  Thanks,
> River Riddle
>
> On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com>
> wrote:
>
>>
>> Hi River,
>>
>> I’m working on the MachineOutliner pass at the MIR level. Working at
the
>> IR level sounds interesting! It also seems like our algorithms are
similar.
>> I was thinking of taking the suffix array route with the
MachineOutliner in
>> the future.
>>
>> Anyway, I’d like to ask about this:
>>
>> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> The downside to having this type of transformation be at the IR level
is
>> it means there will be less accuracy in the cost model -  we can
somewhat
>> accurately model the cost per instruction but we can’t get information
on
>> how a window of instructions may lower. This can cause regressions
>> depending on the platform/codebase, therefore to help alleviate this
there
>> are several tunable parameters for the cost model.
>>
>>
>> The inliner is threshold-based and it can be rather unpredictable how
it
>> will impact the code size of a program. Do you have any idea as to how
>> heuristics/thresholds/parameters could be tuned to prevent this? In my
>> experience, making good code size decisions with these sorts of passes
>> requires a lot of knowledge about what instructions you’re dealing with
>> exactly. I’ve seen the inliner cause some pretty serious code size
>> regressions in projects due to small changes to the cost
model/parameters
>> which cause improvements in other projects. I’m a little worried that
an
>> IR-level outliner for code size would be doomed to a similar fate.
>>
>> Perhaps it would be interesting to run this sort of pass pre-register
>> allocation? This would help pull you away from having to use
heuristics,
>> but give you some more opportunities for finding repeated instruction
>> sequences. I’ve thought of doing something like this in the future with
the
>> MachineOutliner and seeing how it goes.
>>
>> - Jessica
>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/693b903d/attachment.html>

River Riddle via llvm-dev

2017-Jul-24 21:45 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi Jessica,
 When we model the cost of an instruction it is largely based upon an
estimation by the TTI. It is still an estimation but the results prove to
be somewhat close to the exact costs. If you are interested in the results
for some known platforms check the "More detailed results" section of
the
original RFC. It provides comparisons between the two configurations of
outlining at the IR level as well as the Machine Outliner. It includes
results for X86_64(Linux and Mac) as well as AArch64(The only two platforms
that the Machine Outliner supports). The results tend to be quite favorable
in regards to the cost model.
 As for the tunable heuristic(constants) numbers, these are very similar to
those that the Machine Outliner might have if it was tunable: Minimum
Benefit, Min Occurrences, Min Instruction Length. Those are the only
constants that really exist within the cost model.
 I appreciate all of the feedback and discussion!
Thanks,
River Riddle

On Mon, Jul 24, 2017 at 1:59 PM, Jessica Paquette <jpaquette at apple.com>
wrote:
> Hi River,
>
> I think outlining at the IR level certainly has a lot of potential. I also
> think that working at a more generic representation such as IR opens lots
> of doors for interesting code size improvements; however, I’m still iffy on
> heuristics in general.
>
> The inliner is juggling many different tasks at the same time, execution
> speed, code size, etc. which can cause the parameters to be very sensitive
> depending on the benchmark/platform/etc.
>
> Strongly agree here; it’s unclear *what* a heuristic should look like for
> an inliner.
>
> The outliners heuristics are focused solely on the potential code size
> savings from outlining, and is thus only sensitive to the current platform.
>
> This is where I get a little concerned.
>
> Firstly, as previously mentioned, we don’t know how an instruction will be
> lowered. So, if we say that each IR instruction has a cost of “1
> somethings”, we haven’t really solved a *code size problem*, but more of a
> *code reduction problem*. That is, we’ve performed a reduction to the
> overall structure of the program, but we don’t know that that overall
> structural reduction will produce an equivalent code size reduction. I
> personally would be inclined to figure out how far off such a cost model
> could be, given a known architecture.
>
> Secondly, suppose we have a cost model like in the inliner, where we have
> some heuristic constants which are employed by some cost model. If we
> change the model, then how do we know our heuristic constants are still
> good? Even if we tune these constants for some specific target to try and
> make good decisions, if we change the outliner *at all*, then we have to
> retune the heuristics. Changes in later passes could also introduce the
> need for retuning.
>
> Finally, the big issue here is that with code size problems, or at least
> with outlining, we’re trying to solve something which is inherently tied to
> the architecture we’re working with. A good outlining decision for x86 will
> look different from a good outlining decision for ARM64. The issue isn’t
> purely structural; the actual instructions we’re dealing with matter.
>
> Once again, I think this stuff is really interesting, and I think your
> results are looking great so far. It’s really awesome to see this method
> being successful at a high level. I’m also making a lot of assumptions
> about what your cost model might look like. I’ll probably have to look at
> the code to make a sound judgement. :)
>
> - Jessica
>
> On Jul 24, 2017, at 11:55 AM, River Riddle <riddleriver at gmail.com>
wrote:
>
> Hi Jessica,
>  The comparison to the inliner is an interesting one but we think it's
> important to note the difference in the use of heuristics. The inliner is
> juggling many different tasks at the same time, execution speed, code size,
> etc. which can cause the parameters to be very sensitive depending on the
> benchmark/platform/etc. The outliners heuristics are focused solely on the
> potential code size savings from outlining, and is thus only sensitive to
> the current platform. This only creates a problem when we are over
> estimating the potential cost of a set of instructions for a particular
> target. The cost model parameters are only minimums: instruction sequence
> length, estimated benefit, occurrence amount. The heuristics themselves are
> conservative and based upon all of the target information available at the
> IR level, the parameters are just setting a lower bound to weed out any
> outliers. You are correct in that being at the machine level, before or
> after RA, will give the most accurate heuristics but we feel there's an
> advantage to being at the IR level. At the IR level we can do so many more
> things that are either too difficult/complex for the machine level(e.g
> parameterization/outputs/etc). Not only can we do these things but they are
> available on all targets immediately, without the need for target hooks.
> The caution on the use of heuristics is understandable, but there comes a
> point when trade offs need to be made. We made the trade off for a loss in
> exact cost modeling to gain flexibility, coverage, and potential for
> further features. This trade off is the same made for quite a few IR level
> optimizations, including inlining. As for the worry about code size
> regressions, so far the results seem to support our hypothesis.
>  Thanks,
> River Riddle
>
> On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com>
> wrote:
>
>>
>> Hi River,
>>
>> I’m working on the MachineOutliner pass at the MIR level. Working at
the
>> IR level sounds interesting! It also seems like our algorithms are
similar.
>> I was thinking of taking the suffix array route with the
MachineOutliner in
>> the future.
>>
>> Anyway, I’d like to ask about this:
>>
>> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> The downside to having this type of transformation be at the IR level
is
>> it means there will be less accuracy in the cost model -  we can
somewhat
>> accurately model the cost per instruction but we can’t get information
on
>> how a window of instructions may lower. This can cause regressions
>> depending on the platform/codebase, therefore to help alleviate this
there
>> are several tunable parameters for the cost model.
>>
>>
>> The inliner is threshold-based and it can be rather unpredictable how
it
>> will impact the code size of a program. Do you have any idea as to how
>> heuristics/thresholds/parameters could be tuned to prevent this? In my
>> experience, making good code size decisions with these sorts of passes
>> requires a lot of knowledge about what instructions you’re dealing with
>> exactly. I’ve seen the inliner cause some pretty serious code size
>> regressions in projects due to small changes to the cost
model/parameters
>> which cause improvements in other projects. I’m a little worried that
an
>> IR-level outliner for code size would be doomed to a similar fate.
>>
>> Perhaps it would be interesting to run this sort of pass pre-register
>> allocation? This would help pull you away from having to use
heuristics,
>> but give you some more opportunities for finding repeated instruction
>> sequences. I’ve thought of doing something like this in the future with
the
>> MachineOutliner and seeing how it goes.
>>
>> - Jessica
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/136b081d/attachment.html>

Jessica Paquette via llvm-dev

2017-Jul-24 22:28 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

(Cc llvm-dev, whoops!)

Intuitively, no, but suppose you have a sequence of instructions that look like
this (in some weird pseudocode I just made up)

a
b
c
x
a
b
c

Now, this could be reduced to something that looks like this by an outliner:

call my_function
x
call my_function

my_function:
a
b
c
return

Structurally-speaking, this *is* a reduction.

Now, suppose we’re using an architecture like AArch64 where we care about
something like a link register. Then to be correct we may have to actually emit
something like this:

save special_register
call my_function
restore special_register
x
save special_register
call my_function
restore special_register

my_function:
a
b
c
return

In this case, we’ve actually produced more instructions than before. Although
we’ve simplified the structure of the code, we’ve still produced more of it. So,
if a high-level outliner isn’t aware of this sort of thing, you could actually
end up making binaries *larger* by removing similarity from the code.

- Jessica> 
>> On Jul 24, 2017, at 2:16 PM, Meador Inge <meadori at gmail.com>
wrote:
>> 
>>> Firstly, as previously mentioned, we don’t know how an instruction
will be lowered. So, if we say that each IR instruction has a cost of “1
somethings”, we haven’t really solved a *code size problem*, but more of a *code
reduction problem*. That is, we’ve performed a reduction to the overall
structure of the program, but we don’t know that that overall structural
reduction will produce an equivalent code size reduction. I personally would be
inclined to figure out how far off such a cost model could be, given a known
architecture.
>> 
>> “code size” vs. “code reduction” seems like a really nice way to think
about this.  I don’t know a lot about the areas in discussion, but one thing
that occurred to me while reading this was: a code reduction couldn’t produce a
worse code size than without the reduction, right?
>> 
>> — Meador
>> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170724/86393b65/attachment.html>

Andrey Bokhanko via llvm-dev

2017-Jul-24 23:09 UTC

head link

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Hi Quentin,

I really don't have a dog in this race, but... we can theoritize on
strengths and weaknesses of different approaches ad infinitum -- or just
accept that "the proof is in the pudding", or so they say.

And the proof is here! -- look at the numbers.

If anything, I believe it makes sense to at least accept Riddle's phase
to the trunk and let two approaches evolve in parallel. It probably silly
to have both enabled at the same time, so one that currently provides a
greater benefit should work on -Os/-Oz. With time, evolution will decide
which one should survive -- as it always does.

Yours,
Andrey
==Compiler Architect
NXP

On Monday, July 24, 2017, Quentin Colombet via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi River,
>
> On Jul 24, 2017, at 11:55 AM, River Riddle via llvm-dev <
> llvm-dev at lists.llvm.org
> <javascript:_e(%7B%7D,'cvml','llvm-dev at
lists.llvm.org');>> wrote:
>
> Hi Jessica,
>  The comparison to the inliner is an interesting one but we think it's
> important to note the difference in the use of heuristics. The inliner is
> juggling many different tasks at the same time, execution speed, code size,
> etc. which can cause the parameters to be very sensitive depending on the
> benchmark/platform/etc. The outliners heuristics are focused solely on the
> potential code size savings from outlining, and is thus only sensitive to
> the current platform. This only creates a problem when we are over
> estimating the potential cost of a set of instructions for a particular
> target. The cost model parameters are only minimums: instruction sequence
> length, estimated benefit, occurrence amount. The heuristics themselves are
> conservative and based upon all of the target information available at the
> IR level, the parameters are just setting a lower bound to weed out any
> outliers. You are correct in that being at the machine level, before or
> after RA, will give the most accurate heuristics but we feel there's an
> advantage to being at the IR level. At the IR level we can do so many more
> things that are either too difficult/complex for the machine level(e.g
> parameterization/outputs/etc). Not only can we do these things but they are
> available on all targets immediately, without the need for target hooks.
> The caution on the use of heuristics is understandable, but there comes a
> point when trade offs need to be made. We made the trade off for a loss in
> exact cost modeling to gain flexibility, coverage, and potential for
> further features. This trade off is the same made for quite a few IR level
> optimizations, including inlining. As for the worry about code size
> regressions, so far the results seem to support our hypothesis.
>
>
> Would still be interesting to see how well this could perform on some
> exact model (i.e., at the Machine level), IMO. Target hooks are cheap and
> choosing an implementation because it is simpler might not be the right
> long term solution.
> At the very least, to know what trade-off we are making, having prototypes
> with the different approaches sounds sensible.
> In particular, all the heuristics about cost for parameter passing
> (haven’t checked how you did it) sounds already complex enough and would
> require target hooks. Therefore, I am not seeing a clear win with an IR
> approach here.
>
> Finally, having heuristics solely focused on code size do not seem
> realistic. Indeed, I am guessing you have some thresholds to avoid
> outlining some piece of code too small that would end up adding a whole lot
> of indirections and I don’t like magic numbers in general :).
>
> To summarize, I wanted to point out that an IR approach is not as a clear
> win as you describe and would thus deserve more discussion.
>
> Cheers,
> -Quentin
>
>  Thanks,
> River Riddle
>
> On Mon, Jul 24, 2017 at 11:12 AM, Jessica Paquette <jpaquette at
apple.com
> <javascript:_e(%7B%7D,'cvml','jpaquette at
apple.com');>> wrote:
>
>>
>> Hi River,
>>
>> I’m working on the MachineOutliner pass at the MIR level. Working at
the
>> IR level sounds interesting! It also seems like our algorithms are
similar.
>> I was thinking of taking the suffix array route with the
MachineOutliner in
>> the future.
>>
>> Anyway, I’d like to ask about this:
>>
>> On Jul 20, 2017, at 3:47 PM, River Riddle via llvm-dev <
>> llvm-dev at lists.llvm.org
>> <javascript:_e(%7B%7D,'cvml','llvm-dev at
lists.llvm.org');>> wrote:
>>
>> The downside to having this type of transformation be at the IR level
is
>> it means there will be less accuracy in the cost model -  we can
somewhat
>> accurately model the cost per instruction but we can’t get information
on
>> how a window of instructions may lower. This can cause regressions
>> depending on the platform/codebase, therefore to help alleviate this
there
>> are several tunable parameters for the cost model.
>>
>>
>> The inliner is threshold-based and it can be rather unpredictable how
it
>> will impact the code size of a program. Do you have any idea as to how
>> heuristics/thresholds/parameters could be tuned to prevent this? In my
>> experience, making good code size decisions with these sorts of passes
>> requires a lot of knowledge about what instructions you’re dealing with
>> exactly. I’ve seen the inliner cause some pretty serious code size
>> regressions in projects due to small changes to the cost
model/parameters
>> which cause improvements in other projects. I’m a little worried that
an
>> IR-level outliner for code size would be doomed to a similar fate.
>>
>> Perhaps it would be interesting to run this sort of pass pre-register
>> allocation? This would help pull you away from having to use
heuristics,
>> but give you some more opportunities for finding repeated instruction
>> sequences. I’ve thought of doing something like this in the future with
the
>> MachineOutliner and seeing how it goes.
>>
>> - Jessica
>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> <javascript:_e(%7B%7D,'cvml','llvm-dev at
lists.llvm.org');>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170725/d565a5e2/attachment-0001.html>

llvm dev - Jul 2017 - [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.