thr3ads.net - llvm dev - [llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size. [Sep 2017]

If this information is useful, please help other people find it:
Share via:

Jessica Paquette via llvm-dev

2017-Sep-27 16:28 UTC

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

I think that, given previous discussion on the topic, we might want a split like
this:

(1) Search structure

Suffix tree or suffix array.

(2) Numbering/mapping/congruence scheme

Every outliner should implement a function that maps instructions (or whatever
structure you want to outline, crazy thoughts…) to integers. That should be
passed to the search structure, which will return a list of repeated sequences.

The MachineOutliner currently has an “InstructionMapper” struct in it. I think
we can probably build on that struct so that we can choose different mapping
schemes.

(3) Cost model/candidate selection scheme

Every outliner should implement a function that returns the outlining cost or
benefit of a candidate. In the MachineOutliner, this is a joint effort between
the pass and the target.

(4) Outlining scheme

Every outliner should implement a function that actually performs outlining.
That is, we should have a function that replaces each instance of a repeated
sequence of instructions with a call to a newly-created function. In the
MachineOutliner, the method by which we do this is decided on by the target.

So, we at the very least might want an interface that implements something like

unsigned mapInstructions(…);
unsigned getOutliningCost(…);
unsigned findCandidates(…);
bool outline(…);

The MachineOutliner is *almost* set up like this. So, I think it should be
pretty easy to build the split this way without a lot of adaptation. The only
place I can see it being somewhat tricky is maybe with all of the
target-specified information in the MachineOutliner. I think this is pretty
similar to what River has already suggested, though.

I think it would probably be easier to move forward by pulling stuff out of the
MachineOutliner instead of replacing all of that code with new stuff. However,
that being said, if there’s an improvement to both passes by changing the
infrastructure, I’m not opposed to experimenting with changes!

It would probably be easier to try to mostly pull stuff from the MachineOutliner
though, since it’s already been in tree for a while and has been tested quite a
bit.

- Jessica
> On Sep 26, 2017, at 5:24 PM, Gerolf Hoflehner via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
>> 
>> On Sep 21, 2017, at 8:02 PM, River Riddle <riddleriver at gmail.com
<mailto:riddleriver at gmail.com>> wrote:
>> 
>> Hey Gerolf,
>> 
>> On Thu, Sep 21, 2017 at 7:10 PM, Gerolf Hoflehner <ghoflehner at
apple.com <mailto:ghoflehner at apple.com>> wrote:
>> In general I would love to see an outliner at the IR level also. But
rather than a comparison vs. the machine outliner I would like to learn more
about how the core data structures between the outliners will be shared.
>>  
>> The only structure that needs to be shared is a struct defining what an
outlining candidate looks like.
>>  
>> In particular for matching/pruning it seems to be a reasonable
approach.
>>  
>> When I get around to posting the real patches there are two utility
functions that allow for finding candidates and pruning a candidate list. The
only thing necessary to use these utilities is a vector containing the unsigned
value number of the instructions. All of the utilities are IR agnostic.
>>  
>> A few more remarks/questions are below also.
>> 
>> Thanks
>> Gerolf
>> 
>> 
>>> On Sep 5, 2017, at 4:16 PM, River Riddle via llvm-dev <llvm-dev
at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>> 
>>> Hey Everybody,
>>>  A little while ago I posted an
RFC(http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html
<http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html>) with the
proposition of adding a new outliner at the IR level. There was some confusion
and many questions regarding the proposal which I’d like to address here:
>>>  
>>> Note about nomenclature:
>>>     Candidate: A repeated sequence of instructions within a module.
>>>     Occurrence: One instance of a candidate sequence.
>>> 
>>> -- Accompanied Graph Data -- 
>>> 
>>> Graph data is referenced in the sections below, any reference to
Graph[*Number*] is referencing the numbered graph in the following document:
>>> 
>>> https://goo.gl/QDiVHU <https://goo.gl/QDiVHU>
>>> ---- Performance ----
>>>  
>>> I have tested the IR outliner and current Machine outliner on a
wide variety of benchmarks. The results include total % reduction of both
geomean and total size. It also includes individual results for each test in
each respective benchmark.
>>>  
>>>  The configurations tested are:
>>> ·       Early+Late IR outlining
>>> ·       Late IR outlining
>>> ·       Machine outlining
>>> ·       Early+Late+Machine outlining
>>> ·       Late+Machine outlining
>>> 
>>> NOTE: For fairness in comparisons with the Machine Outliner, all IR
outliner runs also include (-mno-red-zone, outlining from linkonce_odr/weak_odr
functions).
>>>  
>>>  The code size benchmarking results provided are:
>>> * LLVM Test Suite
>>> X86_64, X86*, AArch64, Arm1176jzf-s*, Arm1176jzf-s-thumb*
>>> * Spec 2006
>>> X86_64, X86*, AArch64, Arm1176jzf-s*, Arm1176jzf-s-thumb*
>>> * Clang
>>> X86_64(Mac OS) 
>>> * llvm-tblgen
>>> X86_64(Mac OS)      	
>>> * CSiBE
>>> AArch64
>>> * The machine outliner currently only supports X86_64 and AArch64.
>>>  
>>> Full Code Size Results:
>>> https://goo.gl/ZBjHCG <https://goo.gl/ZBjHCG>
>>>  
>>> --- Algorithmic differences with the Machine Outliner ----
>>> 
>>>           	There was a lot of confusion on how exactly the
algorithm I am proposing differs from what is available in the Machine Outliner.
The similarities of the two outliners lie in the usage of a string matching
algorithm and candidate pruning. The first step in the algorithm is to basically
do the same common substring / pruning algorithm the post-RA MO uses but with a
specially chosen congruence relation. I’d like to delve into the differences
between the two:
>>>  
>>> Congruence Detection:
>>>  
>>> -        Machine Outliner
>>>    The machine outliner has the advantage of having this problem
already taken care of by the register allocator, it simply checks to see if the
two machine instructions are identical.
>>>  
>>> -        IR Outliner
>>>    In the IR outliner we work on semantic equivalence, i.e. we care
the operations being performed are equivalent and not the values. This creates a
need to add verification that we do have exact equivalence when we need it, e.g.
ShuffleVector’s shuffle mask, not taking the address of InlineASM, etc.
>>>  
>>> A quick example of semantic equivalence:
>>> %1 = add i32 1, i32 2
>>> %2 = add i32 2, i32 3
>>>  
>>> These two instructions are not identical because the values of the
operands are not identical. They are, however, semantically equivalent because
they both perform the add operation.
>>> This can be seen by simply removing the operand values used in the
calculations:
>>> %1 = add i32 , i32
>>> %2 = add i32 , i32
>>>  
>>> Occurrence Verification:
>>>  
>>> -        Machine Outliner
>>>    At the post RA level you don’t need to do any kind of special
verification for candidate occurrences because you don’t have to deal with the
concept of inputs.
>>>  
>>> -        IR Outliner
>>>   At the IR/preRA level we need to do complex verification to make
sure that the occurrences within a candidate have the same internal inputs. If
two occurrences have different internal inputs then we need some form of control
flow to maintain correctness. By internal inputs I mean the operands of
instructions that come from an instruction within the occurrence, e.g.
>>>  
>>> %2 = …
>>> // Start outlining occurrence.
>>> %3 = …
>>> %4 = sub %3, %2 // The first operand is an internal input, the
second is external.
>>>  
>>> If there is any confusion about why we need control flow for
internal inputs I am more than happy to provide examples and more detailed
explanations.
>>>  
>>> Aside from internal inputs we also need to verify that the
functions we are outlining from have compatible attributes.
>>>  
>>> Cost Modeling:
>>>  
>>> -        Machine Outliner
>>> At the MIR level the cost information is extremely accurate. So
cost modeling is composed of effectively counting the number of instructions and
adding some frame/setup cost.
>>>  
>>> -        IR Outliner
>>> At the IR level we are working with estimates for the costs of
certain instructions. We try to match the IR cost to the MIR cost as closely as
possible and in practice we can get fairly close(Graph[1]).
>>>    Taking this a step further we need to estimate the cost/setup of
having x amount of parameters and y outputs, as well as the register pressure
from both the call and the potentially outlined function.
>>>  
>>> Parameterization Optimizations:
>>>  
>>> -        Machine Outliner
>>> The Machine outliner uses exact equivalence, which does not allow
for any form of parameterization.
>>>  
>>> -        IR Outliner
>>>           	Being at the IR level requires us to tackle
parameterization, which then brings several optimizations to help lower the cost
of parameterizing a sequence.
>>>  
>>> * Constant Folding
>>>           	The IR outliner will identify constant inputs and fold
them.
>>>  
>>> * Congruent Input Condensing
>>>           	The outliner identifies the congruent sets of parameters
for a function. Example:
>>>           	void fn(int, int); ->  void fn(int);
>>>           	fn(1, 1);          	->  fn(1);
>>>           	fn(%1, %1);   	->  fn(%1);
>>> Parameters 1 and 2 were found to be the same for each callsite of
the function, so we condensed the congruent parameters.
>>>  
>>> * Input Partitioning
>>>           	The outliner partitions candidates that have a parameter
that can be constant folded. Example:
>>>           	fn(1);
>>>           	fn(1);
>>>           	fn(%1);
>>> Occurrences 1 and 2 in the above candidate can have parameter 1
folded. We create a new candidate containing just occurrences 1 and 2 as it may
be more profitable than the original candidate.
>>>  
>>> * Constant int condensing
>>>           	The outliner identifies constant int parameters and
checks to see if, for each occurrence, they are an equal distance from other
constant int parameters. If so it removes all but one of the parameters and
represents the others as an add from the base. Example:
>>>  
>>>           	void fn(int a, int b);
>>>           	fn(1, 2);
>>>           	fn(3, 4);
>>>  
>>> In the above, parameters 1 and 2 are always a distance of 1 apart.
We can redefine our function as:
>>>           	void fn(int a) {
>>>           	  int b = a + 1;
>>>           	  …
>>>           	}
>>>  
>>> Register Usage:
>>> 
>>> -        Machine Outliner
>>> The MO works post RA with exact equivalence, so the most it will
compute is if it needs to save the link register on arm64.
>>>  
>>> -        IR Outliner
>>> The IR outliner needs to compute register usage for the new
outlined function as well as the usage after generating a function call with x
parameters and y outputs at each program point z.
>>>  
>>> Outlining:
>>>  
>>> -        Machine Outliner
>>>           	At the MIR level we clone the outlined instructions into
a new function, create some prologue/epilogue for the function, and then
generate a call.
>>>  
>>> -        IR Outliner
>>>           	At the IR level we also have to handle the
parameters/outputs of the candidate. Here we need to merge all of the metadata
of outlined instructions/outlined functions. We also need to identify congruent
sets of parameters between call sites and then folding the amount of parameters
that are needed for the call.
>>>  
>>> Suffix Array vs Suffix Tree+LCP:
>>> 
>>>           	The two structures should compute the same result, but
there is a non obvious benefit that we get from the suffix array. With the
suffix array approach we identify candidates that shares common occurrences
albeit with a different length. This is very useful for complex
verification/analysis, e.g. at the IR or pre RA level. This allows us to cache
the work when we calculating inputs or verifying the internal inputs of
occurrences. Although this won't be an issue if/when we switch to a common
interface for candidate selection.
>>>  
>>>  
>>> ---- A replacement for the Machine Outliner? Not exactly ----
>>>  
>>> The IR outliner was never intended as a replacement for the machine
outliner and the two can coexist. The outliners tend to catch very different
cases: the machine outliner tends to favor very small candidate lengths. Using a
build of llvm-tblgen, the machine outliner gets ~52% of its benefit from
outlined functions of 2-3 instructions. The IR outliner tends to favor large
candidate lengths(2-20+), often composed of function calls. 52% of the benefit
for the IR outliner in the llvm-tblgen example is found in outlined functions
with final lengths up to 17. Data for example runs of both can be found in the
graph data file and is summarized in Graph[2].
>>>  
>>>   Included in the performance data are metrics showing the
performance of using both the IR outliner and machine outliner. The data
indicates that you can achieve up to, and exceed, 2% reduction of both geomean
and total size by using both.
>>>  
>>> ---- Pros/Cons of IR----
>>> 
>>> The current algorithm is implemented at the IR level, but there are
trade offs to placing this transformation anywhere in the
pipeline(IR/preRa/postRA).
>>>  
>>> -- Less Precise Cost Modeling:
>>>  Being at the IR level creates a need to estimate the size cost of
any given instruction.
>>> - How much does this imprecision affect the benefit estimation?
>>> - Included in the data : Graph[1]: is the difference between our
estimated function size and the actual size in the binary. It shows that we get
very close and tend to be on the conservative side.
>>> - Estimation causes the IR outliner to be conservative. Which means
that we are losing out on potential benefit by overestimating cost.
>>> 
>>> -- Higher Level of Abstraction:
>>> - The outliners are essentially string matching algorithms. Being
at a higher level of abstraction naturally gives more opportunities for
equivalence. As an example, call instructions are handled naturally at the IR
level.
>>> - Will a preRA outliner be able to have the same relaxation in
congruence matching? E.g will it be able to match tail and non tail function
calls?
>>> - Being at the IR level means that we lose out on some instruction
lowering idioms, e.g. constant expressions, bitwise rotation([shl, lshr, or]
-> [rot]), etc.
>>> - This is evident in the results for test suite for aarch64, in
which the machine outliner outperforms the IR outliner due in part to the large
amount of global accesses in the tests.
>>> 
>>> -- Maintainability:
>>> - The IR level in general is much more maintainable.
>> Why so? How did you measure that? What is your measure for
“maintainable?"
>> 
>> The IR level, currently, has much better documentation, is more mature,
and AFAIK most developers are familiar with it. It's definitely subjective
to some degree.
>>  
>> 
>>> - We don’t have to be as conservative about certain ABI
characteristics. This allows for the IR outliner to work without the need for
any extra work(special options) from the users. For example, the machine
outliner requires ‘noredzone’ but the IR outliner does not.
>>> 
>>> -- Pipeline Flexibility:
>>> - As shown in the performance data below, we can get up to 2x
performance by working pre function simplification. Though working pre
simplification means the outliner must gamble between the benefits of outlining
vs simplification.
>>> 
>>> -- Loss of control:
>>> - The machine level can have more control over the outlining
process. We could have optimized parameterization, alignment handling, etc.
>>>  
>>> ---- Adapting the algorithm to pre-RA IR ----
>>> The analysis portion of the IR outliner is already IR agnostic for
the most part. It works on indices into the congruency vector for instructions
and their inputs/outputs. This would mean that a preRA outliner would only have
to define the MIR specific portions: Congruency detection, cost analysis,
parameter/output optimizations, and the outlining of beneficial candidates.
>> That should apply the other way around too: take the MO outliner and
adapt. No?
>> 
>> This is one of the things that the comparison helps to illustrate. If
you take the utilities that I mentioned above and port the current MO outliner
to use them, the resultant file is < 300 lines. It works on exact
equivalence, so all of the interesting parts of the algorithm, besides legality
detection and frame cost, are already taken care of by register allocation. The
only remaining parts are the value numbering and outlining, but those are IR
specific and have to be rewritten anyways.
> 
> Is this how you propose to move forward with the commit? Start with the
utilities, then work with Jessica on the MO port, and finally work on IR
equivalence, signatures etc.? This would seem like a sensitive approach to me,
build upon what is there already and thus demonstrate the robustness of the new
framework
>>   
>>  The IR implementation started at roughly this point, but exact
equivalence at the IR level doesn't really amount to much. This outliner
adds the logic for verification/analysis/optimization for when we don't have
exact equivalence. It's better to start from here because these are the real
problems that are going to have to be solved if an outliner is to exist anywhere
other than post RA.
> 
> I understand your urge since the problems are hard and interesting, but
they are also solvable. Sharing (big) parts of the source base among the
outliners reduces maintenance overhead. This is easier to achieve when tackled
from the start.
>> 
>> Thanks,
>>  River Riddle
>> 
>>  
>>> 
>>> -- Implementation -- 
>>>
https://github.com/River707/llvm/blob/outliner/lib/Transforms/IPO/CodeSizeOutliner.cpp
<https://github.com/River707/llvm/blob/outliner/lib/Transforms/IPO/CodeSizeOutliner.cpp>
>>> All feedback/comments/discussion welcome and appreciated!
>>> 
>>> Thanks,
>>>   River Riddle
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>> 
>> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170927/5c96dc31/attachment-0001.html>

Davide Italiano via llvm-dev

2017-Sep-27 22:23 UTC

head link

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
> I think that, given previous discussion on the topic, we might want a split
> like this:
>
> (1) Search structure
>
> Suffix tree or suffix array.
>
> (2) Numbering/mapping/congruence scheme
>
> Every outliner should implement a function that maps instructions (or
> whatever structure you want to outline, crazy thoughts…) to integers. That
> should be passed to the search structure, which will return a list of
> repeated sequences.
>
> The MachineOutliner currently has an “InstructionMapper” struct in it. I
> think we can probably build on that struct so that we can choose different
> mapping schemes.
>
> (3) Cost model/candidate selection scheme
>
> Every outliner should implement a function that returns the outlining cost
> or benefit of a candidate. In the MachineOutliner, this is a joint effort
> between the pass and the target.
>
> (4) Outlining scheme
>
> Every outliner should implement a function that actually performs
outlining.
> That is, we should have a function that replaces each instance of a
repeated
> sequence of instructions with a call to a newly-created function. In the
> MachineOutliner, the method by which we do this is decided on by the
target.
>
> So, we at the very least might want an interface that implements something
> like
>
> unsigned mapInstructions(…);
> unsigned getOutliningCost(…);
> unsigned findCandidates(…);
> bool outline(…);
>
Hi Jessica, I tend to agree we should try to maximize code reuse in
this context, and I think yours is a good step forward. In particular,
I'm particularly confident the `findCandidates()` bits should be
split.
That said, do we really want encapsulate the logic for finding
candidates into an interface? It's unclear whether it should live but
it seems much more akin to the stuff living in Transforms/Utils than a
proper interface.
So, IMHO it's much more suitable for an helper/utility.

For what it concerns `mapInstructions()` [please correct me if I'm wrong].
This bit of the interface should be responsible for value numbering
the instructions, as far as I can tell.
To the best of my understanding the way your outliner value numbers
using a nominal approach, i.e. two instructions get the same value
number if their string representation is the same (or something like
that).

With the notion above,
xorl %eax %eax
movl %eax, $0

aren't really considered equivalent (although, in practice, they are).

The way I imagined outlining to work at the IR level (and how I once
discussed with Dan) is leveraging the result of a value number
analysis (NewGVN) to assign numbers.

While the problem NewGVN solves is that of finding all the Herbrand
equivalences (structural equivalence, same operation and operand
structurally congruent), it also does a canonicalization step calling
InstSimplify to canonicalize (so, in this sense it catches more
stuffs, e.g.:

%x = add %a, 0
%y = sub %b, 0
not equivalent

instSimplify(%y) -> add %a, 0
now they're equivalent.

Note: The current IR outlliner doesn't really use the approach I just
described, yet, but it's not hard to imagine extending (or rewriting
it to use a real Global Value number analysis to get instructions that
are equivalent, for some definition of  structural equivalency)

So, I wonder whether you plan to enhance the logic for value numbering
in your current scheme. If you do, and you plan to share more code
between IR and MI outliner, then it's a good idea I guess to have a
common interface (I'd love to hear your thoughts on this one).

About the `getOutliningCost()` API, what do you think to do in the MI outliner?
Currently, if I remember correctly, (almost) all instructions have
unit cost except for something.
It's not unreasonable to think this model will evolve. A relatively
obvious improvement might be that of taking in account the length
encoding for the target architecture when outlining. The IR outliner
uses TargetTransformInfo to do estimation costs. I think we may want
to evaluate whether (or not) it's feasible to have a base class for
the cost model that doesn't introduce too much boilerplate, or whether
we should bite the bullet and pay a small cost in duplication for this
logic across the two outliners.

Thanks,

--
Davide

Jessica Paquette via llvm-dev

2017-Sep-27 23:01 UTC

head link

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

Hi Davide,

Thanks! I think that this is a really great thing to be pushing forward in
general.
> That said, do we really want encapsulate the logic for finding
> candidates into an interface? It's unclear whether it should live but
> it seems much more akin to the stuff living in Transforms/Utils than a
> proper interface.
> So, IMHO it's much more suitable for an helper/utility.
This sounds good to me. We could probably simplify it down to a utility that
either

- Finds all the repeated sequences in the program and returns them to the given
outliner, which then applies its cost model to it
- Takes in a (possibly optional) cost model function, finds all repeated
sequences that satisfy that model, and then returns those sequences. If no model
is provided, then it will return all sequences.
> For what it concerns `mapInstructions()` [please correct me if I'm
wrong].
> This bit of the interface should be responsible for value numbering
> the instructions, as far as I can tell.
That’s correct. 
> To the best of my understanding the way your outliner value numbers
> using a nominal approach, i.e. two instructions get the same value
> number if their string representation is the same (or something like
> that).

Yep. They have to match exactly.
> So, I wonder whether you plan to enhance the logic for value numbering
> in your current scheme. If you do, and you plan to share more code
> between IR and MI outliner, then it's a good idea I guess to have a
> common interface (I'd love to hear your thoughts on this one).
Absolutely. There are plenty of opportunities that the MachineOutliner misses
due to simple things like register allocation, on top of things like the example
you gave. A more powerful scheme, and moving the MachineOutliner pre-regalloc
would be highly beneficial. Having a common interface here would be great. (This
would be a really interesting way to analyze the overall structure of programs
as well.)

- Jessica
> On Sep 27, 2017, at 3:23 PM, Davide Italiano <davide at freebsd.org>
wrote:
> 
> On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>> 
>> I think that, given previous discussion on the topic, we might want a
split
>> like this:
>> 
>> (1) Search structure
>> 
>> Suffix tree or suffix array.
>> 
>> (2) Numbering/mapping/congruence scheme
>> 
>> Every outliner should implement a function that maps instructions (or
>> whatever structure you want to outline, crazy thoughts…) to integers.
That
>> should be passed to the search structure, which will return a list of
>> repeated sequences.
>> 
>> The MachineOutliner currently has an “InstructionMapper” struct in it.
I
>> think we can probably build on that struct so that we can choose
different
>> mapping schemes.
>> 
>> (3) Cost model/candidate selection scheme
>> 
>> Every outliner should implement a function that returns the outlining
cost
>> or benefit of a candidate. In the MachineOutliner, this is a joint
effort
>> between the pass and the target.
>> 
>> (4) Outlining scheme
>> 
>> Every outliner should implement a function that actually performs
outlining.
>> That is, we should have a function that replaces each instance of a
repeated
>> sequence of instructions with a call to a newly-created function. In
the
>> MachineOutliner, the method by which we do this is decided on by the
target.
>> 
>> So, we at the very least might want an interface that implements
something
>> like
>> 
>> unsigned mapInstructions(…);
>> unsigned getOutliningCost(…);
>> unsigned findCandidates(…);
>> bool outline(…);
>> 
> 
> Hi Jessica, I tend to agree we should try to maximize code reuse in
> this context, and I think yours is a good step forward. In particular,
> I'm particularly confident the `findCandidates()` bits should be
> split.
> That said, do we really want encapsulate the logic for finding
> candidates into an interface? It's unclear whether it should live but
> it seems much more akin to the stuff living in Transforms/Utils than a
> proper interface.
> So, IMHO it's much more suitable for an helper/utility.
> 
> For what it concerns `mapInstructions()` [please correct me if I'm
wrong].
> This bit of the interface should be responsible for value numbering
> the instructions, as far as I can tell.
> To the best of my understanding the way your outliner value numbers
> using a nominal approach, i.e. two instructions get the same value
> number if their string representation is the same (or something like
> that).
> 
> With the notion above,
> xorl %eax %eax
> movl %eax, $0
> 
> aren't really considered equivalent (although, in practice, they are).
> 
> The way I imagined outlining to work at the IR level (and how I once
> discussed with Dan) is leveraging the result of a value number
> analysis (NewGVN) to assign numbers.
> 
> While the problem NewGVN solves is that of finding all the Herbrand
> equivalences (structural equivalence, same operation and operand
> structurally congruent), it also does a canonicalization step calling
> InstSimplify to canonicalize (so, in this sense it catches more
> stuffs, e.g.:
> 
> %x = add %a, 0
> %y = sub %b, 0
> not equivalent
> 
> instSimplify(%y) -> add %a, 0
> now they're equivalent.
> 
> Note: The current IR outlliner doesn't really use the approach I just
> described, yet, but it's not hard to imagine extending (or rewriting
> it to use a real Global Value number analysis to get instructions that
> are equivalent, for some definition of  structural equivalency)
> 
> So, I wonder whether you plan to enhance the logic for value numbering
> in your current scheme. If you do, and you plan to share more code
> between IR and MI outliner, then it's a good idea I guess to have a
> common interface (I'd love to hear your thoughts on this one).
> 
> About the `getOutliningCost()` API, what do you think to do in the MI
outliner?
> Currently, if I remember correctly, (almost) all instructions have
> unit cost except for something.
> It's not unreasonable to think this model will evolve. A relatively
> obvious improvement might be that of taking in account the length
> encoding for the target architecture when outlining. The IR outliner
> uses TargetTransformInfo to do estimation costs. I think we may want
> to evaluate whether (or not) it's feasible to have a base class for
> the cost model that doesn't introduce too much boilerplate, or whether
> we should bite the bullet and pay a small cost in duplication for this
> logic across the two outliners.
> 
> Thanks,
> 
> --
> Davide
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170927/533f5596/attachment.html>

Matthias Braun via llvm-dev

2017-Sep-28 01:07 UTC

head link

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

> On Sep 27, 2017, at 3:23 PM, Davide Italiano via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>> 
>> I think that, given previous discussion on the topic, we might want a
split
>> like this:
>> 
>> (1) Search structure
>> 
>> Suffix tree or suffix array.
>> 
>> (2) Numbering/mapping/congruence scheme
>> 
>> Every outliner should implement a function that maps instructions (or
>> whatever structure you want to outline, crazy thoughts…) to integers.
That
>> should be passed to the search structure, which will return a list of
>> repeated sequences.
>> 
>> The MachineOutliner currently has an “InstructionMapper” struct in it.
I
>> think we can probably build on that struct so that we can choose
different
>> mapping schemes.
>> 
>> (3) Cost model/candidate selection scheme
>> 
>> Every outliner should implement a function that returns the outlining
cost
>> or benefit of a candidate. In the MachineOutliner, this is a joint
effort
>> between the pass and the target.
>> 
>> (4) Outlining scheme
>> 
>> Every outliner should implement a function that actually performs
outlining.
>> That is, we should have a function that replaces each instance of a
repeated
>> sequence of instructions with a call to a newly-created function. In
the
>> MachineOutliner, the method by which we do this is decided on by the
target.
>> 
>> So, we at the very least might want an interface that implements
something
>> like
>> 
>> unsigned mapInstructions(…);
>> unsigned getOutliningCost(…);
>> unsigned findCandidates(…);
>> bool outline(…);
>> 
> 
> Hi Jessica, I tend to agree we should try to maximize code reuse in
> this context, and I think yours is a good step forward. In particular,
> I'm particularly confident the `findCandidates()` bits should be
> split.
> That said, do we really want encapsulate the logic for finding
> candidates into an interface? It's unclear whether it should live but
> it seems much more akin to the stuff living in Transforms/Utils than a
> proper interface.
> So, IMHO it's much more suitable for an helper/utility.
> 
> For what it concerns `mapInstructions()` [please correct me if I'm
wrong].
> This bit of the interface should be responsible for value numbering
> the instructions, as far as I can tell.
> To the best of my understanding the way your outliner value numbers
> using a nominal approach, i.e. two instructions get the same value
> number if their string representation is the same (or something like
> that).
> 
> With the notion above,
> xorl %eax %eax
> movl %eax, $0
> 
> aren't really considered equivalent (although, in practice, they are).
> 
> The way I imagined outlining to work at the IR level (and how I once
> discussed with Dan) is leveraging the result of a value number
> analysis (NewGVN) to assign numbers.
> 
> While the problem NewGVN solves is that of finding all the Herbrand
> equivalences (structural equivalence, same operation and operand
> structurally congruent), it also does a canonicalization step calling
> InstSimplify to canonicalize (so, in this sense it catches more
> stuffs, e.g.:
> 
> %x = add %a, 0
> %y = sub %b, 0
> not equivalent
> 
> instSimplify(%y) -> add %a, 0
> now they're equivalent.
> 
> Note: The current IR outlliner doesn't really use the approach I just
> described, yet, but it's not hard to imagine extending (or rewriting
> it to use a real Global Value number analysis to get instructions that
> are equivalent, for some definition of  structural equivalency)I would also note that we already have InstCombine, GVN and other passes. Right
now I see little value in teaching the outliner about those equivalences if we
can just rely on earlier passes to normalize. This case also doesn't look
like a pass ordering problems as I don't see the outliner enabling further
optimizations in other passes.

- Matthias
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170927/af81446e/attachment-0001.html>

Maybe Matching Threads

Search for more possibly parallel threads

llvm dev - Sep 2017 - [RFC] PT.2 Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

[llvm-dev] [RFC] PT.2 Add IR level interprocedural outliner for code size.

Maybe Matching Threads