Andrey Bokhanko via llvm-dev
2016-Aug-30 10:28 UTC
[llvm-dev] [RFC] Interprocedural MIR-level outlining pass
Daniel,
OK, I see your point.
My understanding was that the VNDAG is an internal representation, not
meant to be used by VN's clients (which should really care only about
congruence classes, nothing more). Also, I thought that VN's results are
available for a single function at a time. It seems that NewGVN provides
access to VNDAGs computed for the whole program -- which enables its use
in Jessica's pass.

Still, some issues remain. I wonder what the leaf nodes for the reads of
"a" and "b" contain. If they hold VN numbers (as the "Efficient SSA"
paper says) and the VNDAG comparator compares them (for leaves), then the
best one can do is this:
int outlined(int arg1, int arg2)
{
  return arg1 + arg2;
}

int foo() {
  arg1 = a;
  arg2 = b;
  return outlined(arg1, arg2);
}

int bar() {
  arg1 = a;
  arg2 = b;
  return outlined(arg1, arg2);
}
which hardly helps.

If, on the other hand, the VNDAG comparator compares variable names and
doesn't consult congruence classes at all, how does this relate to Value
Numbering? SSA can be used for that purpose just as well.
I tend to agree with Hal -- value numbering computes equivalent *values*,
while this pass needs equivalent sequences of *instructions*.
Yours,
Andrey
On Mon, Aug 29, 2016 at 10:47 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>
> On Mon, Aug 29, 2016 at 12:35 PM, Andrey Bokhanko <
> andreybokhanko at gmail.com> wrote:
>
>> Daniel,
>>
>> I wonder what the NewGVN would generate for the following C code:
>>
>> int a, b;
>>
>> int foo() {
>>   return a + b;
>> }
>>
>> int bar() {
>>   return a + b;
>> }
>>
>> ?
>>
>> Obviously, the expressions would be the same ("value1 + value2"), but
>> a single operation is not worth outlining.
>>
>
>
> Sure.
>
>
>
>> What classes of congruency would be assigned to operands?
>>
>
>
> GVN is not interprocedural, so comparing GVN congruency results (as
> opposed to functions) between functions is undefined.
>
>
>> The same for both reads of "a" and "b"? If yes, it would be incorrect,
>> as, obviously, these two additions are not guaranteed to be
>> semantically equivalent (a's and b's values can be changed in between).
>>
>
>
>
>>
>> Also note that one should be able to compare VN's results from
>> different functions.
>>
>
> I'm not sure why you think you need to do this. You need to be able to
> compare the abstract VNDAGs, and you can.
> The values of those abstract functions are just parameter bindings you
> pass to the outlined function.
>
> You need to be able to compare congruency if you are trying to
> *eliminate* usages because they already exist.
>
> It is perfectly legal to outline both examples into a single function and
> call that. It will give correct answers.
>
>
> IE
>
> int outlined()
> {
>   return a + b;
> }
>
> int foo() {
>   return outlined();
> }
>
> int bar() {
>   return outlined();
> }
>
>
>
>> Looks like an inter-procedural VN is required.
>>
>
> I don't see why; perhaps you could expand?
>
>> Yours,
>> Andrey
>>
>>
>> On Sat, Aug 27, 2016 at 9:28 AM, Daniel Berlin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Fri, Aug 26, 2016 at 9:44 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From: *"Daniel Berlin" <dberlin at dberlin.org>
>>>> *To: *"Quentin Colombet" <qcolombet at apple.com>
>>>> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <llvm-dev at lists.llvm.org>
>>>> *Sent: *Friday, August 26, 2016 11:06:56 PM
>>>> *Subject: *Re: [llvm-dev] [RFC] Interprocedural MIR-level outlining pass
>>>>
>>>> FWIW: I'm with Quentin. Without a good value numbering analysis,
>>>> this is a hard problem.
>>>>
>>>> How exactly does value numbering help here? This problem seems to be
>>>> about finding structurally similar parts of the computations of
>>>> different values, not identical values.
>>>>
>>> Note, first, the optimization we are talking about is not outlining,
>>> at least as far as I've ever heard it.
>>>
>>> Function outlining, etc., is about removing cold regions of functions
>>> to separate functions, to make the hot portions smaller, more
>>> inlinable, etc.
>>>
>>> What is being talked about here is a variant of identical code
>>> folding, e.g., http://research.google.com/pubs/pub36912.html and its
>>> variants, where "identical" varies (and is based on semantic
>>> equivalence). Variants where you extract portions partially to merge
>>> functions are pretty much the same optimization :)
>>>
>>> Here, value numbering helps because it gives you value expressions.
>>>
>>> It tells you:
>>>
>>> This operation is "add V1, V2".
>>> It does not matter what V1 and V2 are; they just "are abstract
>>> things". We know if these abstract things are equivalent or not.
>>>
>>> More relevantly, these expressions form a value number DAG.
>>>
>>> That is, they represent the computations being computed only in terms
>>> of operators, abstract variables, and other parts of the VNDAG.
>>>
>>> The *actual values* of those computations, numbers, etc., do not
>>> matter (i.e., the abstract value numbers have some set of things *in
>>> the program* that are in the set of things equivalent to that value
>>> number).
>>>
>>> It is "are these semantically the same operations in the same order",
>>> regardless of the value of those operations.
>>> See, for example, the DAGs generated by:
>>>
>>> https://www.researchgate.net/publication/221323142_An_Efficient_SSA-Based_Algorithm_for_Complete_Global_Value_Numbering
>>>
>>> Again, this is not something current GVN does, but NewGVN (and other
>>> GVNs) can give you.
>>>
>>> The question of what portions of a function you can merge is a
>>> variant of the common subgraph and graph isomorphism problems on a
>>> VNDAG (because, in the end, it doesn't matter where the computations
>>> are; it matters what they abstractly compute, and what they depend
>>> on).
>>>
>>>
>>>
>>>> It seems much closer to the analysis necessary for SLP vectorization
>>>> than value numbering, as such.
>>>>
>>>
>>>
>>>> I think we might be able to use value numbering to help with a
>>>> subset of SLP vectorization, but only because we can define some
>>>> kind of equivalence relation on consecutive memory accesses (and
>>>> similar). What you need for this outlining seems more akin to the
>>>> general SLP-vectorization case. This might just be the maximal
>>>> common subgraph problem.
>>>>
>>>
>>> If I have an add in a loop, and it's VN-equivalent to an add outside
>>> a loop in some other function, I can still merge the functions :)
>>>
>>> It's how the value is generated and used that matters, not the
>>> structure of the program.
>>>
>>> So you can't just do it on a regular CFG or anything like that if you
>>> want good results.
>>>
>>>>
>>>> -Hal
>>>>
>>>>
>>>> GVN as it exists now doesn't really provide what you want, and in
>>>> fact, doesn't value number a lot of instruction types (including,
>>>> for example, loads, stores, calls, phis, etc.). It also relies on
>>>> ordering, and more importantly, it relies on *not* being a strong
>>>> analysis. If you were to stop it from eliminating as it went, it
>>>> would catch maybe 10% of the redundancies it does now. So if what
>>>> you want is "I want to know what globally is the same", it doesn't
>>>> really answer that well.
>>>>
>>>> Doing the analysis Quentin wants is pretty trivial in NewGVN
>>>> (analysis and elimination are split, everything is broken into
>>>> congruence classes explicitly, and each congruence class has an id;
>>>> you could sub in that id as the value for the terminated string),
>>>> but I'd agree that GVN as it exists now will not do what they want,
>>>> and would be pretty hard to make work well.
>>>>
>>>> (FWIW: Davide Italiano has started breaking up NewGVN into
>>>> submittable pieces, and is pretty far along to having a first patch
>>>> set, I believe.)
>>>>
>>>>
>>>> On Fri, Aug 26, 2016 at 4:54 PM, Quentin Colombet via llvm-dev
<
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I let Jessica give more details but here are some insights.
>>>>>
>>>>> MIR offers a straightforward way to model benefits, because we know
>>>>> which instructions we remove and which ones we add, and there is no
>>>>> overhead of setting up parameters. Indeed, since the coloring will
>>>>> be the same between the different outlining candidates, the call is
>>>>> just a jump somewhere else. We do not have to worry about the
>>>>> impact of parameter passing and the ABI.
>>>>>
>>>>> So, basically, a better cost model. That's one part of the story.
>>>>>
>>>>> The other part is that, at the LLVM IR level or before register
>>>>> allocation, identifying similar code sequences is much harder, at
>>>>> least with a suffix-tree-like algorithm. Basically, the problem is
>>>>> how we name our instructions such that we can match them.
>>>>> Let me take an example:
>>>>>
>>>>> foo() {
>>>>>   /* bunch of code */
>>>>>   a = add b, c;
>>>>>   d = add e, f;
>>>>> }
>>>>>
>>>>> bar() {
>>>>>   d = add e, g;
>>>>>   f = add c, w;
>>>>> }
>>>>>
>>>>> With proper renaming, we can outline both adds in one function. The
>>>>> difficulty is recognizing that they are semantically equivalent so
>>>>> that we can give them the same identifier in the suffix tree. I
>>>>> won't get into the details, but it gets tricky quickly. We were
>>>>> thinking of reusing GVN to get such an identifier if we wanted to
>>>>> do the outlining at the IR level, but solving this problem is hard.
>>>>>
>>>>> By running after regalloc, we basically have a heuristic that does
>>>>> this naming for us.
>>>>>
>>>>> Cheers,
>>>>> -Quentin
>>>>>
>>>>>
>>>>> On Aug 26, 2016, at 3:01 PM, Hal Finkel via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> *From: *"Kevin Choi" <code.kchoi at gmail.com>
>>>>> *To: *"Hal Finkel" <hfinkel at anl.gov>
>>>>> *Cc: *"llvm-dev" <llvm-dev at lists.llvm.org>
>>>>> *Sent: *Friday, August 26, 2016 4:55:29 PM
>>>>> *Subject: *Re: [llvm-dev] [RFC] Interprocedural MIR-level outlining pass
>>>>>
>>>>> I think the "Motivation" section explained that.
>>>>>
>>>>> I don't think it explained it.
>>>>>
>>>>> I too first thought "why not at IR?", but the reason looks to be
>>>>> that MIR post-RA has the most accurate heuristics (the best way to
>>>>> know looks like actually getting there).
>>>>>
>>>>> But also, potentially, the fewest opportunities. That's why I'm
>>>>> curious about the motivation -- the trade-offs are not obvious to
>>>>> me.
>>>>>
>>>>> -Hal
>>>>>
>>>>>
>>>>>
>>>>> Do you know if there is any experimental pass that relies on
>>>>> deriving heuristics via a feedback loop after letting, e.g., a
>>>>> duplicate module/function/block continue past?
>>>>>
>>>>> Regards,
>>>>> Kevin
>>>>>
>>>>> On 26 August 2016 at 14:36, Hal Finkel via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi Jessica,
>>>>>>
>>>>>> This is quite interesting.
>>>>>>
>>>>>> Can you comment on why you started by doing this at the MI level,
>>>>>> as opposed to the IR level? And at the MI level, why after RA
>>>>>> instead of before RA?
>>>>>>
>>>>>> Perhaps the first question is another way of asking how accurately
>>>>>> we can model call-site code size at the IR level.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Hal
>>>>>>
>>>>>> ------------------------------
>>>>>>
>>>>>> > From: "Jessica Paquette via llvm-dev" <llvm-dev at lists.llvm.org>
>>>>>> > To: llvm-dev at lists.llvm.org
>>>>>> > Sent: Friday, August 26, 2016 4:26:09 PM
>>>>>> > Subject: [llvm-dev] [RFC] Interprocedural MIR-level outlining pass
>>>>>> >
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > Since I haven't said anything on the mailing list before, a
>>>>>> > quick introduction. I'm an intern at Apple, and over the summer
>>>>>> > I implemented a prototype for an outlining pass for code size in
>>>>>> > LLVM. Now I'm seeking to eventually upstream it. I have the
>>>>>> > preliminary code on GitHub right now, but it's still very
>>>>>> > prototypical (see the code section).
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Motivation
>>>>>> > ===============================
>>>>>> > The goal of the internship was to create an interprocedural pass
>>>>>> > that would reduce code size as much as possible, perhaps at the
>>>>>> > cost of some performance. This would be useful to, say, embedded
>>>>>> > programmers who only have a few kilobytes to work with and a
>>>>>> > substantial amount of code to fit in that space.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Approach and Initial Results
>>>>>> > ===============================
>>>>>> > To do this, we chose to implement an outliner. Outliners find
>>>>>> > sequences of instructions which would be better off as a
>>>>>> > function call, by some measure of "better". In this case, the
>>>>>> > measure of "better" is "makes code smaller".
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Results
>>>>>> > ===============================
>>>>>> > These results are from a fairly recent 64-bit Intel processor,
>>>>>> > using a version of Clang equipped with the outliner prototype
>>>>>> > versus an equivalent version of Clang without the outliner.
>>>>>> >
>>>>>> > CODE SIZE REDUCTION
>>>>>> > For tests >= 4 KB in non-outlined size, the outliner currently
>>>>>> > provides an average of 12.94% code size reduction on the LLVM
>>>>>> > test suite in comparison to a default Clang, and up to 51.4%
>>>>>> > code size reduction. In comparison to a Clang with -Oz, the
>>>>>> > outliner provides an average 1.29% code size reduction, and up
>>>>>> > to a 37% code size reduction. I believe that the -Oz numbers can
>>>>>> > be further improved by better tuning the outlining cost model.
>>>>>> >
>>>>>> > EXECUTION TIME IMPACT
>>>>>> > On average, the outliner increases execution time by 2% on the
>>>>>> > LLVM test suite, but has also been shown to improve execution
>>>>>> > time by up to 16%. These results were from a fairly recent Intel
>>>>>> > processor, so the results may vary. Recent Intel processors have
>>>>>> > very low latency for function calls, which may impact these
>>>>>> > results. Execution time improvements are likely dependent on the
>>>>>> > latency of function calls, instruction caching behaviour, and
>>>>>> > the execution frequency of the code being outlined. In
>>>>>> > particular, using a processor with heavy function call latency
>>>>>> > will likely increase execution time overhead.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Implementation
>>>>>> > ===============================
>>>>>> > The outliner, in its current state, is a MachineModulePass. It
>>>>>> > finds *identical* sequences of MIR, after register allocation,
>>>>>> > and pulls them out into their own functions. Thus, it's
>>>>>> > effectively assembly-level. Ultimately, the algorithm used is
>>>>>> > general, so it can sit anywhere, but MIR was very convenient for
>>>>>> > the time being.
>>>>>> >
>>>>>> > It requires two data structures:
>>>>>> >
>>>>>> > 1. A generalized suffix tree
>>>>>> > 2. A "terminated string"
>>>>>> >
>>>>>> > 1: The generalized suffix tree is constructed using Ukkonen's
>>>>>> > linear-time construction algorithm [1]. Suffix trees require
>>>>>> > linear space and support linear-time substring queries. In
>>>>>> > practice, the construction time for the suffix tree is the most
>>>>>> > time-consuming part, but I haven't noticed a difference in
>>>>>> > compilation time on, say, 12 MB .ll files.
>>>>>> >
>>>>>> > 2: To support the suffix tree, we need a "terminated string".
>>>>>> > This is a generalized string with a unique terminator character
>>>>>> > appended to the end. TerminatedStrings can be built from any
>>>>>> > type.
>>>>>> >
>>>>>> > The algorithm is then roughly as follows:
>>>>>> >
>>>>>> > 1. For each MachineBasicBlock in the program, build a
>>>>>> >    TerminatedString for that block.
>>>>>> > 2. Build a suffix tree for that collection of strings.
>>>>>> > 3. Query the suffix tree for the longest repeated substring and
>>>>>> >    place that string in a candidate list. Repeat until none are
>>>>>> >    found.
>>>>>> > 4. Create functions for each candidate.
>>>>>> > 5. Replace each candidate with a call to its respective function.
>>>>>> >
>>>>>> > Currently, the program itself isn't stored in the suffix tree,
>>>>>> > but rather a "proxy string" of integers. This isn't necessary at
>>>>>> > the MIR level, but may be for an IR-level extension of the
>>>>>> > algorithm.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Challenges
>>>>>> > ===============================
>>>>>> > 1) MEMORY CONSUMPTION
>>>>>> > Given a string of length n, a naive suffix tree implementation
>>>>>> > can take up to 40n bytes of memory. However, this number can be
>>>>>> > reduced to 20n with a bit of work [2]. Currently, the suffix
>>>>>> > tree stores the entire program, including instructions which
>>>>>> > really ought not to be outlined, such as returns. These
>>>>>> > instructions should not be included in the final implementation,
>>>>>> > but should rather act as terminators for the strings. This will
>>>>>> > likely curb memory consumption. Suffix trees have been used in
>>>>>> > the past in sliding-window-based compression schemes, which may
>>>>>> > serve as a source of inspiration for reducing memory overhead
>>>>>> > [3].
>>>>>> >
>>>>>> > Nonetheless, the outliner probably shouldn't be run unless it
>>>>>> > really has to be. It will likely be used mostly in embedded
>>>>>> > spaces, where the programs have to fit into small devices
>>>>>> > anyway. Thus, memory overhead for the compiler likely won't be a
>>>>>> > problem. The outliner should only be used in -Oz compilations,
>>>>>> > and possibly should have its own flag.
>>>>>> >
>>>>>> >
>>>>>> > 2) EXECUTION TIME
>>>>>> > Currently, the outliner isn't tuned for preventing execution
>>>>>> > time increases. There is an average of a 2% execution time hit
>>>>>> > on the tests in the LLVM test suite, with a few outliers of up
>>>>>> > to 30%. The outliers are tests which contain hot loops. The
>>>>>> > outliner really ought to be able to use profiling information
>>>>>> > and not outline from hot areas. Another suggestion people have
>>>>>> > given me is to add a "never outline" directive which would allow
>>>>>> > the user to say something along the lines of "this is a hot
>>>>>> > loop, please never outline from it".
>>>>>> >
>>>>>> > It's also important to note that these numbers are coming from a
>>>>>> > fairly recent Intel processor.
>>>>>> >
>>>>>> >
>>>>>> > 3) CONSTRAINTS ON INSTRUCTIONS
>>>>>> > The outliner currently won't pull anything out of functions
>>>>>> > which use a red zone. It also won't pull anything out that uses
>>>>>> > the stack or the instruction pointer, or that uses constant pool
>>>>>> > indices, CFI indices, jump table indices, or frame indices. This
>>>>>> > removes many opportunities for outlining which would likely be
>>>>>> > available at a higher level (such as IR). Thus, there's a case
>>>>>> > for moving this up to a higher level.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Additional Applications
>>>>>> > ===============================
>>>>>> > The suffix tree itself could be used as a tool for finding
>>>>>> > opportunities to refactor code. For example, it could recognize
>>>>>> > places where the user likely copied and pasted some code. This
>>>>>> > could be run on codebases to find areas where people could
>>>>>> > manually outline things at the source level.
>>>>>> >
>>>>>> > Using the terminated string class, it would also be possible to
>>>>>> > implement other string algorithms on code. This may open the
>>>>>> > door to new ways to analyze existing codebases.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Roadmap
>>>>>> > ===============================
>>>>>> > The current outliner is *very* prototypical. The version I would
>>>>>> > want to upstream would be a new implementation. Here's what I'd
>>>>>> > like to address and accomplish:
>>>>>> >
>>>>>> > 1. Ask "what does the LLVM community want from an outliner?" and
>>>>>> >    use that to drive development of the algorithm.
>>>>>> > 2. Reimplement the outliner, perhaps using a less
>>>>>> >    memory-intensive data structure, like a suffix array.
>>>>>> > 3. Begin adding features to the algorithm, for example:
>>>>>> >    i.   Teaching the algorithm about hot/cold blocks of code and
>>>>>> >         taking that into account.
>>>>>> >    ii.  Simple parameter passing.
>>>>>> >    iii. Similar-function outlining -- e.g., noticing that two
>>>>>> >         outlining candidates are similar and can be merged into
>>>>>> >         one function with some control flow.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > Code
>>>>>> > ===============================
>>>>>> > Note: This code requires MachineModulePasses.
>>>>>> >
>>>>>> > * Main pass:
>>>>>> > https://github.com/ornata/llvm/blob/master/lib/CodeGen/MachineOutliner.h
>>>>>> >
>>>>>> > * Suffix tree:
>>>>>> > https://github.com/ornata/llvm/blob/master/include/llvm/ADT/SuffixTree.h
>>>>>> >
>>>>>> > * TerminatedString and TerminatedStringList:
>>>>>> > https://github.com/ornata/llvm/blob/master/include/llvm/ADT/TerminatedString.h
>>>>>> >
>>>>>> > Here are a couple of unit tests for the data structures.
>>>>>> >
>>>>>> > * Suffix tree unit tests:
>>>>>> > https://github.com/ornata/llvm/blob/master/unittests/ADT/SuffixTreeTest.cpp
>>>>>> >
>>>>>> > * TerminatedString unit tests:
>>>>>> > https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringTest.cpp
>>>>>> >
>>>>>> > * TerminatedStringList unit tests:
>>>>>> > https://github.com/ornata/llvm/blob/master/unittests/ADT/TerminatedStringListTest.cpp
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================
>>>>>> > References
>>>>>> > ===============================
>>>>>> > [1] Ukkonen's Algorithm:
>>>>>> >     https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
>>>>>> > [2] Suffix Trees and Suffix Arrays:
>>>>>> >     http://web.cs.iastate.edu/~cs548/suffix.pdf
>>>>>> > [3] Extended Application of Suffix Trees to Data Compression:
>>>>>> >     http://www.larsson.dogma.net/dccpaper.pdf
>>>>>> >
>>>>>> >
>>>>>> > Thanks for reading,
>>>>>> > Jessica
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > LLVM Developers mailing list
>>>>>> > llvm-dev at lists.llvm.org
>>>>>> >
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>> >
>>>>>>
>>>>>> --
>>>>>> Hal Finkel
>>>>>> Assistant Computational Scientist
>>>>>> Leadership Computing Facility
>>>>>> Argonne National Laboratory
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
Daniel Berlin via llvm-dev
2016-Aug-30 16:28 UTC
[llvm-dev] [RFC] Interprocedural MIR-level outlining pass
On Tue, Aug 30, 2016 at 3:28 AM, Andrey Bokhanko <andreybokhanko at gmail.com> wrote:

> Daniel,
>
> OK, I see your point.
>
> My understanding was that VNDAG is an internal representation, not
> meant to be used by VN's customers (that should really care for classes
> of congruency, nothing more).
>

Depends on the customers? You provide what your customers want, not
decide what they want :)

> Also, I thought that VN's results are available for a single function
> at a time. It seems that NewGVN provides access to VNDAGs computed for
> the whole program -- which enables its usage in Jessica's pass.
>

It does not currently; this would be trivial to do, however.

You are going very, very far into the implementation details, which very
much do not matter from an algorithm design perspective.

> Still, some issues remain. I wonder what leaf nodes for reads of "a"
> and "b" values contain? If they have VN numbers (as the "Efficient SSA"
> paper says) and the VNDAG comparator compares them (for leaves), then
> the best one can do is this:
>
> int outlined(int arg1, int arg2)
> {
>   return arg1 + arg2;
> }
>
> int foo() {
>   arg1 = a;
>   arg2 = b;
>   return outlined(arg1, arg2);
> }
>
> int bar() {
>   arg1 = a;
>   arg2 = b;
>   return outlined(arg1, arg2);
> }
>
> which hardly helps.
>

See above. Again, you are going really far into implementation details
for no reason I can see. We can do pretty much whatever we want.

Remember also, you outlined a single-statement function; that is the best
you can do in pretty much any legal case.

> If, on the other hand, the VNDAG comparator compares variable names and
> doesn't pick classes of congruency at all, how does this relate to
> Value Numbering? SSA can be used for this purpose just as well.
>
> I tend to agree with Hal -- value numbering computes equivalent
> *values*,
>

Sorry, but this is just flat out wrong:

"A Global Value Numbering (GVN) algorithm is considered to be complete
(or precise) if it can detect all Herbrand equivalences among expressions
in a program. Two expressions are said to be Herbrand equivalent (or
transparent equivalent) if they are computed by the same operator applied
to equivalent operands."

This is, AFAIK, precisely what you want.
Andrey Bokhanko via llvm-dev
2016-Aug-31 12:17 UTC
[llvm-dev] [RFC] Interprocedural MIR-level outlining pass
On Tue, Aug 30, 2016 at 7:28 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

>> I tend to agree with Hal -- value numbering computes equivalent
>> *values*,
>>
>
> Sorry, but this is just flat out wrong:
>
> "A Global Value Numbering (GVN) algorithm is considered to be complete
> (or precise) if it can detect all Herbrand equivalences among
> expressions in a program. Two expressions are said to be Herbrand
> equivalent (or transparent equivalent) if they are computed by the same
> operator applied to equivalent operands."
>
> This is, AFAIK, precisely what you want.
>

I'm not entirely happy with this definition (IMHO, it's overly
restrictive), but that is irrelevant. What is relevant is what one
considers to be "equivalent operands". Take my example again -- for the
outlining (Jessica's name) / code folding (your name) optimization, the
reads of the "a" and "b" globals are equivalent; for VN and its users,
they are not.

Yours,
Andrey