Hi,
we would like to propose a new Global Value Numbering pass in LLVM.
The ideas/code are from Daniel Berlin (with a minor overhaul/splitting into submittable patches from me). The code has been around for a while (2012 or before), and we think it's getting ready to be committed upstream.

### Motivation

To put things into context: my personal motivation for having a new GVN/PRE algorithm is LTO.
It's not a secret that LLVM is getting slower and slower release after release, as Rafael pointed out in March [1] (and as many others have probably noticed). I personally took a shot at profiling LTO on many internal/open-source applications (including clang itself) and noticed that GVN always shows up among the top three passes (it's generally the pass where we spend most of the middle-end time). There are extreme cases where 90% of the compile time goes into GVN.

Example:

===-------------------------------------------------------------------------==
  ... Pass execution timing report ...
===-------------------------------------------------------------------------==
  Total Execution Time: 684.7622 seconds (683.7141 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
  130.2558 ( 20.0%)    6.3128 ( 18.7%)  136.5686 ( 19.9%)  137.6145 ( 20.1%)  X86 DAG->DAG Instruction Selection
   55.4695 (  8.5%)    0.5501 (  1.6%)   56.0196 (  8.2%)   56.1049 (  8.2%)  Function Integration/Inlining
   42.3492 (  6.5%)    0.0364 (  0.1%)   42.3856 (  6.2%)   42.8676 (  6.3%)  Global Value Numbering

### Problems in the current GVN

There are some issues in the current GVN infrastructure. A non-exhaustive list (I'm pretty sure people have their own lists of complaints; these are the ones I care about the most):
* GVN is very slow on large test cases.
* GVN doesn't do a real analysis; instead it eliminates as it goes, so it's hard to reason about.
* GVN doesn't perform any phi predication, i.e. it doesn't know about phi nodes, so later passes have to do extra work to clean up.
* There are bugs, e.g. [2], which would require a rewrite of PRE to be fixed.

### NewGVN

The new algorithm implements the ideas described in [3] with some engineering optimizations by Dan (for example, the set of touched instructions is represented using a BitVector instead of a set, because in large functions it's not uncommon for a single predicate change to mean that thousands of instructions need to be reprocessed, and we both measured 30% of the whole pass time being spent just modifying the set). The code pretty much maps to what the paper describes, so I won't try to repeat it here =)

Some advantages of NewGVN:
* GVN performs a real analysis (which is now part of the pass itself but could potentially be split out and used as a utility by other passes). For example, Quentin/Dan pointed out that outlining at the IR level is hard without a proper value numbering analysis. (A rough sketch of the congruence-class idea behind the analysis follows this list.)
* On large test cases it's faster than the current GVN, even though it hasn't gone through a lot of profiling/optimization lately (I found it up to 50% faster in compile time on some internal benchmarks). There are some places where we can improve; for example, we spend a lot of time inside the Simplify* functions. Another considerable chunk of the time is spent inside MemorySSA, but this is going to change once we preserve MemorySSA more aggressively.
* The code is easier to understand (at least to me).
[...]
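To give a flavor of the congruence-finding idea from [3] without repeating the paper: each value is abstracted into an expression over the congruence classes of its operands, and values whose expressions compare equal end up in the same class with a shared leader. The sketch below is purely illustrative C++ with made-up names; it is not NewGVN's code or API, and the real algorithm iterates to a fixed point over the CFG (and uses MemorySSA for memory state) rather than numbering everything in a single pass.

#include <cstdio>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical sketch only: an "expression" is an opcode plus the congruence
// classes of its operands; values with equal expressions share a class.
struct Expr {
  std::string Op;
  int LHS, RHS; // congruence classes of the two operands
  bool operator<(const Expr &O) const {
    return std::tie(Op, LHS, RHS) < std::tie(O.Op, O.LHS, O.RHS);
  }
};

int main() {
  std::vector<std::string> Leader = {"x", "y", "1"}; // classes 0, 1, 2
  std::map<Expr, int> ClassOf;                       // expression -> class id

  auto number = [&](const std::string &Name, const Expr &E) {
    auto It = ClassOf.find(E);
    if (It != ClassOf.end())
      return It->second;                // congruent to an existing value
    int Id = static_cast<int>(Leader.size());
    Leader.push_back(Name);             // first member becomes the class leader
    ClassOf[E] = Id;
    return Id;
  };

  int A = number("a", {"add", 0, 1});   // a = x + y
  int B = number("b", {"add", 0, 1});   // b = x + y -> same class as a
  number("c", {"add", A, 2});           // c = a + 1
  int D = number("d", {"add", B, 2});   // d = b + 1 -> same class as c
  std::printf("b's leader: %s, d's leader: %s\n",
              Leader[B].c_str(), Leader[D].c_str()); // prints "a" and "c"
  return 0;
}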
Some current limitations of NewGVN:
* NewGVN doesn't do everything that the current GVN does. There are plans to implement the missing features and more.
* On small test cases NewGVN may be slower, as it does some work upfront that the current GVN doesn't do. NewGVN is probably less lazy than it should be, but it didn't matter for us, so we didn't consider it a priority. If somebody cares and finds a case where NewGVN substantially regresses, speak up.

The initial code can be found here: https://reviews.llvm.org/D26224
The current patch includes only the "core" GVN algorithm, i.e. the expression framework and the logic to perform congruence finding. Other pieces (e.g. PRE) will build on top of that.

My rough plan is:
* Try to get this reviewed/tested and in tree.
* Fix miscompiles/bugs (and performance problems as/if they arise).
* Build other pieces of the algorithm on top of what we have. My immediate concern will be implementing support for llvm.assume and load coercion (a short illustration of what load coercion means follows this message). Dan said he will try to find the time to work on the predicate handling (actually, there's already a branch: https://github.com/dberlin/llvm-gvn-rewrite/tree/newgvn-predicates).

Please let us know what you think. Any feedback, review comments, or testing is very much appreciated!

Thanks!

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html
[2] https://llvm.org/bugs/show_bug.cgi?id=30692#c11
[3] http://dl.acm.org/citation.cfm?id=512536

--
Davide

"There are no solved problems; there are only problems that are more or less solved" -- Henri Poincaré
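A quick illustration of the "load coercion" mentioned in the plan above: the term refers to forwarding a stored value to a load of a different type or width by reinterpreting the stored bits instead of re-reading memory. The example below is only a sketch (it assumes a little-endian target) and is not a claim about which cases the initial patch or the out-of-tree branches handle.

// Illustrative only: a GVN with load coercion can answer the one-byte load
// below from the 32-bit store above it (extracting 0x44 on a little-endian
// target) rather than emitting a re-load. Accessing the object through
// unsigned char keeps the example well defined.
unsigned char first_byte(unsigned *p) {
  *p = 0x11223344u;
  return *reinterpret_cast<unsigned char *>(p);
}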
First, thanks. This is a very, very long time coming :)

Second, for those watching, note that pretty much all of the improvements and missing cases, including load forwarding/coercion, etc., are actually done. It's more a matter of cleaning them up, breaking them down, and submitting them than of "implementing them". The main experimentation at this point is "can we do it cleaner", not "can we do it" :)

It's also important to note that this new GVN treats loads/stores in a unified way with scalars, unlike the current GVN (which has no load or store value numbering). So it will happily discover complex load/store relations (though there are some improvements we can still make here). For example:

int vnum_test8(int *data)
{
  int i;
  int stop = data[3];
  int m = data[4];
  int n = m;
  int p;
  for (i=0; i<stop; i++) {
    int k = data[2];
    data[k] = 2;
    data[0] = m - n;
    k = data[1];
    m = m + k;
    n = n + k;
    p = data[0];
  }
  return p;
}

LLVM's current GVN will eliminate a single load here[1].
NewGVN will calculate that m and n are equal, that m - n is 0, and that p is 0 (m and n start equal and are incremented by the same k every iteration, so the store writes 0 to data[0] and p reloads that 0).

It's not quite perfect yet; I haven't fixed store handling, so the following is missed:

int a;
int *p;
// LLVM is too smart and if we don't do this, realizes *p is a store to undef
void foo() {
  p = &a;
}
int main(int argc, char **argv) {
  int result;
  foo();
  *p = 2;
  if (argc)
    *p = 2;
  result = *p;
  return result;
}

Here, current LLVM GVN will do nothing, because it can't really understand anything about the stores. GCC's GVN will determine that result is 2. NewGVN is not quite that smart yet (it requires a little work on what we do with stores, and on value numbering MemorySSA versions).

This issue compounds if you have conditional stores of the same value. So, for example, if you add:

if (i < 30)
  data[0] = 0;

to the first case, GCC can still determine that p is 0. Currently, NewGVN cannot.

--Dan

On Tue, Nov 15, 2016 at 10:49 PM, Davide Italiano <davide at freebsd.org> wrote:
> [...]
On Tue, Nov 15, 2016 at 11:29 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> First, thanks. This is a very, very long time coming :)
> Second, for those watching, note that pretty much all of the improvements
> and missing cases, including load forwarding/coercion, etc., are actually
> done. It's more a matter of cleaning them up, breaking them down, and
> submitting them than of "implementing them".

Oh sure, bad wording. For those interested in the other pieces, this is Dan's branch: https://github.com/dberlin/llvm-gvn-rewrite

> [...]
--
Davide

"There are no solved problems; there are only problems that are more or less solved" -- Henri Poincaré
This is really great to see, as I've spent far too much of my life over the past two years fighting with undocumented assumptions made by GVN. A couple of quick questions about the new GVN, based on problems I've had with the old one:

Does it assume that it's always safe to widen a load (or store) to a power of two? For our target, this is only sound if you can show that the pointer was used to read all of the bytes that you are loading (we have byte-granularity memory safety). Old GVN has no hooks for targets to specify whether this is safe and so is implicitly assuming a page-based MMU. This optimisation is also unsound for M-profile ARM cores, though it will fail only occasionally there, whereas it fails deterministically for us. (A sketch of the pattern in question follows this message.)

Does it make any assumptions about the layout of memory in pointers? Old GVN treats pointers as integers and assumes that it's safe to do partial stores to them. As a prerequisite for memory safety, we must be able to guarantee atomic updates to pointers, and we had to hack GVN to disable a bunch of these things. In LLVM IR, pointers are opaque and there is no guarantee that their representation is the same as a same-sized integer, but old GVN makes this assumption.

David

> On 16 Nov 2016, at 06:49, Davide Italiano via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
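As a sketch of the widening question above (purely illustrative; this is not a claim that either the old or the new GVN performs this exact rewrite): the transformation being asked about widens a small load to the next power of two so that a neighbouring load can be answered from it, which is only safe if touching the extra bytes is unobservable.

// Hypothetical example of the pattern: widening the load of f->a to a
// 16-bit load covering both a and b would let the load of f->b be answered
// from it. On a page-based MMU the extra byte read is harmless; with
// byte-granularity memory safety, the path where need_b is false would read
// a byte the program never dereferenced.
struct Flags {
  unsigned char a;
  unsigned char b;
};

int sum(const Flags *f, bool need_b) {
  int r = f->a;   // candidate for widening to a 16-bit load of {a, b}
  if (need_b)
    r += f->b;    // would then be satisfied from the widened value
  return r;
}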
On Wed, Nov 16, 2016 at 2:03 AM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Does it assume that it's always safe to widen a load (or store) to a
> power of two?

I don't believe old GVN does widening any more, and NewGVN certainly doesn't.

> For our target, this is only sound if you can show that the pointer was
> used to read all of the bytes that you are loading (we have
> byte-granularity memory safety). Old GVN has no hooks for targets to
> specify whether this is safe and so is implicitly assuming a page-based
> MMU. [...]
>
> Does it make any assumptions about the layout of memory in pointers?

Not that I know of.

> Old GVN treats pointers as integers and assumes that it's safe to do
> partial stores to them.

Can you give an example? Is this store coercion or something?

> [...]
Glad to see this landing! It's been a long time coming.

Once this is in, please do not turn it on by default immediately. Let's call for volunteers to find some of the most egregious miscompiles, fix them, and then turn this on by default.

Philip

On 11/15/2016 10:49 PM, Davide Italiano via llvm-dev wrote:
> [...]
Agreed. Davide and I agreed that the right play is to cut it down to the core part, submit that, and then add all the stuff on top. It would be really nice to get the core part of it well tested, so that when we add things like load coercion we don't also have to debug too many issues in the core.

On Fri, Nov 18, 2016, 8:44 AM Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Glad to see this landing! It's been a long time coming.
>
> Once this is in, please do not turn it on by default immediately. Let's
> call for volunteers to find some of the most egregious miscompiles, fix
> them, and then turn this on by default.
>
> Philip
>
> [...]
On Fri, Nov 18, 2016 at 8:43 AM, Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Glad to see this landing! It's been a long time coming.
>
> Once this is in, please do not turn it on by default immediately. Let's
> call for volunteers to find some of the most egregious miscompiles, fix
> them, and then turn this on by default.

There are no immediate plans to enable NewGVN by default (at least not in the near future). In fact, the mail I originally wrote doesn't mention the switch at all, and neither do any follow-ups from me or Daniel, so I'm not entirely sure where you got that idea from. If you take a closer look (at the mail, or at the patch), you'll see that "key" pieces of the old GVN are still missing. The most noticeable are PRE and load coercion. In other words, the proposed patch is not (yet) on par with what the current GVN does (although all the missing pieces are already implemented out of tree).

Also, let me try to clarify one point: this is already a call for volunteers. If you feel adventurous, you can download the patch, apply it, test it, and report issues. I can and will spend time integrating the rest of the work and fixing all the reported bugs/miscompiles. If there's something we can do in a cleaner way, a discussion will happen on the mailing list/on the review thread and everybody will have a chance to comment, as is happening for the initial patch (and as I always try to do). Once the first patch lands, I'll commit a temporary cl::opt to enable NewGVN for those interested in testing and send another CFT e-mail.

FWIW, the patch has already had a round of light testing internally. Of course, this is not enough, nor is it indicative of its maturity/robustness. I plan to have it tested more carefully inside my organization in parallel.

That said, thanks for your input.

--
Davide