On Wed, Nov 16, 2016 at 2:03 AM, David Chisnall via llvm-dev < llvm-dev at lists.llvm.org> wrote:> This is really great to see, as I’ve spent far too much of my life over > the past two years fighting with undocumented assumptions made by GVN. A > couple of quick questions about the new GVN, based on problems I’ve had > with the old one: > > Does it assume that it’s always safe to widen a load (or store) to a power > of two?I don't believe old gvn does widening any more, and new gvn certainly doesn't.> For our target, this is only sound if you can show that the pointer was > used to read all of the bytes that you are loading (we have > byte-granularity memory safety). Old GVN has no hooks for targets to > specify whether this is safe and so is implicitly assuming a page-based > MMU. This optimisation is also unsound for M-profile ARM cores, though > will fail occasionally there, whereas it fails deterministically for us. > > Does it make any assumptions about the layout of memory in pointers?Not that i know of> Old GVN treats pointers as integers and assumes that it’s safe to do > partial stores to them.Can you give an example? Is this store coercion or something?> As a prerequisite for memory safety, we must be able to guarantee atomic > updates to pointers and we had to hack GVN to disable a bunch of these > things. In LLVM IR, pointers are opaque and there is no guarantee that > their representation is the same as a same-sized integer, but old GVN makes > this assumption. >> David > > > On 16 Nov 2016, at 06:49, Davide Italiano via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > Hi, > > we would like to propose a new Global Value Numbering pass in LLVM. > > The ideas/code are from Daniel Berlin (with a minor overhaul/splitting > > into submittable patches from me). The code has been around for a > > while (2012 or before), and we think it's getting ready to be > > committed upstream. > > > > ### Motivation > > > > To put things into context: my personal motivation for having a new > > GVN/PRE algorithm is LTO. > > It's not a secret that LLVM is getting slower and slower release after > > release, as Rafael discovered/pointed out in March [1] (and probably > > many others found out). I personally took a shot at profiling LTO on > > many internal/opensource applications (including clang itself) and > > noticed that GVN always show in the top-3 passes (and it's generally > > the pass where we spend most of the time in the middle-end). There are > > cases (extreme) where 90% of the compile time goes in GVN. > > > > Example: > > ===--------------------------------------------------------- > ----------------==> > ... Pass execution timing report ... > > ===--------------------------------------------------------- > ----------------==> > Total Execution Time: 684.7622 seconds (683.7141 wall clock) > > > > ---User Time--- --System Time-- --User+System-- ---Wall > > Time--- --- Name --- > > 130.2558 ( 20.0%) 6.3128 ( 18.7%) 136.5686 ( 19.9%) 137.6145 ( > > 20.1%) X86 DAG->DAG Instruction Selection > > 55.4695 ( 8.5%) 0.5501 ( 1.6%) 56.0196 ( 8.2%) 56.1049 ( > > 8.2%) Function Integration/Inlining > > 42.3492 ( 6.5%) 0.0364 ( 0.1%) 42.3856 ( 6.2%) 42.8676 ( > > 6.3%) Global Value Numbering > > > > ### Problems in the current GVN > > > > There are some issues in the current GVN infrastructure. A > > non-exhaustive list (I'm pretty sure people have their list of > > complains, these are the ones I care about the most). > > * GVN is very slow for large test cases > > * GVN doesn't do a real analysis, instead it eliminates as it goes, so > > it's hard to reason about it. > > * GVN doesn't perform any phi predication, i.e. doesn't know about phi > > nodes, so later passes have to do some extra work to clean up > > * There are bugs, e.g. [2] which would require a rewrite of PRE to be > fixed. > > > > ### NewGVN > > > > The new algorithm implements the ideas described in [3] with some > > engineering optimizations by Dan (for example the set of touched > > instructions is represented using a Bitvector instead of a set because > > it's not uncommon for large functions where a predicate change that > > thousands of instructions need to be changed, and we both measured 30% > > of the whole pass time spent just modifying the set). > > The code pretty much maps what the paper describe so I won't try to > > repeat it here =) > > > > Some advantages of NewGVN: > > * GVN performs a real analysis (which is now part of the pass itself > > but potentially could be split and used as an utility by other > > passes). For example, Quentin/Dan pointed out that outlining at the IR > > level is hard without a proper value numbering analysis. > > * On large testcases, It's faster than current GVN, even though didn't > > go through a lot of profiling/optimization lately (I found it up to > > 50% faster in compile time on some internal benchmarks). There are > > some places were we can improve. For example, we spend a lot of time > > inside Simplify* functions. Another considerable chunk of the time is > > spent inside MemorySSA, but this is going to change once we preserve > > MemSSA more aggressively. > > * The code is easier to understand (at least to me) > > [...] > > > > Some current limitations of NewGVN: > > * NewGVN doesn't do everything that the current GVN does. There are > > plans to implement missing features and more. > > * On small testcases NewGVN may be slower as it does some work upfront > > that the current GVN doesn't do. NewGVN is probably less lazy than it > > should be, but it didn't matter for us so we didn't consider it a > > priority. If somebody cares and finds a case where NewGVN > > substantially regresses, speak up. > > > > The initial code can be found here https://reviews.llvm.org/D26224 > > The current patch includes only the "core" GVN algorithm, i.e. the > > expression framework/the logic to perform congruence finding. Other > > pieces (e.g. PRE will build on top of that). > > > > My rough plan is: > > * Try to get this reviewed/tested and in tree. > > * Fix miscompiles/bugs (and performance problems as/if they arise). > > * Build other pieces of the algorithm on top of what we have. My > > immediate concern will be implementing support for llvm.assume and > > load coercion. Dan said he will try to find the time to work on the > > predicate handling (actually, there's already a branch > > https://github.com/dberlin/llvm-gvn-rewrite/tree/newgvn-predicates). > > > > Please let us know what you think. Any feedback/review comment/testing > > is very appreciated! > > > > Thanks! > > > > [1] http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html > > [2] https://llvm.org/bugs/show_bug.cgi?id=30692#c11 > > [3] http://dl.acm.org/citation.cfm?id=512536 > > > > -- > > Davide > > > > "There are no solved problems; there are only problems that are more > > or less solved" -- Henri Poincare > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161116/cfa9b7cd/attachment.html>
On Wed, Nov 16, 2016 at 8:01 AM, Daniel Berlin <dberlin at dberlin.org> wrote:> > > On Wed, Nov 16, 2016 at 2:03 AM, David Chisnall via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> >> This is really great to see, as I’ve spent far too much of my life over >> the past two years fighting with undocumented assumptions made by GVN. A >> couple of quick questions about the new GVN, based on problems I’ve had with >> the old one: >> >> Does it assume that it’s always safe to widen a load (or store) to a power >> of two? > > > I don't believe old gvn does widening any more, and new gvn certainly > doesn't. >Yes, as it apparently blocks other optimizations. David, see https://reviews.llvm.org/D24096 for details. As an aside, if you're interested in combining loads, you may want to take a look at Michael Spencer's loadcombine pass.>> >> For our target, this is only sound if you can show that the pointer was >> used to read all of the bytes that you are loading (we have byte-granularity >> memory safety). Old GVN has no hooks for targets to specify whether this is >> safe and so is implicitly assuming a page-based MMU. This optimisation is >> also unsound for M-profile ARM cores, though will fail occasionally there, >> whereas it fails deterministically for us. >> >> Does it make any assumptions about the layout of memory in pointers? >Which assumptions are you thinking of?> Not that i know of > >> >> Old GVN treats pointers as integers and assumes that it’s safe to do >> partial stores to them. > > Can you give an example? > Is this store coercion or something? > >> >> As a prerequisite for memory safety, we must be able to guarantee atomic >> updates to pointers and we had to hack GVN to disable a bunch of these >> things. In LLVM IR, pointers are opaque and there is no guarantee that >> their representation is the same as a same-sized integer, but old GVN makes >> this assumption. > > >> >> David >> >> > On 16 Nov 2016, at 06:49, Davide Italiano via llvm-dev >> > <llvm-dev at lists.llvm.org> wrote: >> > >> > Hi, >> > we would like to propose a new Global Value Numbering pass in LLVM. >> > The ideas/code are from Daniel Berlin (with a minor overhaul/splitting >> > into submittable patches from me). The code has been around for a >> > while (2012 or before), and we think it's getting ready to be >> > committed upstream. >> > >> > ### Motivation >> > >> > To put things into context: my personal motivation for having a new >> > GVN/PRE algorithm is LTO. >> > It's not a secret that LLVM is getting slower and slower release after >> > release, as Rafael discovered/pointed out in March [1] (and probably >> > many others found out). I personally took a shot at profiling LTO on >> > many internal/opensource applications (including clang itself) and >> > noticed that GVN always show in the top-3 passes (and it's generally >> > the pass where we spend most of the time in the middle-end). There are >> > cases (extreme) where 90% of the compile time goes in GVN. >> > >> > Example: >> > >> > ===-------------------------------------------------------------------------==>> > ... Pass execution timing report ... >> > >> > ===-------------------------------------------------------------------------==>> > Total Execution Time: 684.7622 seconds (683.7141 wall clock) >> > >> > ---User Time--- --System Time-- --User+System-- ---Wall >> > Time--- --- Name --- >> > 130.2558 ( 20.0%) 6.3128 ( 18.7%) 136.5686 ( 19.9%) 137.6145 ( >> > 20.1%) X86 DAG->DAG Instruction Selection >> > 55.4695 ( 8.5%) 0.5501 ( 1.6%) 56.0196 ( 8.2%) 56.1049 ( >> > 8.2%) Function Integration/Inlining >> > 42.3492 ( 6.5%) 0.0364 ( 0.1%) 42.3856 ( 6.2%) 42.8676 ( >> > 6.3%) Global Value Numbering >> > >> > ### Problems in the current GVN >> > >> > There are some issues in the current GVN infrastructure. A >> > non-exhaustive list (I'm pretty sure people have their list of >> > complains, these are the ones I care about the most). >> > * GVN is very slow for large test cases >> > * GVN doesn't do a real analysis, instead it eliminates as it goes, so >> > it's hard to reason about it. >> > * GVN doesn't perform any phi predication, i.e. doesn't know about phi >> > nodes, so later passes have to do some extra work to clean up >> > * There are bugs, e.g. [2] which would require a rewrite of PRE to be >> > fixed. >> > >> > ### NewGVN >> > >> > The new algorithm implements the ideas described in [3] with some >> > engineering optimizations by Dan (for example the set of touched >> > instructions is represented using a Bitvector instead of a set because >> > it's not uncommon for large functions where a predicate change that >> > thousands of instructions need to be changed, and we both measured 30% >> > of the whole pass time spent just modifying the set). >> > The code pretty much maps what the paper describe so I won't try to >> > repeat it here =) >> > >> > Some advantages of NewGVN: >> > * GVN performs a real analysis (which is now part of the pass itself >> > but potentially could be split and used as an utility by other >> > passes). For example, Quentin/Dan pointed out that outlining at the IR >> > level is hard without a proper value numbering analysis. >> > * On large testcases, It's faster than current GVN, even though didn't >> > go through a lot of profiling/optimization lately (I found it up to >> > 50% faster in compile time on some internal benchmarks). There are >> > some places were we can improve. For example, we spend a lot of time >> > inside Simplify* functions. Another considerable chunk of the time is >> > spent inside MemorySSA, but this is going to change once we preserve >> > MemSSA more aggressively. >> > * The code is easier to understand (at least to me) >> > [...] >> > >> > Some current limitations of NewGVN: >> > * NewGVN doesn't do everything that the current GVN does. There are >> > plans to implement missing features and more. >> > * On small testcases NewGVN may be slower as it does some work upfront >> > that the current GVN doesn't do. NewGVN is probably less lazy than it >> > should be, but it didn't matter for us so we didn't consider it a >> > priority. If somebody cares and finds a case where NewGVN >> > substantially regresses, speak up. >> > >> > The initial code can be found here https://reviews.llvm.org/D26224 >> > The current patch includes only the "core" GVN algorithm, i.e. the >> > expression framework/the logic to perform congruence finding. Other >> > pieces (e.g. PRE will build on top of that). >> > >> > My rough plan is: >> > * Try to get this reviewed/tested and in tree. >> > * Fix miscompiles/bugs (and performance problems as/if they arise). >> > * Build other pieces of the algorithm on top of what we have. My >> > immediate concern will be implementing support for llvm.assume and >> > load coercion. Dan said he will try to find the time to work on the >> > predicate handling (actually, there's already a branch >> > https://github.com/dberlin/llvm-gvn-rewrite/tree/newgvn-predicates). >> > >> > Please let us know what you think. Any feedback/review comment/testing >> > is very appreciated! >> > >> > Thanks! >> > >> > [1] http://lists.llvm.org/pipermail/llvm-dev/2016-March/096488.html >> > [2] https://llvm.org/bugs/show_bug.cgi?id=30692#c11 >> > [3] http://dl.acm.org/citation.cfm?id=512536 >> > >> > -- >> > Davide >> > >> > "There are no solved problems; there are only problems that are more >> > or less solved" -- Henri Poincare >> > _______________________________________________ >> > LLVM Developers mailing list >> > llvm-dev at lists.llvm.org >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare
On 16 Nov 2016, at 19:03, Davide Italiano <davide at FreeBSD.org> wrote:> >>> >>> For our target, this is only sound if you can show that the pointer was >>> used to read all of the bytes that you are loading (we have byte-granularity >>> memory safety). Old GVN has no hooks for targets to specify whether this is >>> safe and so is implicitly assuming a page-based MMU. This optimisation is >>> also unsound for M-profile ARM cores, though will fail occasionally there, >>> whereas it fails deterministically for us. >>> >>> Does it make any assumptions about the layout of memory in pointers? >> > > Which assumptions are you thinking of?My last merge from upstream was about a year ago (and a new one is long overdue), but there were issues where GVN was assuming that if it did a load of a pointer then a ptrtoint, then a truncation, that it would get the same result as doing a narrower load. This is not the case in any platform where pointers are not simply integers (i.e. where you actually need inttoptr / ptrtoint instead of bitcast). David