Tony Jiang via llvm-dev
2017-Aug-31 20:54 UTC
[llvm-dev] [RFC] Value Range Based Optimization Opportunity in LLVM
Hi All,

We have recently found some optimization opportunities created by replicating
code into branches in order to enable optimization. In general, the
optimization opportunity we are pursuing is like the following.

Given pseudo-code:

  // block A
  if (some condition)
    // block B
  // block C

If it can be efficiently proven that some portion of block C can be simplified
when control flow does not pass through the if statement, it might be useful
to convert this CFG into a diamond and hoist that portion of block C into both
block B and the new block.

Consider the following example:

  int test(int *Ptr, int a, int b, int c, int d) {
    int Ret = 0;
    if (__builtin_expect(!Ptr, 0)) {
      Ret = calli(a);
      // return Ret / (a|1) / (b|1) / (c|1) / (d|1); // Copy return to here
    }
    return Ret / (a|1) / (b|1) / (c|1) / (d|1); // This can be simplified to return 0
  }

In this case, replicating the return statement in the branch allows the
optimizer to prove that the value of Ret at the end of the function is 0 and
to eliminate the arithmetic calculations.

A second example:

  unsigned long funcReturningArbitraryi64(unsigned char *p);

  #define LENGTH(uv) ( (uv) < 0x80       ? 1 : \
                       (uv) < 0x800      ? 2 : \
                       (uv) < 0x10000    ? 3 : \
                       (uv) < 0x200000   ? 4 : \
                       (uv) < 0x4000000  ? 5 : \
                       (uv) < 0x80000000 ? 6 : 7 )

  int func(unsigned char *p, bool flag)
  {
    unsigned long c = *p;
    int len;
    // ...
  #ifdef _ORIG
    if (flag) {
      // ...
      c = funcReturningArbitraryi64(p);
    }
    len = LENGTH(c);
  #else
    if (flag) {
      // ...
      c = funcReturningArbitraryi64(p);
      len = LENGTH(c);
    } else {
      len = LENGTH(c);
    }
  #endif

    // ...

    return len;
  }

In this case, we see that creating an else block and replicating the length
computation in both the if and else branches (as in the code snippet guarded
by the #else) enables the macro LENGTH in the else branch to be optimized.

Most of the examples we have come up with so far are centered around value
ranges along the conditional branches. When the range of values a symbol can
have along different branches is provably different, opportunities for
optimization may arise. However, this likely isn't the only category of
optimizations that could benefit from this.

Is there an existing transformation in LLVM that should already be doing this
but is missing this opportunity? If not, we would like to pursue adding it. Of
course, this optimization would be subject to a cost model as it may result in
code growth. For example, it may not be advantageous to do this if the
simplified branch is cold. If anyone has any comments/suggestions we are very
much interested in hearing them.

Regards,

Tony Jiang, M.Sc.
LLVM PPC Backend Development
IBM Toronto Lab, C2/712/8200/MKM
8200 Warden Ave, Markham, L6G 1C7
Email: jtony at ca.ibm.com
Phone: 905-413-3676
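As a concrete illustration of the proposed diamond conversion, here is a
hand-written sketch of what the first example could look like once the return
is copied into both arms (the function name is hypothetical; calli is assumed
to be an external function as above):

  extern int calli(int);

  // Hypothetical shape after converting the if into a diamond and copying
  // the return into both arms; the fall-through copy then folds because
  // Ret is provably 0 on that path.
  int test_duplicated(int *Ptr, int a, int b, int c, int d) {
    if (__builtin_expect(!Ptr, 0)) {
      int Ret = calli(a);
      return Ret / (a|1) / (b|1) / (c|1) / (d|1);  // copied return
    }
    return 0;  // Ret == 0 here, so all of the divisions disappear
  }

The divisors (a|1), (b|1), etc. are never zero, so folding the fall-through
copy down to 0 is safe.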
Hal Finkel via llvm-dev
2017-Sep-01 01:22 UTC
[llvm-dev] [RFC] Value Range Based Optimization Opportunity in LLVM
On 08/31/2017 03:54 PM, Tony Jiang via llvm-dev wrote:
> [...]
>
> Is there an existing transformation in LLVM that should be doing this
> already that is missing this opportunity? If not, we would like to
> pursue adding this. Of course, this optimization would be subject to a
> cost model as it may result in code growth. For example, it may not be
> advantageous to do this if the simplified branch is cold. If anyone
> has any comments/suggestions we are very much interested in hearing them.

We have two transformations that track ranges along CFG edges using
LazyValueInfo: JumpThreading and CorrelatedValuePropagation. I don't
think that either does what you're proposing.

 -Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
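For context, a minimal example of the kind of edge-based range fact those two
passes already exploit (a sketch only; the function name is made up, and
whether this particular compare is folded by CorrelatedValuePropagation or by
an earlier pass depends on the pipeline):

  // On the taken edge of the outer branch, LazyValueInfo knows x is in
  // [1, INT_MAX], so CorrelatedValuePropagation can treat x != 0 as true.
  int known_nonzero(int x) {
    if (x > 0) {
      if (x != 0)   // always true on this path
        return 1;
    }
    return 0;
  }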
Daniel Berlin via llvm-dev
2017-Sep-01 02:08 UTC
[llvm-dev] [RFC] Value Range Based Optimization Opportunity in LLVM
NewGVN does some of it. It will discover if backpropagation through a phi
enables an operation to be put in terms of existing expressions or constants.
It does not track ranges though, only explicit values. In your ORIG block, if
LENGTH can be expressed as a merge of already-computed expressions or
constants in the program, it will discover it. You can stare at examples of
this (and compute quite complicated ones if you wish) at
test/Transforms/NewGVN/completeness.ll

Both your examples already have a phi in them in LLVM IR, so that is just a
matter of the proper range propagation. It is really:

  int test() {
    int Ret0 = 0;
    if (!Ptr)
      Ret1 = calli(a);
    RetPhi = PHI(Ret0, Ret1);
    return RetPhi;
  }

It should already be possible to determine the value of Ret0 to be 0 based on
ranges without duplicating anything, and the extra return doesn't give you
anything.

If you want to do it based on ranges: a combination of a value range
computation using the existing sparse propagation engine, and a transform at
phi nodes like we use for NewGVN, should be able to handle what you want here.
The existing LazyValueInfo may not get an optimal answer for various reasons
(it is trying to propagate in the wrong direction, so it loses some context).

You could also use the same basic infrastructure that PredicateInfo uses, and
rename values at points where the ranges could have changed to get additional
precision (unlike PredicateInfo, which renames when there are branches). There
are quite a number of papers on making SSA forms that are effective at range
propagation. PredicateInfo is building pruned e-SSA, which is the basis of a
number of range analyses (see
http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf
and the slides at
http://laure.gonnord.org/pro/research/ER03_2015/RangeAnalysis.pdf).

The only case where you should ever need to duplicate code is when the
expression is something not already computed in the program. That's not the
case for your first example; it can be done without code duplication. For your
second example, it would require insertion, but you could know ahead of time
which computations you actually need to insert.

On Thu, Aug 31, 2017 at 6:22 PM, Hal Finkel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> [...]
>
> We have two transformations that track ranges along CFG edges using
> LazyValueInfo: JumpThreading and CorrelatedValuePropagation. I don't think
> that either does what you're proposing.
>
> -Hal
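A small C example in the spirit of the phi-of-ops / completeness cases Daniel
points to (hand-written illustration only; the function name is made up, and
this is not claimed to be taken from completeness.ll): the expression computed
after the merge is equivalent, through the phi, to values already computed in
the predecessors.

  // After the merge, a + y is PHI(a + 1, a + 2), and both arms are already
  // computed as the incoming values of x, so a + y has the same value as x
  // and need not be recomputed.
  int phi_of_ops(int flag, int a) {
    int x, y;
    if (flag) { x = a + 1; y = 1; }
    else      { x = a + 2; y = 2; }
    return (a + y) + x;
  }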
Johannes Doerfert via llvm-dev
2017-Sep-09 15:14 UTC
[llvm-dev] [RFC] Value Range Based Optimization Opportunity in LLVM
Hi Tony,

I am currently working on a polyhedral value analysis that I first sketched
here [0]. It is parametric and supports piece-wise defined values; e.g., the
abstract value for Ret in the first example would look something like this:

  Ret = (0        if __builtin_expect(!Ptr, 0) == 0)
     or (calli(a) if __builtin_expect(!Ptr, 0) != 0)

Similarly, for the second example the abstract value of len would depend on
flag and p for some cases and on flag and funcReturningArbitraryi64 for
others.

Given such a piece-wise abstract value it is easy to find the conditions under
which it is constant or simple (for some definition of simple). You can also
evaluate expressions under certain constraints, e.g., in a certain basic block
or iteration of a loop. If you want, you can even derive IR from the
conditions or from the value under some condition.

I will present the analysis (and a subsequent memory analysis) in the SRC at
the US LLVM Dev meeting next month. If you are interested, we can also look
into using it to build the transformation you are describing.

Cheers,
  Johannes

[0] http://llvm.org/devmtg/2017-02-04/
    // The llvm website was down when I wrote this mail though.

On 08/31, Tony Jiang via llvm-dev wrote:
> [...]
>
> Is there an existing transformation in LLVM that should be doing this
> already that is missing this opportunity? If not, we would like to pursue
> adding this. Of course, this optimization would be subject to a cost model
> as it may result in code growth. For example, it may not be advantageous
> to do this if the simplified branch is cold. If anyone has any
> comments/suggestions we are very much interested in hearing them.

-- 
Johannes Doerfert
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland Informatics Campus, Germany
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
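To make the piece-wise view concrete for the second example: under the
constraint flag == false, c is still *p and therefore less than 0x100, so only
the first two arms of LENGTH can apply. A hand-written sketch of code one
could derive from that (names and shape are illustrative only, not the output
of the analysis):

  #include <stdbool.h>

  extern unsigned long funcReturningArbitraryi64(unsigned char *p);

  int func_specialized(unsigned char *p, bool flag) {
    unsigned long c = *p;
    if (flag) {
      c = funcReturningArbitraryi64(p);
      return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 :
             c < 0x200000 ? 4 : c < 0x4000000 ? 5 : c < 0x80000000 ? 6 : 7;
    }
    // c == *p < 0x100 on this path, so LENGTH collapses to a single compare.
    return c < 0x80 ? 1 : 2;
  }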