Johan Engelen via llvm-dev
2017-Jul-06 15:55 UTC
[llvm-dev] Dataflow analysis regression in 3.7
On Thu, Jul 6, 2017 at 7:00 AM, Davide Italiano <davide at freebsd.org> wrote:> On Wed, Jul 5, 2017 at 3:59 PM, Johan Engelen via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi all, > > I just found an optimization regression regarding simple > > dataflow/constprop analysis: > > https://godbolt.org/g/Uz8P7t > > > > This code > > ``` > > int dataflow(int b) { > > int a; > > > > if (b==4) > > a = 3*b; // fully optimized when changed to a = 3; > > else > > a = 5; > > > > if (a == 4) > > return 0; > > else > > return 1; > > } > > ``` > > is no longer optimized to just a "return 1". The regression happened in > LLVM > > 3.6 --> 3.7. > > > > Is this a known regression? I couldn't find it in the bug tracker (but > don't > > really know which keywords to look for...) > > > > Kind regards, > > Johan > > > > This regressed when SimplifyCFG changed the threshold for PHI nodes > folding from 1 to 2. > > (see lib/Transforms/Utils/SimplifyCFG.cpp) > > +PHINodeFoldingThreshold("phi-node-folding-threshold", cl::Hidden, > cl::init(2), > + cl::desc("Control the amount of phi node folding to perform > (default = 2)")); > > This could be re-evaluated, maybe, or some other transformation could > catch this case. >I'm sure that adjusting that threshold has many implications, so I don't want to touch that.> (e.g. a more powerful range solver could realize that a is always in > the range [5;12]). >I seem to have been very "lucky" in finding the non-optimized case. I get fully optimized output when replacing "3*b" with "2*b" or "4*b" or "b+b+b+b" or "3*b-b"... But for "3*b", "b+b+b", and "3*b+6" things go wrong. Is there another optimization pass interfering? regards, Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170706/6358c945/attachment.html>
Davide Italiano via llvm-dev
2017-Jul-07 15:52 UTC
[llvm-dev] Dataflow analysis regression in 3.7
> > I'm sure that adjusting that threshold has many implications, so I don't > want to touch that. >We shouldn't, unless there's a very good reason for. Funny enough, this was fixed just yesterday by Chad's commit https://reviews.llvm.org/D34901 so the function now just gets folded to `ret i32 1` (as you expected).>> >> (e.g. a more powerful range solver could realize that a is always in >> the range [5;12]). >As I pointed out, this could've been caught by a more powerful range analysis (before SimplifyCFG "destroys" the CFG). There's a balance to keep between how aggressive SimplifyCFG gets VS what other passes should/could do. Without the last commit you have, after SimplifyCFG: define i32 @dataflow(i32 %b) local_unnamed_addr #0 { entry: %cmp = icmp eq i32 %b, 4 %mul = mul nsw i32 %b, 3 %phitmp = icmp eq i32 %mul, 4 %a.0 = select i1 %cmp, i1 %phitmp, i1 false %. = select i1 %a.0, i32 0, i32 1 ret i32 %. } but just before: define i32 @dataflow(i32 %b) local_unnamed_addr #0 { entry: %cmp = icmp eq i32 %b, 4 br i1 %cmp, label %if.then, label %if.else if.then: ; preds = %entry %vSSA_sigma = phi i32 [ %b, %entry ] %mul = mul nsw i32 3, %vSSA_sigma br label %if.end if.else: ; preds = %entry br label %if.end if.end: ; preds = %if.else, %if.then %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ] %cmp1 = icmp eq i32 %a.0, 4 br i1 %cmp1, label %if.then2, label %if.else3 if.then2: ; preds = %if.end br label %cleanup if.else3: ; preds = %if.end br label %cleanup cleanup: ; preds = %if.else3, %if.then2 %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ] ret i32 %retval.0 } On the latter, you can find (after transforming to e-SSA) that: Analysis for function: dataflow [4, 4] %vSSA_sigma = phi i32 [ %b, %entry ] [12, 12] %mul = mul nsw i32 3, %vSSA_sigma [5, 12] %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ] [0, 1] %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ] Note that our VRP algorithm (correlated-propagation) currently doesn't catch that, but a more powerful analysis does (and it's faster on large testcases & has an interprocedural version, the code is on github). http://homepages.dcc.ufmg.br/~raphael/papers/CGO13.pdf [and the predecessor paper from Zhendong Su, circa 2004 IIRC, I don't have a link handy sorry :)] [I just put a 10 lines intraprocedural/interprocedural client on top of it that prints the found ranges https://github.com/dcci/range-analysis/commit/f9677dfd032f811ed545977690f8d7aead7cccaa] Thanks, -- Davide
Chad Rosier via llvm-dev
2017-Jul-07 20:47 UTC
[llvm-dev] Dataflow analysis regression in 3.7
David/Johan, I would love to claim victory, but I don't think that D34901 catches this case. However, I got interested and threw this together quickly: https://reviews.llvm.org/D35140. This does catch the below case. If people are interested I can add test cases and submit for formal review. FWIW, it does hit about 1/3 of all of the SPEC benchmarks. I haven't done any performance analysis to see if it matters, however. Chad On 7/7/2017 11:52 AM, Davide Italiano via llvm-dev wrote:>> I'm sure that adjusting that threshold has many implications, so I don't >> want to touch that. >> > We shouldn't, unless there's a very good reason for. > Funny enough, this was fixed just yesterday by Chad's commit > https://reviews.llvm.org/D34901 so the function now just gets folded > to `ret i32 1` (as you expected). > >>> (e.g. a more powerful range solver could realize that a is always in >>> the range [5;12]). > As I pointed out, this could've been caught by a more powerful range > analysis (before SimplifyCFG "destroys" the CFG). There's a balance to > keep between how aggressive SimplifyCFG gets VS what other passes > should/could do. > > Without the last commit you have, after SimplifyCFG: > > define i32 @dataflow(i32 %b) local_unnamed_addr #0 { > entry: > %cmp = icmp eq i32 %b, 4 > %mul = mul nsw i32 %b, 3 > %phitmp = icmp eq i32 %mul, 4 > %a.0 = select i1 %cmp, i1 %phitmp, i1 false > %. = select i1 %a.0, i32 0, i32 1 > ret i32 %. > } > > but just before: > > define i32 @dataflow(i32 %b) local_unnamed_addr #0 { > entry: > %cmp = icmp eq i32 %b, 4 > br i1 %cmp, label %if.then, label %if.else > > if.then: ; preds = %entry > %vSSA_sigma = phi i32 [ %b, %entry ] > %mul = mul nsw i32 3, %vSSA_sigma > br label %if.end > > if.else: ; preds = %entry > br label %if.end > > if.end: ; preds = %if.else, %if.then > %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ] > %cmp1 = icmp eq i32 %a.0, 4 > br i1 %cmp1, label %if.then2, label %if.else3 > > if.then2: ; preds = %if.end > br label %cleanup > > if.else3: ; preds = %if.end > br label %cleanup > > cleanup: ; preds = %if.else3, %if.then2 > %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ] > ret i32 %retval.0 > } > > On the latter, you can find (after transforming to e-SSA) that: > > Analysis for function: dataflow > [4, 4] %vSSA_sigma = phi i32 [ %b, %entry ] > [12, 12] %mul = mul nsw i32 3, %vSSA_sigma > [5, 12] %a.0 = phi i32 [ %mul, %if.then ], [ 5, %if.else ] > [0, 1] %retval.0 = phi i32 [ 0, %if.then2 ], [ 1, %if.else3 ] > > Note that our VRP algorithm (correlated-propagation) currently doesn't > catch that, but a more powerful analysis does (and it's faster on > large testcases & has an interprocedural version, the code is on > github). > http://homepages.dcc.ufmg.br/~raphael/papers/CGO13.pdf [and the predecessor > paper from Zhendong Su, circa 2004 IIRC, I don't have a link handy sorry :)] > > [I just put a 10 lines intraprocedural/interprocedural client on top > of it that prints the found ranges > https://github.com/dcci/range-analysis/commit/f9677dfd032f811ed545977690f8d7aead7cccaa] > > Thanks, > > -- > Davide > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev