Siu Kwan Lam via llvm-dev
2016-Dec-26 19:54 UTC
[llvm-dev] Multiple simplifycfg pass make some loop significantly slower
Hi all,

I am noticing significantly worse execution performance in loops with just one backedge than in loops with two backedges. Unifying the backedges into one also causes the slowdown.

To replicate this problem, I used the C code in https://gist.github.com/sklam/11f11a410258ca191e6f263262a4ea65 and checked against clang-3.8 and a clang-4.0 nightly. Depending on where I put the "increment" code for a for-loop, I can get a 2x performance difference.

The slow (but natural) version:

    for (i=0; i<size; ++i) {
        ai = arr[i];

        if ( ai <= amin ) {
            amin = ai;
            all_missing = 0;
        }
    }

The fast version:

    for (i=0; i<size;) {
        ai = arr[i];
        ++i;  // increment moved here
        if ( ai <= amin ) {
            amin = ai;
            all_missing = 0;
        }
    }

With the fast version, adding a dummy line after the if-block makes the code slow again:

    for (i=0; i<size;) {
        ai = arr[i];
        ++i;
        if ( ai <= amin ) {
            amin = ai;
            all_missing = 0;
        }
        i;  // no effect
    }

At first, I noticed the problem at any opt level >= O1. In an attempt to narrow it down, I found that running `opt -simplifycfg -sroa -simplifycfg` triggers the slowdown. Removing the second simplifycfg solves it, and both versions of the code run fast.

Is there a known issue for this? Or any idea why?

Regards,
Siu Kwan Lam

--
Siu Kwan Lam
Software Engineer
Continuum Analytics
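For readers without access to the gist, the two loop shapes can be sketched as a self-contained C file. This is only an illustration under stated assumptions: the element type (`double`), the function names, the `min_result` struct, and the `start` parameter are hypothetical and are not taken from the gist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: the running minimum plus a flag that stays 1
 * until some element updates the minimum. */
typedef struct {
    double amin;
    int all_missing;
} min_result;

/* "Natural" form: the increment lives in the for-header.
 * This is the shape reported as slow in the message above. */
min_result min_natural(const double *arr, size_t size, double start)
{
    assert(arr != NULL || size == 0);
    min_result r = { start, 1 };
    for (size_t i = 0; i < size; ++i) {
        double ai = arr[i];
        if (ai <= r.amin) {
            r.amin = ai;
            r.all_missing = 0;
        }
    }
    return r;
}

/* Same computation with the increment moved ahead of the if-block.
 * This is the shape reported as fast in the message above. */
min_result min_early_increment(const double *arr, size_t size, double start)
{
    assert(arr != NULL || size == 0);
    min_result r = { start, 1 };
    for (size_t i = 0; i < size;) {
        double ai = arr[i];
        ++i; /* increment moved here */
        if (ai <= r.amin) {
            r.amin = ai;
            r.all_missing = 0;
        }
    }
    return r;
}
```

Both functions return the same result for the same input, so any timing difference between them would come from how the optimizer shapes the loop, not from the source-level semantics.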
Hal Finkel via llvm-dev
2017-Jan-08 22:45 UTC
[llvm-dev] Multiple simplifycfg pass make some loop significantly slower
On 12/26/2016 01:54 PM, Siu Kwan Lam via llvm-dev wrote:
> Hi all,
>
> I am noticing a significant degradation in execution performance in
> loops with just one backedge than loops with two backedges. Unifying
> the backedges into one will also cause the slowdown.
>
> To replicate this problem, I used the C code in
> https://gist.github.com/sklam/11f11a410258ca191e6f263262a4ea65 and
> checked against clang-3.8 and clang-4.0 nightly. Depending on where I
> put the "increment" code for a for-loop, I can get 2x performance
> difference.
>
> The slow (but natural) version:
>
>     for (i=0; i<size; ++i) {
>         ai = arr[i];
>
>         if ( ai <= amin ) {
>             amin = ai;
>             all_missing = 0;
>         }
>     }
>
> The fast version:
>
>     for (i=0; i<size;) {
>         ai = arr[i];
>         ++i;  // increment moved here
>         if ( ai <= amin ) {
>             amin = ai;
>             all_missing = 0;
>         }
>     }
>
> With the fast version, adding a dummy line after the if-block will
> make the code slow again:
>
>     for (i=0; i<size;) {
>         ai = arr[i];
>         ++i;
>         if ( ai <= amin ) {
>             amin = ai;
>             all_missing = 0;
>         }
>         i;  // no effect
>     }
>
> At first, I noticed the problem with any opt level >= O1. In an
> attempt to narrow it down, I found that using `opt -simplifycfg -sroa
> -simplifycfg` will trigger the slowdown. Removing the second
> simplifycfg solves it and both versions of the code run fast.
>
> Is there a known issue for this? Or, any idea why?

Can you please file a bug report for this (https://llvm.org/bugs/), attaching the IR? I suspect we'll need to look at the generated code.

 -Hal

> Regards,
> Siu Kwan Lam
>
> --
> Siu Kwan Lam
> Software Engineer
> Continuum Analytics

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory