This might be hard cases making bad law, but the loop: void f (unsigned short *x, int *l) { int c = *l; int i; for (i = 0; i < c; i++) if (x[i]) x[i]++; } is converted to decrement-and-branch form by LoopStrengthReduce while: void f (unsigned short *x, int *l) { int i; for (i = 0; i < *l; i++) if (x[i]) x[i]++; } isn't. This makes a big difference on z, which has a native decrement-and- branch instruction. The problem seems to be that the strength-reduction pass relies on IndVarSimplify to convert loop exit conditions into a canonical != form: // Equality (== and !=) ICmps are special. We can rewrite (i == N) as // (N - i == 0), and this allows (N - i) to be the expression that we work // with rather than just N or i, so we can consider the register // requirements for both N and i at the same time. Limiting this code to // equality icmps is not a problem because all interesting loops use // equality icmps, thanks to IndVarSimplify. if (ICmpInst *CI = dyn_cast<ICmpInst>(LF.UserInst)) if (CI->isEquality()) { and this is happening for the first loop but not the second. The reason is that the second loop still has two loads of *l by the time IndVarSimplify is run; one for the precondition of the rotated loop and one for the loop exit. This means that IndVarSimplify can't tell that the precondition (skip the loop if *l <= 0) is operating on the same *l as the loop exit, and therefore can't work out that *l is the trip count. The two "*l"s are only fused by GVN, which runs after IndVarSimplify. Moving GVN before IndVarSimplify gives me the code I wanted. So does adding an IndVarSimplify pass before LoopStrengthReduce. Would either of these be OK? I was wondering whether the second made sense anyway, since LoopStrengthReduce is done by CodeGen and is pretty far separated from the usual IndVarSimplify position. I've attached the .ll form of the second loop, in case it helps. Thanks, Richard -------------- next part -------------- A non-text attachment was scrubbed... Name: loop.ll Type: application/octet-stream Size: 2140 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131031/2e0655dd/attachment.obj>
On Oct 31, 2013, at 10:59 AM, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:> This might be hard cases making bad law, but the loop: > > void > f (unsigned short *x, int *l) > { > int c = *l; > int i; > for (i = 0; i < c; i++) > if (x[i]) > x[i]++; > } > > is converted to decrement-and-branch form by LoopStrengthReduce while: > > void > f (unsigned short *x, int *l) > { > int i; > for (i = 0; i < *l; i++) > if (x[i]) > x[i]++; > } > > isn't. This makes a big difference on z, which has a native decrement-and- > branch instruction. > > The problem seems to be that the strength-reduction pass relies on > IndVarSimplify to convert loop exit conditions into a canonical != form: > > // Equality (== and !=) ICmps are special. We can rewrite (i == N) as > // (N - i == 0), and this allows (N - i) to be the expression that we work > // with rather than just N or i, so we can consider the register > // requirements for both N and i at the same time. Limiting this code to > // equality icmps is not a problem because all interesting loops use > // equality icmps, thanks to IndVarSimplify. > if (ICmpInst *CI = dyn_cast<ICmpInst>(LF.UserInst)) > if (CI->isEquality()) { > > and this is happening for the first loop but not the second. The reason > is that the second loop still has two loads of *l by the time IndVarSimplify > is run; one for the precondition of the rotated loop and one for the loop > exit. This means that IndVarSimplify can't tell that the precondition > (skip the loop if *l <= 0) is operating on the same *l as the loop exit, > and therefore can't work out that *l is the trip count. The two "*l"s > are only fused by GVN, which runs after IndVarSimplify. > > Moving GVN before IndVarSimplify gives me the code I wanted. So does adding > an IndVarSimplify pass before LoopStrengthReduce. Would either of these > be OK? I was wondering whether the second made sense anyway, since > LoopStrengthReduce is done by CodeGen and is pretty far separated from > the usual IndVarSimplify position. > > I've attached the .ll form of the second loop, in case it helps.This is a good observation, and a bug in the optimization pipeline. A couple things: - I don’t see any good reason that the LinearFunctionTest replacement being done in IndVars can’t be done late, during LSR. AFAICT, it’s only purpose is enabling LoopStrengthReduce. - I think GVN should run earlier, before most of the loop opts, and I think IndVars should run much later. I filed a bug for this. See PR17809: Reordering IR-level optimization passes. It’s hard to make progress on a major pass pipeline change because it requires a lot of investment in performance analysis. If you could file a bug for your test case and relate it to that bug it would be great. It can’t hurt, and may motivate someone to get around to fixing it sooner. -Andy
Andrew Trick <atrick at apple.com> writes:> - I think GVN should run earlier, before most of the loop opts, and I > think IndVars should run much later. I filed a bug for this. See > PR17809: Reordering IR-level optimization passes. It’s hard to make > progress on a major pass pipeline change because it requires a lot of > investment in performance analysis. If you could file a bug for your > test case and relate it to that bug it would be great. It can’t hurt, > and may motivate someone to get around to fixing it sooner.Thanks for the pointer. I've attached the testcase and a summary of the thread. Like you say, it sounds like major work, but should definitely help with this sort of situtation. Richard