Hi Folks, Let's keep this optimization alive. To summarize: several folks voiced general support, but with questions about why existing optimizations do not already catch this case. Deep dive by Wei Mi showed that the optimization is most likely not a clean-up of LSR sloppiness, but something new. Follow-up by myself confirmed that the redundancy eliminated the LoopExitValues pass exists in the original IR from Clang. As a next step, can we come to consensus on where the LoopExitValues optimization belongs? Regards, -steve
Maybe it can follow the "Delete dead loops" pass which is after Induction Variable Simplification pass, so it will not affect existing exitvalue rewriting optimization in Induction Variable Simplification to find out and delete dead loops? Existing pass pipeline: Loop-Closed SSA Form Pass Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops <=== New LoopExitValues Function TargetTransformInfo Loop Pass Manager Unroll loops Memory Dependence Analysis I have the same worry as Philip and Hal that the new LoopExitValues pass may increase some live range significantly in certain cases because it reuses value cross outerloop iterations. Like the following hypothetical case, the value reuse will create a live range living across loop2, loop3, .... But we can add some simple logic to obviate such case. for (int Outer = 0; Outer < Size; ++Outer) { for (int Inner = 0; Inner < Size; ++Inner) Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val; for loop 2. for loop 3. ... } Thanks, Wei. On Mon, Sep 21, 2015 at 10:12 AM, Steve King <steve at metrokings.com> wrote:> Hi Folks, Let's keep this optimization alive. To summarize: several > folks voiced general support, but with questions about why existing > optimizations do not already catch this case. Deep dive by Wei Mi > showed that the optimization is most likely not a clean-up of LSR > sloppiness, but something new. Follow-up by myself confirmed that the > redundancy eliminated the LoopExitValues pass exists in the original > IR from Clang. > > As a next step, can we come to consensus on where the LoopExitValues > optimization belongs? > > Regards, > -steve
On Mon, Sep 21, 2015 at 11:13 AM, Wei Mi <wmi at google.com> wrote:> I have the same worry as Philip and Hal that the new LoopExitValues > pass may increase some live range significantly in certain cases > because it reuses value cross outerloop iterations. Like the following > hypothetical case, the value reuse will create a live range living > across loop2, loop3, .... But we can add some simple logic to obviate > such case. >Thanks Wei. Can you please give your ideas about logic to catch abuse of live ranges?
----- Original Message -----> From: "Wei Mi" <wmi at google.com> > To: "Steve King" <steve at metrokings.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Gerolf Hoflehner" <ghoflehner at apple.com>, "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Monday, September 21, 2015 1:13:00 PM > Subject: Re: [llvm-dev] [RFC] New pass: LoopExitValues > > Maybe it can follow the "Delete dead loops" pass which is after > Induction Variable Simplification pass, so it will not affect > existing > exitvalue rewriting optimization in Induction Variable Simplification > to find out and delete dead loops?I think this makes sense.> > Existing pass pipeline: > Loop-Closed SSA Form Pass > Loop Pass Manager > Induction Variable Simplification > Recognize loop idioms > Delete dead loops > <=== New > LoopExitValues > Function TargetTransformInfo > Loop Pass Manager > Unroll loops > Memory Dependence Analysis > > I have the same worry as Philip and Hal that the new LoopExitValuesWhile I am concerned about this, I think we should get the pass in-tree first, and then worry about looking for regressions caused by this kind of live-range extension. The cost modeling here might be somewhat subtle because the spilling cost still might be cheaper than the recomputation cost, and that should be accounted for. As a next step, please post the patch adding the optimization pass for review with some test cases. -Hal> pass may increase some live range significantly in certain cases > because it reuses value cross outerloop iterations. Like the > following > hypothetical case, the value reuse will create a live range living > across loop2, loop3, .... But we can add some simple logic to obviate > such case. > > for (int Outer = 0; Outer < Size; ++Outer) { > for (int Inner = 0; Inner < Size; ++Inner) > Dst[Outer * Size + Inner] = Src[Outer * Size + Inner] * Val; > > for loop 2. > for loop 3. > ... > } > > Thanks, > Wei. > > On Mon, Sep 21, 2015 at 10:12 AM, Steve King <steve at metrokings.com> > wrote: > > Hi Folks, Let's keep this optimization alive. To summarize: > > several > > folks voiced general support, but with questions about why existing > > optimizations do not already catch this case. Deep dive by Wei Mi > > showed that the optimization is most likely not a clean-up of LSR > > sloppiness, but something new. Follow-up by myself confirmed that > > the > > redundancy eliminated the LoopExitValues pass exists in the > > original > > IR from Clang. > > > > As a next step, can we come to consensus on where the > > LoopExitValues > > optimization belongs? > > > > Regards, > > -steve >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
On Mon, Sep 21, 2015 at 11:13 AM, Wei Mi <wmi at google.com> wrote:> Maybe it can follow the "Delete dead loops" pass which is after > Induction Variable Simplification pass, so it will not affect existing > exitvalue rewriting optimization in Induction Variable Simplification > to find out and delete dead loops? > > Existing pass pipeline: > Loop-Closed SSA Form Pass > Loop Pass Manager > Induction Variable Simplification > Recognize loop idioms > Delete dead loops > <=== New > LoopExitValues > Function TargetTransformInfo > Loop Pass Manager > Unroll loops > Memory Dependence Analysis >I tried Wei's suggested location for the LEV pass shown above. Placing the pass here fails to find any SCEV redundancy in the test case because none exists yet. The reason is that the relevant SCEV's are off-by-1. Some nitty-gritty: In Pass N of the outer loop, the inner loop exits with %add = (N * %Size) - 1 To be useful, Pass N+1 of the outer loop needs some inner loop exit value = (N * %Size). This mismatch does not jibe with my previous IR hacking experiment. I tried the hack again and the assembly was not equivalent. Somehow I fooled myself before. In the current patch, LEV runs after LSR and does find redundant SCEVs. Running the pass here would also mop up hypothetical SCEV redundancy from any previous pass, but I understand the goal is to avoid such redundancy in the first place. Should we try the patch in it's current location, namely after LSR? Regards, -steve