Thanks for your reply. Maybe it wasn't so clear, but the optimization I'm writing is target-dependent and so it's declared inside the target backend and is run after the independent optimizer. So I cannot run my pass just after LoopRotate and before InstCombine. I can still use ScalarEvolution, but the instruction combiner can in some cases simplify the entry guard condition. A small example is the following: ///////////////////////////////////// extern void x(int); int foo_int(int a, int b) { int i; for (i = 0; i < b/2; ++i) { x(i); } return 0; } ///////////////////////////////////// the correspondent optimized IR is: /////////////////////////////////////////////////////////////////////////////// define i32 @foo_int(i32 %a, i32 %b) nounwind uwtable { entry: %div = sdiv i32 %b, 2 %cmp3 = icmp sgt i32 %b, 1 br i1 %cmp3, label %for.body, label %for.end for.body: ; preds = %entry, %for.body %i.04 = phi i32 [ %inc, %for.body ], [ 0, %entry ] tail call void @x(i32 %i.04) nounwind %inc = add nsw i32 %i.04, 1 %cmp = icmp slt i32 %inc, %div br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body, %entry ret i32 0 } ///////////////////////////////////////////////////////////////////////////////>From the IR is possible to see that the condition in the entry block is%cmp3 = icmp sgt i32 %b, 1 but starting from the loop IV it should be expected something like %cmp3 = icmp slt i32 0, %div and this is verified looking at the instcombine debug trace: //////////////////////////////////////////////// IC: Visiting: %cmp3 = icmp slt i32 0, %div IC: Old = %cmp3 = icmp sgt i32 %div, 0 New = <badref> = icmp sge i32 %b, 2 IC: ADD: %cmp3 = icmp sge i32 %b, 2 IC: ERASE %0 = icmp sgt i32 %div, 0 IC: ADD: %div = sdiv i32 %b, 2 IC: Visiting: %div = sdiv i32 %b, 2 IC: Visiting: %cmp3 = icmp sge i32 %b, 2 IC: Old = %cmp3 = icmp sge i32 %b, 2 New = <badref> = icmp sgt i32 %b, 1 IC: ADD: %cmp3 = icmp sgt i32 %b, 1 IC: ERASE %0 = icmp sge i32 %b, 2 //////////////////////////////////////////////// The fact that a loop has been rotated it means that it has an entry guard that allows in the case of countable loop to optimize the trip count expression: it's an information that can be lost during optimization and maybe not easy to derive with an analysis. The usage of "IR extra information" would help in this case. Using intrinsics with declared side-effects as information handlers I don't know if it adds constraints to the optimizer. Indeed declaring no side-effects won't preserve the information. How can all this be solved? Thanks again. Best regards, Michele Scandale On 02/07/2013 12:07 AM, Andrew Trick wrote:> > On Feb 4, 2013, at 10:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote: > >> Dear all, >> >> I'm working on a late IR target dependent optimization on loops. A part of this >> optimization requires to derive "by hand" the trip-count expression of a given >> loop. In order to handle correctly these cases I need to check if the loop has >> an entry guard or not. The problem I have is that starting from the information >> I derive during my analysis (initial IV value, last IV value, comparison kind) I >> don't know how to verify if the entry guard is present or not, mainly because >> algebraic simplifications changed the entry guard comparison. Being able to know >> if a given loop has been rotated would be enough for my purposes but I haven't >> found a way to obtain this. >> >> I think that the only way is to track in the IR the information, but I don't >> know how to track it in a safe (the information shouldn't be lost) and >> transparent way (it should not interfere with the optimizer). Thinking to custom >> intrinsics I don't understand how to allow memory optimizations and, at the same >> time, prevent the intrinsic elimination or hoisting from the target loop. >> >> I would appreciate any hint on this topic. > > See ScalarEvolution::isLoopEntryGuardedByCond. It's part of the analysis that we use to determine trip count and benefits from rotated loops. > > LLVM never annotates the IR with a history of transformations. You could potentially annotate the LoopInfo analysis, but it will only be preserved within the same LoopPassManager (you would have to run right after LoopRotate). > > The only robust solution is ScalarEvolution. > > -Andy >
On Feb 7, 2013, at 8:40 AM, Michele Scandale <michele.scandale at gmail.com> wrote:> Thanks for your reply. > > Maybe it wasn't so clear, but the optimization I'm writing is target-dependent > and so it's declared inside the target backend and is run after the independent > optimizer. So I cannot run my pass just after LoopRotate and before InstCombine. > I can still use ScalarEvolution, but the instruction combiner can in some cases > simplify the entry guard condition. > > A small example is the following: > > ///////////////////////////////////// > extern void x(int); > > int foo_int(int a, int b) { > int i; > for (i = 0; i < b/2; ++i) { > x(i); > } > > return 0; > } > ///////////////////////////////////// > > the correspondent optimized IR is: > > /////////////////////////////////////////////////////////////////////////////// > define i32 @foo_int(i32 %a, i32 %b) nounwind uwtable { > entry: > %div = sdiv i32 %b, 2 > %cmp3 = icmp sgt i32 %b, 1 > br i1 %cmp3, label %for.body, label %for.end > > for.body: ; preds = %entry, %for.body > %i.04 = phi i32 [ %inc, %for.body ], [ 0, %entry ] > tail call void @x(i32 %i.04) nounwind > %inc = add nsw i32 %i.04, 1 > %cmp = icmp slt i32 %inc, %div > br i1 %cmp, label %for.body, label %for.end > > for.end: ; preds = %for.body, %entry > ret i32 0 > } > /////////////////////////////////////////////////////////////////////////////// > > From the IR is possible to see that the condition in the entry block is > > %cmp3 = icmp sgt i32 %b, 1 > > but starting from the loop IV it should be expected something like > > %cmp3 = icmp slt i32 0, %div > > and this is verified looking at the instcombine debug trace: > //////////////////////////////////////////////// > IC: Visiting: %cmp3 = icmp slt i32 0, %div > IC: Old = %cmp3 = icmp sgt i32 %div, 0 > New = <badref> = icmp sge i32 %b, 2 > IC: ADD: %cmp3 = icmp sge i32 %b, 2 > IC: ERASE %0 = icmp sgt i32 %div, 0 > IC: ADD: %div = sdiv i32 %b, 2 > IC: Visiting: %div = sdiv i32 %b, 2 > IC: Visiting: %cmp3 = icmp sge i32 %b, 2 > IC: Old = %cmp3 = icmp sge i32 %b, 2 > New = <badref> = icmp sgt i32 %b, 1 > IC: ADD: %cmp3 = icmp sgt i32 %b, 1 > IC: ERASE %0 = icmp sge i32 %b, 2 > //////////////////////////////////////////////// > > The fact that a loop has been rotated it means that it has an entry guard that > allows in the case of countable loop to optimize the trip count expression: it's > an information that can be lost during optimization and maybe not easy to derive > with an analysis. The usage of "IR extra information" would help in this case. > Using intrinsics with declared side-effects as information handlers I don't know > if it adds constraints to the optimizer. Indeed declaring no side-effects won't > preserve the information. > > How can all this be solved?Thanks for the details. Please add them to a bug report. InstCombine is certainly interfering with our ability to analyze the loop. I think the problem is that ScalarEvolution cannot reason about signed division. This is a general problem independent of your target. At the moment I'm not sure if we can teach ScalarEvolution to reason about this, or if we can defer certain InstCombine's until after loop passes run, including target-specific loop passes (it's hard for me to say whether this InstCombine really a necessary canonicalization before other passes). A bug report would be appropriate. Absent a fix for the above problem, it is certainly possible for you to add target-specific intrinsics. I can't guarantee that it won't interfere with optimization. It might also be possible to add metadata to the loop exit branch when it is cloned by LoopRotate, indicating that the loop is guarded by the same condition. I don't think we want something like this on trunk though since it is a heavy-handed workaround that could be hard to maintain. -Andy> On 02/07/2013 12:07 AM, Andrew Trick wrote: >> >> On Feb 4, 2013, at 10:48 AM, Michele Scandale <michele.scandale at gmail.com> wrote: >> >>> Dear all, >>> >>> I'm working on a late IR target dependent optimization on loops. A part of this >>> optimization requires to derive "by hand" the trip-count expression of a given >>> loop. In order to handle correctly these cases I need to check if the loop has >>> an entry guard or not. The problem I have is that starting from the information >>> I derive during my analysis (initial IV value, last IV value, comparison kind) I >>> don't know how to verify if the entry guard is present or not, mainly because >>> algebraic simplifications changed the entry guard comparison. Being able to know >>> if a given loop has been rotated would be enough for my purposes but I haven't >>> found a way to obtain this. >>> >>> I think that the only way is to track in the IR the information, but I don't >>> know how to track it in a safe (the information shouldn't be lost) and >>> transparent way (it should not interfere with the optimizer). Thinking to custom >>> intrinsics I don't understand how to allow memory optimizations and, at the same >>> time, prevent the intrinsic elimination or hoisting from the target loop. >>> >>> I would appreciate any hint on this topic. >> >> See ScalarEvolution::isLoopEntryGuardedByCond. It's part of the analysis that we use to determine trip count and benefits from rotated loops. >> >> LLVM never annotates the IR with a history of transformations. You could potentially annotate the LoopInfo analysis, but it will only be preserved within the same LoopPassManager (you would have to run right after LoopRotate). >> >> The only robust solution is ScalarEvolution. >> >> -Andy >>
> Thanks for the details. Please add them to a bug report.I will do this.> InstCombine is certainly interfering with our ability to analyze the loop. I think the problem is that ScalarEvolution cannot reason about signed division. This is a general problem independent of your target. At the moment I'm not sure if we can teach ScalarEvolution to reason about this, or if we can defer certain InstCombine's until after loop passes run, including target-specific loop passes (it's hard for me to say whether this InstCombine really a necessary canonicalization before other passes). A bug report would be appropriate.It sounds to me that the same capabilities of InstCombine should be implemented also in the ScalarEvolution.> Absent a fix for the above problem, it is certainly possible for you to add target-specific intrinsics. I can't guarantee that it won't interfere with optimization. It might also be possible to add metadata to the loop exit branch when it is cloned by LoopRotate, indicating that the loop is guarded by the same condition. I don't think we want something like this on trunk though since it is a heavy-handed workaround that could be hard to maintain.Following also the proposal 'Parallel Loop Metadata' it seems to me that basically there's the same problem. Maybe I've not fully understood the things but, assuming to have the ability to map metadata to BasicBlocks then such informations could be registered directly to the loop header or latch, without custom intrinsics and hopefully without interfering with the optimizer. What do you think about this? Thanks again. Best regards, Michele Scandale