Thanks for your reply.
Maybe it wasn't so clear, but the optimization I'm writing is
target-dependent
and so it's declared inside the target backend and is run after the
independent
optimizer. So I cannot run my pass just after LoopRotate and before InstCombine.
I can still use ScalarEvolution, but the instruction combiner can in some cases
simplify the entry guard condition.
A small example is the following:
/////////////////////////////////////
extern void x(int);
int foo_int(int a, int b) {
int i;
for (i = 0; i < b/2; ++i) {
x(i);
}
return 0;
}
/////////////////////////////////////
the correspondent optimized IR is:
///////////////////////////////////////////////////////////////////////////////
define i32 @foo_int(i32 %a, i32 %b) nounwind uwtable {
entry:
%div = sdiv i32 %b, 2
%cmp3 = icmp sgt i32 %b, 1
br i1 %cmp3, label %for.body, label %for.end
for.body: ; preds = %entry, %for.body
%i.04 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
tail call void @x(i32 %i.04) nounwind
%inc = add nsw i32 %i.04, 1
%cmp = icmp slt i32 %inc, %div
br i1 %cmp, label %for.body, label %for.end
for.end: ; preds = %for.body, %entry
ret i32 0
}
///////////////////////////////////////////////////////////////////////////////
>From the IR is possible to see that the condition in the entry block is
%cmp3 = icmp sgt i32 %b, 1
but starting from the loop IV it should be expected something like
%cmp3 = icmp slt i32 0, %div
and this is verified looking at the instcombine debug trace:
////////////////////////////////////////////////
IC: Visiting: %cmp3 = icmp slt i32 0, %div
IC: Old = %cmp3 = icmp sgt i32 %div, 0
New = <badref> = icmp sge i32 %b, 2
IC: ADD: %cmp3 = icmp sge i32 %b, 2
IC: ERASE %0 = icmp sgt i32 %div, 0
IC: ADD: %div = sdiv i32 %b, 2
IC: Visiting: %div = sdiv i32 %b, 2
IC: Visiting: %cmp3 = icmp sge i32 %b, 2
IC: Old = %cmp3 = icmp sge i32 %b, 2
New = <badref> = icmp sgt i32 %b, 1
IC: ADD: %cmp3 = icmp sgt i32 %b, 1
IC: ERASE %0 = icmp sge i32 %b, 2
////////////////////////////////////////////////
The fact that a loop has been rotated it means that it has an entry guard that
allows in the case of countable loop to optimize the trip count expression:
it's
an information that can be lost during optimization and maybe not easy to derive
with an analysis. The usage of "IR extra information" would help in
this case.
Using intrinsics with declared side-effects as information handlers I don't
know
if it adds constraints to the optimizer. Indeed declaring no side-effects
won't
preserve the information.
How can all this be solved?
Thanks again.
Best regards,
Michele Scandale
On 02/07/2013 12:07 AM, Andrew Trick wrote:>
> On Feb 4, 2013, at 10:48 AM, Michele Scandale <michele.scandale at
gmail.com> wrote:
>
>> Dear all,
>>
>> I'm working on a late IR target dependent optimization on loops. A
part of this
>> optimization requires to derive "by hand" the trip-count
expression of a given
>> loop. In order to handle correctly these cases I need to check if the
loop has
>> an entry guard or not. The problem I have is that starting from the
information
>> I derive during my analysis (initial IV value, last IV value,
comparison kind) I
>> don't know how to verify if the entry guard is present or not,
mainly because
>> algebraic simplifications changed the entry guard comparison. Being
able to know
>> if a given loop has been rotated would be enough for my purposes but I
haven't
>> found a way to obtain this.
>>
>> I think that the only way is to track in the IR the information, but I
don't
>> know how to track it in a safe (the information shouldn't be lost)
and
>> transparent way (it should not interfere with the optimizer). Thinking
to custom
>> intrinsics I don't understand how to allow memory optimizations
and, at the same
>> time, prevent the intrinsic elimination or hoisting from the target
loop.
>>
>> I would appreciate any hint on this topic.
>
> See ScalarEvolution::isLoopEntryGuardedByCond. It's part of the
analysis that we use to determine trip count and benefits from rotated loops.
>
> LLVM never annotates the IR with a history of transformations. You could
potentially annotate the LoopInfo analysis, but it will only be preserved within
the same LoopPassManager (you would have to run right after LoopRotate).
>
> The only robust solution is ScalarEvolution.
>
> -Andy
>