I ran into an issue where we are missing optimization opportunities by not recognizing that: $cmp1 = icmp eq i8* %p, null br i1 %cmp1, label %is_null, label %not_null Is equivalent to: $cmp1 = icmp ne i8* %p, null br i1 %cmp1, label %not_null, label %is_null We do recognize and exploit this in GVN. However, that leaves us with a pass ordering problem where other passes are less effective. The biggest impact in practice appears to be on jump threading (missed optimizations) and unswitching (wasted compile time unswitching conditions which are actually known outside the loop). I can see two ways of approaching this and would value feedback about which is the right one. Option 1 - We can teach EarlyCSE, & JumpThreading to specifically optimize this pattern. This doesn't solve the general pass ordering problem, but it does give us far more chances to exploit the equivalent control flow for each run of the optimization pipeline (e.g. each round of inlining iteration). Option 2 - We could teach InstCombine to canonicalize one form into the other. On the surface, this seems like the "right" answer, but it also seems too easy. I suspect there's a reason this approach hasn't been done and that I simply don't know what it is. :) The only thing I could come up with is that inverting the branch direction might perturb code placement for compilations without profiling data. Is that the concern? Philip
Actually, ignore this. At least one of the cases I had throught I'd tested as not working is now working as expected. I'm suspecting I went wrong somewhere in my analysis. Until I figure out where, it's not worth discussing this proposal. Philip On 04/27/2015 04:05 PM, Philip Reames wrote:> I ran into an issue where we are missing optimization opportunities by > not recognizing that: > $cmp1 = icmp eq i8* %p, null > br i1 %cmp1, label %is_null, label %not_null > > Is equivalent to: > $cmp1 = icmp ne i8* %p, null > br i1 %cmp1, label %not_null, label %is_null > > We do recognize and exploit this in GVN. However, that leaves us with > a pass ordering problem where other passes are less effective. The > biggest impact in practice appears to be on jump threading (missed > optimizations) and unswitching (wasted compile time unswitching > conditions which are actually known outside the loop). > > I can see two ways of approaching this and would value feedback about > which is the right one. > > Option 1 - We can teach EarlyCSE, & JumpThreading to specifically > optimize this pattern. This doesn't solve the general pass ordering > problem, but it does give us far more chances to exploit the > equivalent control flow for each run of the optimization pipeline > (e.g. each round of inlining iteration). > > Option 2 - We could teach InstCombine to canonicalize one form into > the other. On the surface, this seems like the "right" answer, but it > also seems too easy. I suspect there's a reason this approach hasn't > been done and that I simply don't know what it is. :) > > The only thing I could come up with is that inverting the branch > direction might perturb code placement for compilations without > profiling data. Is that the concern? > > Philip > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Actually, ignore this. At least one of the cases I had thought I'd tested as not working is now working as expected. I'm suspecting I went wrong somewhere in my analysis. Until I figure out where, it's not worth discussing Philip On 04/27/2015 04:05 PM, Philip Reames wrote:> I ran into an issue where we are missing optimization opportunities by > not recognizing that: > $cmp1 = icmp eq i8* %p, null > br i1 %cmp1, label %is_null, label %not_null > > Is equivalent to: > $cmp1 = icmp ne i8* %p, null > br i1 %cmp1, label %not_null, label %is_null > > We do recognize and exploit this in GVN. However, that leaves us with > a pass ordering problem where other passes are less effective. The > biggest impact in practice appears to be on jump threading (missed > optimizations) and unswitching (wasted compile time unswitching > conditions which are actually known outside the loop). > > I can see two ways of approaching this and would value feedback about > which is the right one. > > Option 1 - We can teach EarlyCSE, & JumpThreading to specifically > optimize this pattern. This doesn't solve the general pass ordering > problem, but it does give us far more chances to exploit the > equivalent control flow for each run of the optimization pipeline > (e.g. each round of inlining iteration). > > Option 2 - We could teach InstCombine to canonicalize one form into > the other. On the surface, this seems like the "right" answer, but it > also seems too easy. I suspect there's a reason this approach hasn't > been done and that I simply don't know what it is. :) > > The only thing I could come up with is that inverting the branch > direction might perturb code placement for compilations without > profiling data. Is that the concern? > > Philip > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev