The recent discussion of nsw led me to go back and review the current definition of nsw in greater depth, and it turns out that even with all the theoretical effort, it's still logically inconsistent. First, a warning: The scenario below is artificial. This is just a demonstration. Also, ignore the fact that instcombine would zap everything. Depending on that would be an implicit pass dependency, which is against the rules. Ok, consider this LLVM IR code fragment: br i1 %overflow_check, label %no_overflow, label %end no_overflow: %t0 = add nsw i32 %a, %b %t1 = sext i32 %t0 to i64 %t2 = ashr i64 %t1, 31 %t3 = add i64 %t2, 1 %t5 = icmp ult %t3, 2 %t6 = udiv i1 1, %t5 Assume label %no_overflow has no other predecessors. And assume adding %a and %b can sometimes produce overflow, but only when %overflow_check is false. This code has no undefined behavior. It's a bit subtle, but in particular, note that it's not possible for the udiv to divide by zero, which would be undefined behavior. The sext produces a value where the high 33 bits are all identical, the ashr produces a value which is either all ones or all zeros, the add produces either 0 or 1, and the icmp ilt is always 1. We first perform a speculation transformation, hoisting all of the code above the %overflow_check branch: %t0 = add nsw i32 %a, %b %t1 = sext i32 %t0 to i64 %t2 = ashr i64 %t1, 31 %t3 = add i64 %t2, 1 %t5 = icmp ult %t3, 2 %t6 = udiv i1 1, %t5 br i1 %overflow_check, label %no_overflow, label %end no_overflow: Was this valid? If nsw overflow is immediate undefined behavior, this transformation would break the program, because the overflow is no longer guarded by %overflow_check. But a premise of this exercise is that we want to be able to speculate add nsw instructions. For now, let's assume that there's a way to define nsw which permits this, with some kind of deferred undefined behavior semantics. It's obviously valid to hoist the sext, ashr, add, and icmp that follow, because those instructions never have side effects. The udiv is less obvious, but an analysis of the icmp, add, ashr, and sext above it can show that the denominator of the udiv will never be zero, so the udiv will never have side effects. Note that this analysis doesn't actually need to look at the original add nsw instruction. Climbing up to the sext is enough. Next, we perform a promotion transformation, converting the add nsw from i32 to i64: %s0 = sext i32 %a to i64 %s1 = sext i32 %b to i64 %t0 = add nsw i64 %s0, %s1 %t2 = ashr i64 %t0, 31 %t3 = add i64 %t2, 1 %t5 = icmp ult %t3, 2 %t6 = udiv i1 1, %t5 br i1 %overflow_check, label %no_overflow, label %end no_overflow: Was this valid? Any time the new i64 add would produce a different value than the original sext would have, it would be a case where the 32-bit add had an overflow. The nsw says that the program would have undefined behavior in that case if the result is used, so this should be ok. Unfortunately, the final code is actually broken. If adding %a and %b as i32 values would overflow, adding them as i64 values would produce a value for which bit 31 is not equal to bit 32, so the subsequent ashr would produce a value which is not -1 or 0, the subsequent add would produce a value which is not 0 or 1, and the icmp ult would return 0, and the udiv would have undefined behavior. And it's unconditional. Consequently, the current definition of nsw is still inconsistent. Dan
Dan Gohman <gohman at apple.com> writes:> Next, we perform a promotion transformation, converting the add nsw > from i32 to i64: > > %s0 = sext i32 %a to i64 > %s1 = sext i32 %b to i64 > %t0 = add nsw i64 %s0, %s1 > %t2 = ashr i64 %t0, 31 > %t3 = add i64 %t2, 1 > %t5 = icmp ult %t3, 2 > %t6 = udiv i1 1, %t5 > br i1 %overflow_check, label %no_overflow, label %end > > no_overflow: > > Was this valid? > > Any time the new i64 add would produce a different value than the > original sext would have, it would be a case where the 32-bit add > had an overflow. The nsw says that the program would have undefined > behavior in that case if the result is used, so this should be ok. > > > Unfortunately, the final code is actually broken. If adding %a > and %b as i32 values would overflow, adding them as i64 values > would produce a value for which bit 31 is not equal to bit 32, so > the subsequent ashr would produce a value which is not -1 or 0, the > subsequent add would produce a value which is not 0 or 1, and the > icmp ult would return 0, and the udiv would have undefined > behavior. And it's unconditional.I'm not following. If the promotion to i64 produces a different value, then the nsw smeantic was violated, leading to undefined behavior, as you note. That that point all bets are off. Divide by zero certainly is a perfectly valid expression of undefined behavior. If we had a delayed check we would have to put it somewhere before the udiv. We would probably need some kind of fixup path _a_la_ IA64's check instruction. The whole point of this is to be able to hoist nsw operations. But any form of static speculation will require fixup code in the cases where the speculation is wrong. There really is a cost to hoisting code with potential side-efffects across branches. I don't see any way to get around that. -Dave
Rafael Ávila de Espíndola
2011-Dec-14 19:54 UTC
[LLVMdev] nsw is still logically inconsistent
> We first perform a speculation transformation, hoisting all of the > code above the %overflow_check branch: > > %t0 = add nsw i32 %a, %b > %t1 = sext i32 %t0 to i64 > %t2 = ashr i64 %t1, 31 > %t3 = add i64 %t2, 1 > %t5 = icmp ult %t3, 2 > %t6 = udiv i1 1, %t5 > br i1 %overflow_check, label %no_overflow, label %end > > no_overflow: > > Was this valid? > > If nsw overflow is immediate undefined behavior, this transformation > would break the program, because the overflow is no longer guarded > by %overflow_check. But a premise of this exercise is that we want > to be able to speculate add nsw instructions. For now, let's assume > that there's a way to define nsw which permits this, with some kind > of deferred undefined behavior semantics.Could we say that moving the udiv was invalid? It is a instruction witch can cause undefined behavior and doing the move makes it exposed to poison values in conditions it was not before.> Dan >Cheers, Rafael
2011/12/14 Rafael Ávila de Espíndola <rafael.espindola at gmail.com>:>> We first perform a speculation transformation, hoisting all of the >> code above the %overflow_check branch: >> >> %t0 = add nsw i32 %a, %b >> %t1 = sext i32 %t0 to i64 >> %t2 = ashr i64 %t1, 31 >> %t3 = add i64 %t2, 1 >> %t5 = icmp ult %t3, 2 >> %t6 = udiv i1 1, %t5 >> br i1 %overflow_check, label %no_overflow, label %end >> >> no_overflow: >> >> Was this valid? >> >> If nsw overflow is immediate undefined behavior, this transformation >> would break the program, because the overflow is no longer guarded >> by %overflow_check. But a premise of this exercise is that we want >> to be able to speculate add nsw instructions. For now, let's assume >> that there's a way to define nsw which permits this, with some kind >> of deferred undefined behavior semantics. > > Could we say that moving the udiv was invalid? It is a instruction > witch can cause undefined behavior and doing the move makes it exposed > to poison values in conditions it was not before.Note that this matches existing practice: IIRC, for a testcase like the given testcase, LICM will hoist everything except the div instruction. -Eli
On Dec 12, 2011, at 12:58 PM, Dan Gohman wrote:> Any time the new i64 add would produce a different value than the > original sext would have, it would be a case where the 32-bit add > had an overflow. The nsw says that the program would have undefined > behavior in that case if the result is used, so this should be ok.The way I have always thought of this is that speculating an operation with side effects (udiv) cannot be safely done based on a speculative value. So, whenever you speculate a control dependent instruction (add<nsw>), regardless of whether it has side effects, you need to transfer the control dependence to its consumers. In an IR with explicit control dependence, that can be done literally by transferring the control edge downward in the SSA graph until you reach an operation with side-effects. In an IR without explicit control dependence, like all C compilers I've seen, you have no way to enforce this, so end up relying on each optimization to be aware and not too aggressive. -Andy (who used to safely speculate loads above type checks)
On Dec 14, 2011, at 9:11 AM, David A. Greene wrote:> Dan Gohman <gohman at apple.com> writes: > >> Unfortunately, the final code is actually broken. If adding %a >> and %b as i32 values would overflow, adding them as i64 values >> would produce a value for which bit 31 is not equal to bit 32, so >> the subsequent ashr would produce a value which is not -1 or 0, the >> subsequent add would produce a value which is not 0 or 1, and the >> icmp ult would return 0, and the udiv would have undefined >> behavior. And it's unconditional. > > I'm not following. If the promotion to i64 produces a different value, > then the nsw smeantic was violated, leading to undefined behavior, as > you note. That that point all bets are off. Divide by zero certainly > is a perfectly valid expression of undefined behavior. If we had a > delayed check we would have to put it somewhere before the udiv. We > would probably need some kind of fixup path _a_la_ IA64's check > instruction.The original code was well-behaved for all inputs. The final code has undefined behavior for some inputs. Fixup paths don't really seem to fit here. We're never going to want fixup paths in the final output, for example. And there is no way to fix up undefined behavior. Dan