thr3ads.net - llvm dev - [LLVMdev] nsw is still logically inconsistent [Dec 2011]

If this information is useful, please help other people find it:
Share via:

Dan Gohman

2011-Dec-12 20:58 UTC

[LLVMdev] nsw is still logically inconsistent

The recent discussion of nsw led me to go back and review the
current definition of nsw in greater depth, and it turns out that
even with all the theoretical effort, it's still logically
inconsistent.


First, a warning: The scenario below is artificial. This is just
a demonstration. Also, ignore the fact that instcombine would zap
everything. Depending on that would be an implicit pass dependency,
which is against the rules.

Ok, consider this LLVM IR code fragment:

  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:
  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5

Assume label %no_overflow has no other predecessors. And assume
adding %a and %b can sometimes produce overflow, but only when
%overflow_check is false.

This code has no undefined behavior. It's a bit subtle, but in
particular, note that it's not possible for the udiv to divide by
zero, which would be undefined behavior. The sext produces a
value where the high 33 bits are all identical, the ashr produces
a value which is either all ones or all zeros, the add produces
either 0 or 1, and the icmp ilt is always 1.

We first perform a speculation transformation, hoisting all of the
code above the %overflow_check branch:

  %t0 = add nsw i32 %a, %b
  %t1 = sext i32 %t0 to i64
  %t2 = ashr i64 %t1, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

If nsw overflow is immediate undefined behavior, this transformation
would break the program, because the overflow is no longer guarded
by %overflow_check. But a premise of this exercise is that we want
to be able to speculate add nsw instructions. For now, let's assume
that there's a way to define nsw which permits this, with some kind
of deferred undefined behavior semantics.

It's obviously valid to hoist the sext, ashr, add, and icmp that
follow, because those instructions never have side effects.

The udiv is less obvious, but an analysis of the icmp, add, ashr,
and sext above it can show that the denominator of the udiv will
never be zero, so the udiv will never have side effects. Note that
this analysis doesn't actually need to look at the original add nsw
instruction. Climbing up to the sext is enough.

Next, we perform a promotion transformation, converting the add nsw
from i32 to i64:

  %s0 = sext i32 %a to i64
  %s1 = sext i32 %b to i64
  %t0 = add nsw i64 %s0, %s1
  %t2 = ashr i64 %t0, 31
  %t3 = add i64 %t2, 1
  %t5 = icmp ult %t3, 2
  %t6 = udiv i1 1, %t5
  br i1 %overflow_check, label %no_overflow, label %end

no_overflow:

Was this valid?

Any time the new i64 add would produce a different value than the
original sext would have, it would be a case where the 32-bit add
had an overflow. The nsw says that the program would have undefined
behavior in that case if the result is used, so this should be ok.


Unfortunately, the final code is actually broken. If adding %a
and %b as i32 values would overflow, adding them as i64 values
would produce a value for which bit 31 is not equal to bit 32, so
the subsequent ashr would produce a value which is not -1 or 0, the
subsequent add would produce a value which is not 0 or 1, and the
icmp ult would return 0, and the udiv would have undefined
behavior. And it's unconditional.

Consequently, the current definition of nsw is still inconsistent.

Dan

David A. Greene

2011-Dec-14 17:11 UTC

head link

[LLVMdev] nsw is still logically inconsistent

Dan Gohman <gohman at apple.com> writes:
> Next, we perform a promotion transformation, converting the add nsw
> from i32 to i64:
>
>   %s0 = sext i32 %a to i64
>   %s1 = sext i32 %b to i64
>   %t0 = add nsw i64 %s0, %s1
>   %t2 = ashr i64 %t0, 31
>   %t3 = add i64 %t2, 1
>   %t5 = icmp ult %t3, 2
>   %t6 = udiv i1 1, %t5
>   br i1 %overflow_check, label %no_overflow, label %end
>
> no_overflow:
>
> Was this valid?
>
> Any time the new i64 add would produce a different value than the
> original sext would have, it would be a case where the 32-bit add
> had an overflow. The nsw says that the program would have undefined
> behavior in that case if the result is used, so this should be ok.
>
>
> Unfortunately, the final code is actually broken. If adding %a
> and %b as i32 values would overflow, adding them as i64 values
> would produce a value for which bit 31 is not equal to bit 32, so
> the subsequent ashr would produce a value which is not -1 or 0, the
> subsequent add would produce a value which is not 0 or 1, and the
> icmp ult would return 0, and the udiv would have undefined
> behavior. And it's unconditional.
I'm not following.  If the promotion to i64 produces a different value,
then the nsw smeantic was violated, leading to undefined behavior, as
you note.  That that point all bets are off.  Divide by zero certainly
is a perfectly valid expression of undefined behavior.  If we had a
delayed check we would have to put it somewhere before the udiv.  We
would probably need some kind of fixup path _a_la_ IA64's check
instruction.

The whole point of this is to be able to hoist nsw operations.  But any
form of static speculation will require fixup code in the cases where
the speculation is wrong.  There really is a cost to hoisting code with
potential side-efffects across branches.  I don't see any way to get
around that.

                               -Dave

Rafael Ávila de Espíndola

2011-Dec-14 19:54 UTC

head link

[LLVMdev] nsw is still logically inconsistent

> We first perform a speculation transformation, hoisting all of the
> code above the %overflow_check branch:
> 
>   %t0 = add nsw i32 %a, %b
>   %t1 = sext i32 %t0 to i64
>   %t2 = ashr i64 %t1, 31
>   %t3 = add i64 %t2, 1
>   %t5 = icmp ult %t3, 2
>   %t6 = udiv i1 1, %t5
>   br i1 %overflow_check, label %no_overflow, label %end
> 
> no_overflow:
> 
> Was this valid?
> 
> If nsw overflow is immediate undefined behavior, this transformation
> would break the program, because the overflow is no longer guarded
> by %overflow_check. But a premise of this exercise is that we want
> to be able to speculate add nsw instructions. For now, let's assume
> that there's a way to define nsw which permits this, with some kind
> of deferred undefined behavior semantics.
Could we say that moving the udiv was invalid? It is a instruction
witch can cause undefined behavior and doing the move makes it exposed
to poison values in conditions it was not before.
> Dan
> 
Cheers,
Rafael

Eli Friedman

2011-Dec-14 20:11 UTC

head link

[LLVMdev] nsw is still logically inconsistent

2011/12/14 Rafael Ávila de Espíndola <rafael.espindola at
gmail.com>:>> We first perform a speculation transformation, hoisting all of the
>> code above the %overflow_check branch:
>>
>>   %t0 = add nsw i32 %a, %b
>>   %t1 = sext i32 %t0 to i64
>>   %t2 = ashr i64 %t1, 31
>>   %t3 = add i64 %t2, 1
>>   %t5 = icmp ult %t3, 2
>>   %t6 = udiv i1 1, %t5
>>   br i1 %overflow_check, label %no_overflow, label %end
>>
>> no_overflow:
>>
>> Was this valid?
>>
>> If nsw overflow is immediate undefined behavior, this transformation
>> would break the program, because the overflow is no longer guarded
>> by %overflow_check. But a premise of this exercise is that we want
>> to be able to speculate add nsw instructions. For now, let's assume
>> that there's a way to define nsw which permits this, with some kind
>> of deferred undefined behavior semantics.
>
> Could we say that moving the udiv was invalid? It is a instruction
> witch can cause undefined behavior and doing the move makes it exposed
> to poison values in conditions it was not before.
Note that this matches existing practice: IIRC, for a testcase like
the given testcase, LICM will hoist everything except the div
instruction.

-Eli

Andrew Trick

2011-Dec-14 21:49 UTC

head link

[LLVMdev] nsw is still logically inconsistent

On Dec 12, 2011, at 12:58 PM, Dan Gohman wrote:> Any time the new i64 add would produce a different value than the
> original sext would have, it would be a case where the 32-bit add
> had an overflow. The nsw says that the program would have undefined
> behavior in that case if the result is used, so this should be ok.
The way I have always thought of this is that speculating an operation with side
effects (udiv) cannot be safely done based on a speculative value.

So, whenever you speculate a control dependent instruction (add<nsw>),
regardless of whether it has side effects, you need to transfer the control
dependence to its consumers.

In an IR with explicit control dependence, that can be done literally by
transferring the control edge downward in the SSA graph until you reach an
operation with side-effects. In an IR without explicit control dependence, like
all C compilers I've seen, you have no way to enforce this, so end up
relying on each optimization to be aware and not too aggressive.

-Andy

(who used to safely speculate loads above type checks)

Dan Gohman

2011-Dec-15 01:10 UTC

head link

[LLVMdev] nsw is still logically inconsistent

On Dec 14, 2011, at 9:11 AM, David A. Greene wrote:
> Dan Gohman <gohman at apple.com> writes:
> 
>> Unfortunately, the final code is actually broken. If adding %a
>> and %b as i32 values would overflow, adding them as i64 values
>> would produce a value for which bit 31 is not equal to bit 32, so
>> the subsequent ashr would produce a value which is not -1 or 0, the
>> subsequent add would produce a value which is not 0 or 1, and the
>> icmp ult would return 0, and the udiv would have undefined
>> behavior. And it's unconditional.
> 
> I'm not following.  If the promotion to i64 produces a different value,
> then the nsw smeantic was violated, leading to undefined behavior, as
> you note.  That that point all bets are off.  Divide by zero certainly
> is a perfectly valid expression of undefined behavior.  If we had a
> delayed check we would have to put it somewhere before the udiv.  We
> would probably need some kind of fixup path _a_la_ IA64's check
> instruction.
The original code was well-behaved for all inputs.
The final code has undefined behavior for some inputs.

Fixup paths don't really seem to fit here. We're never going
to want fixup paths in the final output, for example. And
there is no way to fix up undefined behavior.

Dan

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Dec 2011 - [LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

[LLVMdev] nsw is still logically inconsistent

Maybe Matching Threads