thr3ads.net - llvm dev - [llvm-dev] RFC: Killing undef and spreading poison [Nov 2016]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das via llvm-dev

2016-Nov-11 21:58 UTC

[llvm-dev] RFC: Killing undef and spreading poison

Hi John,

John McCall wrote:>> On Nov 10, 2016, at 10:37 PM, Sanjoy Das<sanjoy at
playingwithpointers.com>  wrote:
>> As a concrete example, say we have:
>>
>> i32 sum = x *nsw y
>> i64 sum.sext = sext sum to i64
>> ...
>> some use of sum.sext
>>
>>
>> Pretending "x +nsw 1" does not sign-overflow, we can commute
the sext
>> into the arithmetic, but we still somehow need to capture the fact
>> that, depending on the optimizer's whims and fancies (i.e. whether
it
>> does the commute or not), the high 32 bits of sum.sext can now be
>> something other than all zeroes or all ones.
>
> Well, but you can assume it can't, though, because the counter-factual
still applies.
>
>>> Actually optimizing code that we've proven has undefined
behavior, in contrast, is basically uninteresting and leads us into these silly
philosophical problems.
>> I don't think we care about optimizing code that we've _proven_
has
>> undefined behavior.  We care about optimizing code in ways that is
>> correct in the face of *other* transforms that we ourselves want to
>> do, where these "other transforms" pretend certain abnormal
cases do
>> not exist.  Poison is a "stand-in" for these transforms,
which are
>> sometimes non-local.
>
> I'm saying that you still get reasonable behavior from reasoning about
> counterfactuals for these things, and I haven't been convinced that you
> lose optimization power except in code that actually exhibits undefined
> behavior.
I'm not sure I understood your concern fully, but you could have cases like:

for (...) {
   if (condition){
     i32 prod = x *nsw y
     i64 prod.sext = sext prod to i64
     i64 t = K `udiv` (-1 + (sum.prod >> 32))
     use(t)
   }
}

The C program this came from is well defined in two cases (that are 
relevant here):

  1. condition is false
  2. condition is true and x * y does not sign overflow

We'd like to LICM the entire expression tree out of control flow in a 
way that it helps the well defined case (2) without introducing a fault 
in the well defined case (1).

I'm also not very clear on what you mean by "counterfactual".  The
way
you're using it seems a lot like "we have UB if such and such a thing 
happens", which is something we're trying to avoid.

Does this address your point?
>> For instance, continuing the previous example, say we're interested
in
>> the speculatibility of
>>
>> t = K `udiv` (-1 + (sum.sext>>  32))
>>
>> We don't _really_ care about doing something intelligent when
sum.sext
>> is provably poison.  However, we do care about taking into the
>> _possibility_ of sum.sext being poison, which is really just a more
>> precise way of saying that we care about taking into the possibility
>> of sum.sext being commuted into (sext(x) * sext(y)) (in which case the
>> division is not speculatable).
>
> It seems to me that counterfactuals are fine for reasoning about this.
>
>> And we want to do this with enough formalism in place so that we can
>> write correct checking interpreters, fuzzers etc.
>
> I will happily admit that a pure counterfactual-reasoning model does not
> admit the ability to write checking interpreters in the face of
speculation.
>
>>> I would suggest:
>>> 1. Make undef an instruction which produces an arbitrary but
consistent result of its type.
>>> 2. Revisit most of the transformations that actually try to
introduce undef.  The consistency rule may make may of them questionable.  In
particular, sinking undef into a loop may change semantics.
>>> 3. Understand that folding something to undef may actually lose
information.  It's possible to prove that %x<   %x +nsw 1.  It's not
possible to prove that %x<   undef.  Accept that this is ok because
optimizing undefined behavior is not actually interesting.
>> As I said above, we don't really care about folding cases that
we've
>> proved to have UB.
>>
>> Moreover, in your scheme, we still won't be able to do
optimizations
>> like:
>>
>>   %y = ...
>>   %undef.0 = undefinst
>>   %val = select i1 %cond, %undef.0, %y
>>   print(%val)
>>
>>   to
>>
>>   %y = ...
>>   print(%val)
>>
>> since it could have been that
>>
>>   %x = INT_MAX at runtime, but unknown at compile time
>>   %y = %x +nsw 1
>>   %undef.0 = undefinst
>>   %val = select i1 %cond, %undef.0, %y
>>   print(%val) == %x s<  %val
>>
>> Folding %val to %y will cause us to go always printing false to
>> printing either true or false, depending on the will of the compiler.
>
> I think you *can* do these optimizations, you just have to do them very
late
> at a point where you're not going to do certain kinds of downstream
optimization.
> This kind of phase ordering, where certain high-level optimizations are
only
> permitted in earlier passes, is a pretty standard technique in optimizers,
and
> we use it for plenty of other things.
I'm not aware of transforms that would be _incorrect_ (by design, that 
is -- I'm sure we have bugs here :) ) if we got the pass ordering wrong. 
  I thought we only relied on ordering for canonicalization -- "don't 
introduce vector types too early, since that will confuse later 
optimizations and they won't do a very good job (but would still be 
correct)" and things like that.

However, your point about pass ordering is interesting since that is 
similar to how I see freeze: as an optimization barrier that that we use 
to block these non-local optimizations on a case by case basis if we've 
made a transform that relied on them not happening.  But getting the 
pass order wrong won't miscompile your program, the worst that'll happen
is you'll introduce too many freezes, and that'll unnecessarily block 
optimizations.

-- Sanjoy
> A more prominent example is that my model doesn't allow the middle-end
to
> freely eliminate "store undef, %x".  That store does have the
effect of initializing
> the memory to a consistent value, and that difference can be detected by
> subsequent load/store optimizations.  But the code generator can still
trivially
> peephole this (when the undef has only one use) because there won't be
any
> subsequent load/store optimizations.
>
> John.
>
>>
>> I have to think a bit about this, but in your scheme I think we will
>> generally be only able to fold `undefinst` to constants since
>> non-constant values could have expressions like `<INT_MAX at
>> runtime>  +nsw 1` as their source which would justify non-local
>> transforms that `undefinst` would not.
>>
>> -- Sanjoy
>

Sanjoy Das via llvm-dev

2016-Nov-12 07:55 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

Hi John,

On 11/11/16, 1:58 PM, Sanjoy Das wrote:
 >> I'm saying that you still get reasonable behavior from reasoning
about
 >> counterfactuals for these things, and I haven't been convinced
that you
 >> lose optimization power except in code that actually exhibits
undefined
 >> behavior.
 >
 > I'm not sure I understood your concern fully, but you could have cases
like:
 >
 > for (...) {
 > if (condition){
 > i32 prod = x *nsw y
 > i64 prod.sext = sext prod to i64
 > i64 t = K `udiv` (-1 + (sum.prod >> 32))
 > use(t)
 > }
 > }

I'm sure it is obvious to you, but there's a typo here -- it should
have been "i64 t = K `udiv` (-1 + (prod.sext >> 32))"

 > The C program this came from is well defined in two cases (that are
relevant here):
 >
 > 1. condition is false
 > 2. condition is true and x * y does not sign overflow
 >
 > We'd like to LICM the entire expression tree out of control flow in a
way that it helps the well defined case (2)
 > without introducing a fault in the well defined case (1).

To be a 100% explicit, by (1) I mean a situation where:

   a. condition is false
   b. (say) x is 1 << 20, and y is 1 << 12 so "x *nsw y"
does sign
      overflow, and ((sext(x) * sext(y)) >> 32) is 1

I'll also spell out what I was trying to say more explicitly, since we
can easily sink too much time talking about subtly different things
otherwise (has happened in the past :) ):

Firstly consider this sequence of transforms:

   for (...) {
     if (condition){
       i32 prod = x *nsw y
       i64 prod.sext = sext prod to i64
       i64 t = K `udiv` (-1 + (sum.prod >> 32))
       use(t)
     }
   }

==> (step 1) multiplication and sext can be speculated

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   for (...) {
     if (condition){
       i64 t = K `udiv` (-1 + (prod.sext >> 32))
       use(t)
     }
   }

==> (step 2) known bits analysis on sext to prove divisor is non-zero

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }

==> (step 3) commute sext through nsw multiplicationa

   ;; i32 prod = x *nsw y
   i64 prod.sext = (sext x) *nsw (sext y)
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }


Now we've managed to introduce a fault if we were in case 1 -- the
source program was well defined but the target program divides by
zero.

This means whatever scheme / semantics / "rule book" LLVM implements
has to outlaw at least one of the steps above.

For the freeze/poison scheme that step is "step 2".  Under the
freeze/poison scheme step 2 will be

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   for (...) {
     if (condition){
       i64 t = K `udiv` (-1 + (prod.sext >> 32))
       use(t)
     }
   }

==> (step 2) known bits analysis on sext to prove divisor is non-zero

   i32 prod = x *nsw y
   i32 prod.frozen = freeze(prod)
   i64 prod.sext = sext prod.frozen to i64
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }

Which would further outlaw step 3.

If we go with counterfactual reasoning, then what is that step above
that is illegal?

-- Sanjoy

John McCall via llvm-dev

2016-Nov-12 21:38 UTC

head link

[llvm-dev] RFC: Killing undef and spreading poison

> On Nov 11, 2016, at 11:55 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> Hi John,
> 
> On 11/11/16, 1:58 PM, Sanjoy Das wrote:
> >> I'm saying that you still get reasonable behavior from
reasoning about
> >> counterfactuals for these things, and I haven't been convinced
that you
> >> lose optimization power except in code that actually exhibits
undefined
> >> behavior.
> >
> > I'm not sure I understood your concern fully, but you could have
cases like:
> >
> > for (...) {
> > if (condition){
> > i32 prod = x *nsw y
> > i64 prod.sext = sext prod to i64
> > i64 t = K `udiv` (-1 + (sum.prod >> 32))
> > use(t)
> > }
> > }
> 
> I'm sure it is obvious to you, but there's a typo here -- it should
> have been "i64 t = K `udiv` (-1 + (prod.sext >> 32))"
> 
> > The C program this came from is well defined in two cases (that are
relevant here):
> >
> > 1. condition is false
> > 2. condition is true and x * y does not sign overflow
> >
> > We'd like to LICM the entire expression tree out of control flow
in a way that it helps the well defined case (2)
> > without introducing a fault in the well defined case (1).
> 
> To be a 100% explicit, by (1) I mean a situation where:
> 
>  a. condition is false
>  b. (say) x is 1 << 20, and y is 1 << 12 so "x *nsw
y" does sign
>     overflow, and ((sext(x) * sext(y)) >> 32) is 1
> 
> I'll also spell out what I was trying to say more explicitly, since we
> can easily sink too much time talking about subtly different things
> otherwise (has happened in the past :) ):
> 
> Firstly consider this sequence of transforms:
> 
>  for (...) {
>    if (condition){
>      i32 prod = x *nsw y
>      i64 prod.sext = sext prod to i64
>      i64 t = K `udiv` (-1 + (sum.prod >> 32))
>      use(t)
>    }
>  }
> 
> ==> (step 1) multiplication and sext can be speculated
> 
>  i32 prod = x *nsw y
>  i64 prod.sext = sext prod to i64
>  for (...) {
>    if (condition){
>      i64 t = K `udiv` (-1 + (prod.sext >> 32))
>      use(t)
>    }
>  }
> 
> ==> (step 2) known bits analysis on sext to prove divisor is non-zero
> 
>  i32 prod = x *nsw y
>  i64 prod.sext = sext prod to i64
>  i64 t = K `udiv` (-1 + (prod.sext >> 32))
>  for (...) {
>    if (condition){
>      use(t)
>    }
>  }
> 
> ==> (step 3) commute sext through nsw multiplicationa
> 
>  ;; i32 prod = x *nsw y
>  i64 prod.sext = (sext x) *nsw (sext y)
>  i64 t = K `udiv` (-1 + (prod.sext >> 32))
>  for (...) {
>    if (condition){
>      use(t)
>    }
>  }
> 
> 
> Now we've managed to introduce a fault if we were in case 1 -- the
> source program was well defined but the target program divides by
> zero.
> 
> This means whatever scheme / semantics / "rule book" LLVM
implements
> has to outlaw at least one of the steps above.
> 
> For the freeze/poison scheme that step is "step 2".  Under the
> freeze/poison scheme step 2 will be
> 
>  i32 prod = x *nsw y
>  i64 prod.sext = sext prod to i64
>  for (...) {
>    if (condition){
>      i64 t = K `udiv` (-1 + (prod.sext >> 32))
>      use(t)
>    }
>  }
> 
> ==> (step 2) known bits analysis on sext to prove divisor is non-zero
> 
>  i32 prod = x *nsw y
>  i32 prod.frozen = freeze(prod)
>  i64 prod.sext = sext prod.frozen to i64
>  i64 t = K `udiv` (-1 + (prod.sext >> 32))
>  for (...) {
>    if (condition){
>      use(t)
>    }
>  }
> 
> Which would further outlaw step 3.
> 
> If we go with counterfactual reasoning, then what is that step above
> that is illegal?
I see your point.  I think this is going to be a very limited tool, suitable for
things
like nsw that have obvious "right answers" that you can freeze the
poison out of,
but I concede that it's nice for optimizing the overflow cases.

John.

llvm dev - Nov 2016 - RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison

[llvm-dev] RFC: Killing undef and spreading poison