thr3ads.net - llvm dev - [llvm-dev] Does poison add implicit "definedness" under the hood ? [Oct 2020]

If this information is useful, please help other people find it:
Share via:

Stefanos Baziotis via llvm-dev

2020-Oct-30 11:25 UTC

[llvm-dev] Does poison add implicit "definedness" under the hood ?

Oh ok, now I get it, thank you!!

- Stefanos

Στις Παρ, 30 Οκτ 2020 στις 2:54 π.μ., ο/η Juneyoung Lee <
juneyoung.lee at sf.snu.ac.kr> έγραψε:
> I think it is slightly different, let me explain the reason.
>
> Lowering `z = add nsw x, y` into the following assembly is valid, but `z
> add x, y` into this isn't.
>
> if (x + y overflows)
>   z = 0;
> else
>   z = ADD x y
>
> Lowering `z = add x, y` should be more 'precise'.
> Instead of storing zero, 2's complement of x + y should be stored. This
> can be more expensive.
>
> Juneyoung
>
> On Fri, Oct 30, 2020 at 5:13 AM Stefanos Baziotis <
> stefanos.baziotis at gmail.com> wrote:
>
>> Hi Juneyoung,
>>
>> > Okay, previously I was assuming that P has two kinds of additions
>> (trapping add & silent add).
>> > Maybe this isn't true, and P always traps when overflow
happens? I hope
>> my understanding is correct now.
>>
>> Exactly, P has only one add that explodes on overflow. (btw just to be
>> clear, the point is
>> not P - P was just introduced as an example to showcase the implicit
>> semantics that poison adds).
>>
>> I think I understood most of what you said above. My point was not so
>> much nsw vs dropping nsw but in
>> any case, IIUC, dropping nsw is only bad since it only disables using
>> arithmetic properties and their
>> codegen is the same.
>>
>> > If we keep `add nsw` poison, lowering `add nsw` needs check, but
the
>> generated code is slightly different from `add`:
>> > ...
>> I'd think that this is the codegen of both `add nsw` and simply
`add`.
>> They both have to check whether there is overflow.
>>
>> > In case of `add x, y` without nsw, z should be 2's complement
otherwise.
>> I'm not sure I got that, but if P only supports 2's complement
arithmetic
>> anyway, that means that the codegen of the two (`add` and `add nsw`)
>> is identical right ?
>>
>> Thanks again,
>> Stefanos
>>
>> Στις Πέμ, 29 Οκτ 2020 στις 7:49 μ.μ., ο/η Juneyoung Lee <
>> juneyoung.lee at sf.snu.ac.kr> έγραψε:
>>
>>> Okay, previously I was assuming that P has two kinds of additions
>>> (trapping add & silent add).
>>> Maybe this isn't true, and P always traps when overflow
happens? I hope
>>> my understanding is correct now.
>>>
>>>
>>> There is `add` without nsw in IR, and the backend should still put
>>> overflow check when lowering it.
>>> If `add nsw` raises UB, dropping `nsw` from `add` will make the
>>> instruction more costly, so optimizations that drop nsw should be
done more
>>> carefully.
>>>
>>>
>>> If we keep `add nsw` poison, lowering `add nsw` needs check, but
the
>>> generated code is slightly different from `add`:
>>>
>>> // Assume that we're lowering `z = add nsw x, y` in IR into
P's assembly
>>> if (x + y overflows)
>>>   z = garbage value; // the backend can choose which value to put
here
>>> else
>>>   z = x + y; // never traps
>>>
>>> In case of `add x, y` without nsw, z should be 2's complement
otherwise.
>>>
>>> In perspective of optimizations, `add nsw` is still different from
`add`
>>> because it allows simplifying more expressions using arithmetic
properties.
>>> But, dropping nsw will make the instruction expensive (as the UB
case),
>>> so it should be carefully done as well.
>>>
>>>
>>> If clang explicitly inserts the overflow check in IR (either in if
or
>>> llvm.assume), I think that can be a good hint for code generation.
>>> Code motion of adds should be more carefully done though, because
the
>>> hint might be missed.
>>>
>>> Juneyoung
>>>
>>> On Thu, Oct 29, 2020 at 7:21 PM Stefanos Baziotis <
>>> stefanos.baziotis at gmail.com> wrote:
>>>
>>>> Hi Juneyoung,
>>>>
>>>> I think I lost you in your last example. First, I'm not
sure if this
>>>> last C code corresponds to LLVM IR code or after-LLVM-IR code.
Assuming
>>>> that we're talking about LLVM IR and for my imaginary
platform P:
>>>> The checking for overflow is not something that we have to do
in LLVM
>>>> IR, since as outlined in my previous email, LLVM IR has added
the semantics
>>>> that
>>>> signed overflow does not explode / trap (because if it
didn't add these
>>>> semantics, additions that exploded and which did not happen in
the original
>>>> program will happen now etc.). It is the lowering of LLVM IR
that has to
>>>> add it.
>>>>
>>>> So, AFAIU, the "timeline" looks like this:
>>>>
>>>> -- Before (LLVM IR):
>>>> for(i < n) {
>>>>   y = add nsw x + 1  <-- notice: no check, we don't need
to do it,
>>>> semantics say it doesn't trap
>>>>   use(y)
>>>> }
>>>> -- After (LLVM IR):
>>>> int tmp = add nsw x + 1;  <-- still no check..
>>>> for(i < n) {
>>>>   y =  tmp;
>>>>   use(y)
>>>> }
>>>>
>>>> Now, we get that after and go ahead to lower it (i.e.
after-LLVM-IR,
>>>> back-end work). I'll actually try to do a better job than
>>>> my original email:
>>>>
>>>> int tmp;
>>>> if (x != INT_MAX)  <-- notice the !=. This check is to
_guarantee_ that
>>>> the addition won't explode / trap since if that happened,
it would violate
>>>> the semantics, that we implicitly added, of LLVM IR.
>>>>   tmp = add nsw x + 1;
>>>> for(i < n) {
>>>>   y =  tmp;
>>>>   use(y)
>>>> }
>>>>
>>>> So this (your example):
>>>> for(i < n) {
>>>> if (x == INT_MAX)
>>>>   trap
>>>> y = add nsw x + 1
>>>> use(y)
>>>> }
>>>> =>
>>>> y = add nsw x + 1 // hoisted
>>>> for(i < n) {
>>>> if (x == INT_MAX)
>>>>   trap
>>>> use(y)
>>>> }
>>>>
>>>> does not make sense to me either in the LLVM IR level or in the
>>>> after-LLVM-IR level. For the former, it doesn't make sense
since I don't
>>>> see why we have to do the check inside the loop;
>>>> the semantics mandate that it doesn't happen so we're
ok, the burden is
>>>> on the back-end. For the latter (i.e. we're in LLVM IR
lowering, ISel or
>>>> smth), we simply can't hoist this add outside without a
check
>>>> since if x == INT_MAX, the platform P will explode
_immediately_, no
>>>> matter what the value of `n` (and of course, if `n == 0`, the
original code
>>>> wouldn't explode).
>>>>
>>>> Do I miss something ?
>>>> Guess: I think that what I miss is this: The semantics we added
is not
>>>> that signed wrapping won't explode. Rather: It does not
explode _if it is
>>>> not used in one of the X ways_. That would justify
>>>> your example transformation in the LLVM IR level.
>>>>
>>>> I hope this was not confusing, thanks for your time!
>>>>
>>>> - Stefanos
>>>>
>>>>
>>>> Στις Πέμ, 29 Οκτ 2020 στις 4:29 π.μ., ο/η Juneyoung Lee <
>>>> juneyoung.lee at sf.snu.ac.kr> έγραψε:
>>>>
>>>>> Hi Stefanos,
>>>>>
>>>>> So, to justify this transformation as correct, implicitly,
poison has
>>>>>> _added definedness_ to signed wrapping: specifically,
that the
>>>>>> computer won't explode if SW happens. AFAIU, that
is ok as far as C++
>>>>>> semantics
>>>>>> are concerned:
>>>>>> Since signed wrapping was UB, making it more defined is
ok.
>>>>>
>>>>>
>>>>> Your understanding is correct. Since signed overflow is UB
in C/C++,
>>>>> lowering from C to IR can make such programs more defined.
>>>>>
>>>>> Instead, they have to lower it as something like:
>>>>>> if (x == INT_MAX)
>>>>>>   skip or whatever
>>>>>
>>>>>
>>>>> Yes.
>>>>> This means that the overflow check and the actual add
operation can be
>>>>> separated. This requires instruction selection to carefully
combine the
>>>>> check and add, but for optimization this is beneficial
because add can
>>>>> still be freely moved.
>>>>>
>>>>> for(i < n) {
>>>>> if (x == INT_MAX)
>>>>>   trap
>>>>> y = add nsw x + 1
>>>>> use(y)
>>>>> }
>>>>> =>
>>>>> y = add nsw x + 1 // hoisted
>>>>> for(i < n) {
>>>>> if (x == INT_MAX)
>>>>>   trap
>>>>> use(y)
>>>>> }
>>>>>
>>>>> Juneyoung
>>>>>
>>>>> On Thu, Oct 29, 2020 at 5:56 AM Stefanos Baziotis <
>>>>> stefanos.baziotis at gmail.com> wrote:
>>>>>
>>>>>> Hi Juneyoung,
>>>>>>
>>>>>> First of all, great job on your talk!
>>>>>>
>>>>>> This is a question I guess you'd be the best person
to answer but the
>>>>>> rest of the LLVM community might want to participate.
>>>>>>
>>>>>> I was thinking about a UB-related example that has been
discussed by
>>>>>> multiple people
>>>>>> (including you), all of them basically authors of this
paper (
>>>>>>
https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf):
>>>>>>
>>>>>> -- Before opt:
>>>>>> for (int i = 0; i < n; ++i) {
>>>>>>   a[i] = x + 1;
>>>>>> }
>>>>>>
>>>>>> -- After opt (LICM):
>>>>>> int tmp = x + 1;
>>>>>> for (int i = 0; i < n; ++i) {
>>>>>>   a[i] = tmp;
>>>>>> }
>>>>>> // Assume `tmp` is never used again.
>>>>>>
>>>>>> The reasoning here, is let's make signed wrapping
_deferred_ UB that
>>>>>> will only
>>>>>> occur if the value is used in one of X ways (e.g. as a
denominator).
>>>>>> To that end, if
>>>>>> n == 0 and x == INT_MAX, UB will never occur because
the value is
>>>>>> never used.
>>>>>>
>>>>>> But, by doing that, the first point is:
>>>>>> If we translate this into machine code, the signed
wrapping _will_
>>>>>> happen, no matter
>>>>>> the value won't be used.
>>>>>>
>>>>>> Now, imagine that on some platform P, signed wrapping
explodes the
>>>>>> computer.
>>>>>> The computer _will_ explode (should explode ? more on
that later)
>>>>>> even if `n == 0`, something that would not happen in
the original
>>>>>> code.
>>>>>>
>>>>>> So, to justify this transformation as correct,
implicitly, poison has
>>>>>> _added definedness_ to signed wrapping: specifically,
that the
>>>>>> computer won't explode if SW happens. AFAIU, that
is ok as far as C++
>>>>>> semantics
>>>>>> are concerned:
>>>>>> Since signed wrapping was UB, making it more defined is
ok.
>>>>>>
>>>>>> But that definedness now has created a burden to
whoever is writing a
>>>>>> back-end
>>>>>> from LLVM IR to P (the SW exploding platform).
>>>>>> That is, now, if they see a `add <nsw>`, they
can't lower it to a
>>>>>> trivial signed add,
>>>>>> since if they do that and x == INT_MAX, the computer
will explode and
>>>>>> that violates
>>>>>> the semantics of _LLVM IR_ (since we mandated that SW
doesn't explode
>>>>>> the machine).
>>>>>>
>>>>>> Instead, they have to lower it as something like:
>>>>>> if (x == INT_MAX)
>>>>>>   skip or whatever
>>>>>>
>>>>>> Is this whole thinking correct ? UB, undef and poison
all are very
>>>>>> subtle so I'm trying
>>>>>> to wrap my head around.
>>>>>>
>>>>>> Thanks,
>>>>>> Stefanos Baziotis
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Juneyoung Lee
>>>>> Software Foundation Lab, Seoul National University
>>>>>
>>>>
>>>
>>> --
>>>
>>> Juneyoung Lee
>>> Software Foundation Lab, Seoul National University
>>>
>>
>
> --
>
> Juneyoung Lee
> Software Foundation Lab, Seoul National University
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201030/e7b638be/attachment.html>

Shawn Landden via llvm-dev

2020-Oct-30 11:29 UTC

head link

[llvm-dev] Does poison add implicit "definedness" under the hood ?

<br /><br />15:25, 30 de octubre de 2020, "Stefanos Baziotis
via llvm-dev" <llvm-dev@lists.llvm.org>:<br /><blockquote
class="210e7a848e8fcb45wmi-quote"><div dir="ltr">Oh
ok, now I get it, thank you!!</div></blockquote>Please do not top
post<div><br
/></div><div>https://en.m.wikipedia.org/wiki/Posting_style<br
/><br />-- <br />Shawn Landden</div>

Stefanos Baziotis via llvm-dev

2020-Oct-30 15:49 UTC

head link

[llvm-dev] Does poison add implicit "definedness" under the hood ?

Hi Shawn,

That's the first time someone notifies me about this. It seems Gmail does
it by default, not sure if I can change it.

Best,
Stefanos

Στις Παρ, 30 Οκτ 2020 στις 1:29 μ.μ., ο/η Shawn Landden <shawn at git.icu>
έγραψε:
>
>
> 15:25, 30 de octubre de 2020, "Stefanos Baziotis via llvm-dev"
<
> llvm-dev at lists.llvm.org>:
>
> Oh ok, now I get it, thank you!!
>
> Please do not top post
>
> https://en.m.wikipedia.org/wiki/Posting_style
>
> --
> Shawn Landden
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20201030/c91fff08/attachment.html>

llvm dev - Oct 2020 - Does poison add implicit "definedness" under the hood ?

[llvm-dev] Does poison add implicit "definedness" under the hood ?

[llvm-dev] Does poison add implicit "definedness" under the hood ?

[llvm-dev] Does poison add implicit "definedness" under the hood ?