thr3ads.net - llvm dev - [llvm-dev] The semantics of nonnull attribute [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Juneyoung Lee via llvm-dev

2020-Feb-18 01:45 UTC

[llvm-dev] The semantics of nonnull attribute

Hello all,

LangRef says it is undefined behavior to pass null to a nonnull argument
(`call f(nonnull null);`), but the semantics is too strong for a few
existing optimizations.
To support these, we can relax the semantics so `f(nonnull null)` is
equivalent to `f(poison)`, but (A) it again blocks another set of
optimizations, and (B) this makes the semantics of nonnull deviate from
other attributes like dereferenceable.
I found that there was a similar discussion about this issue in the past as
well, but seems it is not settled yet.
What should the semantics of nonnull be?
I listed a few optimizations that are relevant with this issue.


1. Propagating nonnull attribute to callee's arg (
https://godbolt.org/z/-cVsVP )

g(i8* ptr) {
f(nonnull ptr);
}
=>
g(i8* nonnull ptr) {
f(nonnull ptr);
}

This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null)
should have raised UB, so ptr shouldn't be null.
However, this transformation is incorrect if f(nonnull null) is equivalent
to f(poison).
If f was an empty function, f(nonnull null) never raises UB regardless of
ptr. So we can't guarantee ptr != null at other uses of ptr.


2. InstCombine (https://godbolt.org/z/HDQ7rD ):

%ptr_inb = gep inbounds %any_ptr, 1
f(%ptr_inb)
=>
%ptr_inb = .. (identical)
f(nonnull %ptr_inb)

This optimization is incorrect if `f(nonnull null)` is UB. The reason is as
follows.
If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
whereas the optimized one is `f(nonnull poison)`.
`f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the
transformation introduced UB.
This optimization is correct if both `f(nonnull null)` and `f(nonnull
poison)` are equivalent to `f(poison)`.


3. https://reviews.llvm.org/D69477

f(nonnull ptr);
use(ptr);
=>
llvm.assume(ptr != null);
use(ptr);
f(nonnull ptr);

If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the
added llvm.assume(ptr != null) raises UB whereas the source may not raise
UB at all. (e.g. assume that f() was an empty function)
If f(nonnull null) is UB, this is correct.


4. Dead argument elimination (from https://reviews.llvm.org/D70749 )

f(nonnull ptr); // f’s argument is dead
=>
f(nonnull undef);

This is incorrect if f(nonnull null) is UB. To make this correct, nonnull
should be dropped. This becomes harder to fix if nonnull was attached at
the signature of a function (not at the callee site).
It is correct if f(nonnull null) is f(poison).

Actually the D70749's thread had an end-to-end miscompilation example due
to the interaction between this DAE and the optimization 3 (insertion of
llvm.assume).

Thanks,
Juneyoung Lee
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/87d18a9f/attachment.html>

Johannes Doerfert via llvm-dev

2020-Feb-18 02:35 UTC

head link

[llvm-dev] The semantics of nonnull attribute

Hi Juneyoung,

thank you for starting this discussion again, the last one [0] did not
really end in a solution but the issue remains and will only become
worse, e.g., enable the Attributor and you might see all of the problems
below exposed in a single pass [1-4].

As I argued in [0], UB is problematic and I am strongly in favor of
poison. In some sense I see UB as a very "strong" guarantee and poison
as a fairly "week" one. We will have a hard time deriving the strong
one
and it might cause us to break things along the way, I mean your
examples show that we already miscompile codes.

I agree that poison is on its own not "strong enough" for certain
optimization, as you pointed out below, but we could reasonably augment
it with another attribute, let's call it `used`, that basically bridges
the gap:

  void @foo(nonnull);
  void @bar(nonnull used);

  foo(NULL)  // <- argument is poison inside of @foo
  bar(NULL)  // <- UB at the call site of @bar

A value is `used` if it would inevitably cause UB if it was poison at
this program point. I hope this makes some sense if not I can explain in
more detail.

Btw. I think this also integrates well with other attribute problems we
had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in this
example were "not used" (in some naive implementation of memcpy at
least).

Maybe I have overlooked a problem with this solution so let me know what
you think.

Cheers,
  Johannes


[0] https://reviews.llvm.org/D70749
[1] 1. below: https://godbolt.org/z/J4wFVw
[2] 2. below: https://godbolt.org/z/TAkSC6
[3] 3. below: https://godbolt.org/z/UqYi6S (doesn't work yet, will though)
[4] 4. below: https://godbolt.org/z/Qkkc_E
              https://godbolt.org/z/uXAfp_
              ...

On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all,
> 
> LangRef says it is undefined behavior to pass null to a nonnull argument
> (`call f(nonnull null);`), but the semantics is too strong for a few
> existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is
> equivalent to `f(poison)`, but (A) it again blocks another set of
> optimizations, and (B) this makes the semantics of nonnull deviate from
> other attributes like dereferenceable.
> I found that there was a similar discussion about this issue in the past as
> well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
> 
> 
> 1. Propagating nonnull attribute to callee's arg (
> https://godbolt.org/z/-cVsVP )
> 
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
> 
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null)
> should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is equivalent
> to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless of
> ptr. So we can't guarantee ptr != null at other uses of ptr.
For the given code snipped it is not even invalid to propagate nonnull
in the poison case but your point is not wrong, if we have f(poison) we
cannot derive anything from it if the argument might be unused.

> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> 
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
> 
> This optimization is incorrect if `f(nonnull null)` is UB. The reason is as
> follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the
> transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull
> poison)` are equivalent to `f(poison)`.
> 
> 
> 3. https://reviews.llvm.org/D69477
> 
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
> 
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the
> added llvm.assume(ptr != null) raises UB whereas the source may not raise
> UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
> 
> 
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> 
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
> 
> This is incorrect if f(nonnull null) is UB. To make this correct, nonnull
> should be dropped. This becomes harder to fix if nonnull was attached at
> the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
> 
> Actually the D70749's thread had an end-to-end miscompilation example
due
> to the interaction between this DAE and the optimization 3 (insertion of
> llvm.assume).
> 
> Thanks,
> Juneyoung Lee
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200217/e4bba1d6/attachment.sig>

Hal Finkel via llvm-dev

2020-Feb-18 02:37 UTC

head link

[llvm-dev] The semantics of nonnull attribute

On 2/17/20 7:45 PM, Juneyoung Lee wrote:> Hello all,
>
> LangRef says it is undefined behavior to pass null to a nonnull 
> argument (`call f(nonnull null);`), but the semantics is too strong 
> for a few existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is 
> equivalent to `f(poison)`, but (A) it again blocks another set of 
> optimizations, and (B) this makes the semantics of nonnull deviate 
> from other attributes like dereferenceable.
> I found that there was a similar discussion about this issue in the 
> past as well, but seems it is not settled yet.
> What should the semantics of nonnull be?

I had been operating under the idea that it was UB. However, it probably 
should be poison. Do we care if the pointer is nonnull if it's never 
used? It's not clear to me that we do. It's not as though we somehow 
optimize the parameter passing itself based on the nonnull property.

  -Hal

> I listed a few optimizations that are relevant with this issue.
>
>
> 1. Propagating nonnull attribute to callee's arg 
> (https://godbolt.org/z/-cVsVP )
>
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
>
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull 
> null) should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is 
> equivalent to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless 
> of ptr. So we can't guarantee ptr != null at other uses of ptr.
>
>
> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
>
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
>
> This optimization is incorrect if `f(nonnull null)` is UB. The reason 
> is as follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` 
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, 
> the transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull 
> poison)` are equivalent to `f(poison)`.
>
>
> 3. https://reviews.llvm.org/D69477
>
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
>
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, 
> the added llvm.assume(ptr != null) raises UB whereas the source may 
> not raise UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
>
>
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
>
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
>
> This is incorrect if f(nonnull null) is UB. To make this correct, 
> nonnull should be dropped. This becomes harder to fix if nonnull was 
> attached at the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
>
> Actually the D70749's thread had an end-to-end miscompilation example 
> due to the interaction between this DAE and the optimization 3 
> (insertion of llvm.assume).
>
> Thanks,
> Juneyoung Lee
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Nuno Lopes via llvm-dev

2020-Feb-18 14:40 UTC

head link

[llvm-dev] The semantics of nonnull attribute

Not sure the semantics of "used" you propose is sufficient. AFAIU the
proposal, "used" could only be used in cases where the function will
always trigger UB if poison is passed as argument.
The semantics of attributes is usually the other way around, since function
calls need to have UB as strong as the worst behavior of the function. If a
function may for some reason trigger UB with a given set of arguments, then the
function call has to trigger UB as well.

Imagine something like:
Call f(null, 1)

int f(nonnull %ptr, int c) {
  if (c)
    deref %ptr
}

You can't use the "used" attribute, since UB is only conditional.
So we would say the return value of f(null, 1) is poison. But if you inline the
function you get UB. Inlining has to continue to be correct without exceptions.
Maybe I misunderstood the semantics of used, but if not I hope the example above
is sufficient to show it isn't sufficient.

Thanks,
Nuno

-----Original Message-----
From: Johannes Doerfert <jdoerfert at anl.gov> 
Sent: 18 de fevereiro de 2020 02:35
Subject: Re: [llvm-dev] The semantics of nonnull attribute

Hi Juneyoung,

thank you for starting this discussion again, the last one [0] did not
really end in a solution but the issue remains and will only become
worse, e.g., enable the Attributor and you might see all of the problems
below exposed in a single pass [1-4].

As I argued in [0], UB is problematic and I am strongly in favor of
poison. In some sense I see UB as a very "strong" guarantee and poison
as a fairly "week" one. We will have a hard time deriving the strong
one
and it might cause us to break things along the way, I mean your
examples show that we already miscompile codes.

I agree that poison is on its own not "strong enough" for certain
optimization, as you pointed out below, but we could reasonably augment
it with another attribute, let's call it `used`, that basically bridges
the gap:

  void @foo(nonnull);
  void @bar(nonnull used);

  foo(NULL)  // <- argument is poison inside of @foo
  bar(NULL)  // <- UB at the call site of @bar

A value is `used` if it would inevitably cause UB if it was poison at
this program point. I hope this makes some sense if not I can explain in
more detail.

Btw. I think this also integrates well with other attribute problems we
had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in this
example were "not used" (in some naive implementation of memcpy at
least).

Maybe I have overlooked a problem with this solution so let me know what
you think.

Cheers,
  Johannes

[0] https://reviews.llvm.org/D70749
[1] 1. below: https://godbolt.org/z/J4wFVw
[2] 2. below: https://godbolt.org/z/TAkSC6
[3] 3. below: https://godbolt.org/z/UqYi6S (doesn't work yet, will though)
[4] 4. below: https://godbolt.org/z/Qkkc_E
              https://godbolt.org/z/uXAfp_
              ...

On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all,
> 
> LangRef says it is undefined behavior to pass null to a nonnull argument
> (`call f(nonnull null);`), but the semantics is too strong for a few
> existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is
> equivalent to `f(poison)`, but (A) it again blocks another set of
> optimizations, and (B) this makes the semantics of nonnull deviate from
> other attributes like dereferenceable.
> I found that there was a similar discussion about this issue in the past as
> well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
> 
> 
> 1. Propagating nonnull attribute to callee's arg (
> https://godbolt.org/z/-cVsVP )
> 
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
> 
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null)
> should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is equivalent
> to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless of
> ptr. So we can't guarantee ptr != null at other uses of ptr.
For the given code snipped it is not even invalid to propagate nonnull
in the poison case but your point is not wrong, if we have f(poison) we
cannot derive anything from it if the argument might be unused.

> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> 
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
> 
> This optimization is incorrect if `f(nonnull null)` is UB. The reason is as
> follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the
> transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull
> poison)` are equivalent to `f(poison)`.
> 
> 
> 3. https://reviews.llvm.org/D69477
> 
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
> 
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the
> added llvm.assume(ptr != null) raises UB whereas the source may not raise
> UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
> 
> 
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> 
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
> 
> This is incorrect if f(nonnull null) is UB. To make this correct, nonnull
> should be dropped. This becomes harder to fix if nonnull was attached at
> the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
> 
> Actually the D70749's thread had an end-to-end miscompilation example
due
> to the interaction between this DAE and the optimization 3 (insertion of
> llvm.assume).
> 
> Thanks,
> Juneyoung Lee

Johannes Doerfert via llvm-dev

2020-Feb-18 15:20 UTC

head link

[llvm-dev] The semantics of nonnull attribute

On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all,
> 
> LangRef says it is undefined behavior to pass null to a nonnull argument
> (`call f(nonnull null);`), but the semantics is too strong for a few
> existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is
> equivalent to `f(poison)`, but (A) it again blocks another set of
> optimizations, and (B) this makes the semantics of nonnull deviate from
> other attributes like dereferenceable.
I forgot to explicit say that in my first email but whatever we decide
for `nonnull` needs to be applied to basically all other attributes as
well. `nonnul` is no way special (IMHO).

Cheers,
  Johannes

> I found that there was a similar discussion about this issue in the past as
> well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
> 
> 
> 1. Propagating nonnull attribute to callee's arg (
> https://godbolt.org/z/-cVsVP )
> 
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
> 
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null)
> should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is equivalent
> to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless of
> ptr. So we can't guarantee ptr != null at other uses of ptr.
> 
> 
> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> 
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
> 
> This optimization is incorrect if `f(nonnull null)` is UB. The reason is as
> follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the
> transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull
> poison)` are equivalent to `f(poison)`.
> 
> 
> 3. https://reviews.llvm.org/D69477
> 
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
> 
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the
> added llvm.assume(ptr != null) raises UB whereas the source may not raise
> UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
> 
> 
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> 
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
> 
> This is incorrect if f(nonnull null) is UB. To make this correct, nonnull
> should be dropped. This becomes harder to fix if nonnull was attached at
> the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
> 
> Actually the D70749's thread had an end-to-end miscompilation example
due
> to the interaction between this DAE and the optimization 3 (insertion of
> llvm.assume).
> 
> Thanks,
> Juneyoung Lee
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/71846a38/attachment.sig>

Nuno Lopes via llvm-dev

2020-Feb-18 18:22 UTC

head link

[llvm-dev] The semantics of nonnull attribute

> I forgot to explicit say that in my first email but whatever we decide for
`nonnull` needs to be applied to basically all other attributes as well.
`nonnul` is no way special (IMHO).
I think that not all attributes are the same. For example,
"dereferenceable(n)" is quite strong. It allows e.g.:
f(dereferenceable(4) %p) {
  loop() {
    %v = load %p
    use(%v)
  }
}
=>
f(dereferenceable(4) %p) {
  %v = load %p
  loop() {
    use(%v)
  }
}

For this transformation to be correct, the function call must trigger UB if the
pointer passed as argument is not dereferenceable. Otherwise, upon inlining, it
would become UB.
Other attributes are not as severe. Ideally we would have a consistent solution
so we don't need to remember which are the nice attributes.

Nuno

> I found that there was a similar discussion about this issue in the 
> past as well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
> 
> 
> 1. Propagating nonnull attribute to callee's arg ( 
> https://godbolt.org/z/-cVsVP )
> 
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
> 
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull 
> null) should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is 
> equivalent to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless 
> of ptr. So we can't guarantee ptr != null at other uses of ptr.
> 
> 
> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> 
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
> 
> This optimization is incorrect if `f(nonnull null)` is UB. The reason 
> is as follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` 
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, 
> the transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull 
> poison)` are equivalent to `f(poison)`.
> 
> 
> 3. https://reviews.llvm.org/D69477
> 
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
> 
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, 
> the added llvm.assume(ptr != null) raises UB whereas the source may 
> not raise UB at all. (e.g. assume that f() was an empty function) If 
> f(nonnull null) is UB, this is correct.
> 
> 
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> 
> f(nonnull ptr); // f’s argument is dead => f(nonnull undef);
> 
> This is incorrect if f(nonnull null) is UB. To make this correct, 
> nonnull should be dropped. This becomes harder to fix if nonnull was 
> attached at the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
> 
> Actually the D70749's thread had an end-to-end miscompilation example 
> due to the interaction between this DAE and the optimization 3 
> (insertion of llvm.assume).
> 
> Thanks,
> Juneyoung Lee
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Philip Reames via llvm-dev

2020-Feb-18 21:49 UTC

head link

[llvm-dev] The semantics of nonnull attribute

I'm really not sure I agree there's a problem with the current semantics
at all.

The root of your argument appears to be that passing poison to a 
function argument is "harmless" if that argument is unused in the 
callee.  This seems reasonable to me, but it brings with it a 
restriction on any IPO pass which is non-obvious.  It essentially 
requires that the IPA argument inference framework prove both that a 
property holds, and that if that property didn't hold UB would be 
triggered.  (I think this is similar in spirit to the "used"
discussion
downthread.)   In general, it is much easier to prove the former 
property than the later.

I would simply state that per the current semantics, any IPO/IPA 
transform which doesn't do so is buggy.  That's not pretty, but it's
valid and I believe matches (most of?) the current implementation.

Conversely, the power for the frontend to annotate such facts on the 
function boundary is one of the key advantages higher level semantics 
give over inference.  I would be very reluctant to give that up.

To be clear, I'm completely open to proposals to change the current 
semantics for ease of implementation on the IPO/IPA side. I just think 
it's important to distinguish between "current semantics are
wrong" and
"current semantics are inconvenient".

Philip

On 2/17/20 5:45 PM, Juneyoung Lee wrote:> Hello all,
>
> LangRef says it is undefined behavior to pass null to a nonnull 
> argument (`call f(nonnull null);`), but the semantics is too strong 
> for a few existing optimizations.
> To support these, we can relax the semantics so `f(nonnull null)` is 
> equivalent to `f(poison)`, but (A) it again blocks another set of 
> optimizations, and (B) this makes the semantics of nonnull deviate 
> from other attributes like dereferenceable.
> I found that there was a similar discussion about this issue in the 
> past as well, but seems it is not settled yet.
> What should the semantics of nonnull be?
> I listed a few optimizations that are relevant with this issue.
>
>
> 1. Propagating nonnull attribute to callee's arg 
> (https://godbolt.org/z/-cVsVP )
>
> g(i8* ptr) {
> f(nonnull ptr);
> }
> =>
> g(i8* nonnull ptr) {
> f(nonnull ptr);
> }
>
> This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull 
> null) should have raised UB, so ptr shouldn't be null.
> However, this transformation is incorrect if f(nonnull null) is 
> equivalent to f(poison).
> If f was an empty function, f(nonnull null) never raises UB regardless 
> of ptr. So we can't guarantee ptr != null at other uses of ptr.
>
>
> 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
>
> %ptr_inb = gep inbounds %any_ptr, 1
> f(%ptr_inb)
> =>
> %ptr_inb = .. (identical)
> f(nonnull %ptr_inb)
>
> This optimization is incorrect if `f(nonnull null)` is UB. The reason 
> is as follows.
> If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` 
> whereas the optimized one is `f(nonnull poison)`.
> `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, 
> the transformation introduced UB.
> This optimization is correct if both `f(nonnull null)` and `f(nonnull 
> poison)` are equivalent to `f(poison)`.
>
>
> 3. https://reviews.llvm.org/D69477
>
> f(nonnull ptr);
> use(ptr);
> =>
> llvm.assume(ptr != null);
> use(ptr);
> f(nonnull ptr);
>
> If f(nonnull null) is f(poison), this is incorrect. If ptr was null, 
> the added llvm.assume(ptr != null) raises UB whereas the source may 
> not raise UB at all. (e.g. assume that f() was an empty function)
> If f(nonnull null) is UB, this is correct.
>
>
> 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
>
> f(nonnull ptr); // f’s argument is dead
> =>
> f(nonnull undef);
>
> This is incorrect if f(nonnull null) is UB. To make this correct, 
> nonnull should be dropped. This becomes harder to fix if nonnull was 
> attached at the signature of a function (not at the callee site).
> It is correct if f(nonnull null) is f(poison).
>
> Actually the D70749's thread had an end-to-end miscompilation example 
> due to the interaction between this DAE and the optimization 3 
> (insertion of llvm.assume).
>
> Thanks,
> Juneyoung Lee

Johannes Doerfert via llvm-dev

2020-Feb-18 23:04 UTC

head link

[llvm-dev] The semantics of nonnull attribute

On 02/18, Philip Reames via llvm-dev wrote:> I'm really not sure I agree there's a problem with the current
semantics at
> all.
>
> The root of your argument appears to be that passing poison to a function
> argument is "harmless" if that argument is unused in the callee. 
This seems
> reasonable to me, but it brings with it a restriction on any IPO pass which
> is non-obvious.  It essentially requires that the IPA argument inference
> framework prove both that a property holds, and that if that property
didn't
> hold UB would be triggered.  (I think this is similar in spirit to the
> "used" discussion downthread.)   In general, it is much easier to
prove the
> former property than the later.
> 
> I would simply state that per the current semantics, any IPO/IPA transform
> which doesn't do so is buggy.  That's not pretty, but it's
valid and I
> believe matches (most of?) the current implementation.
There is a good example (below) that shows how we misinterpret this
already and I anticipate more as we improve our attribute deduction
capabilities.

> Conversely, the power for the frontend to annotate such facts on the
> function boundary is one of the key advantages higher level semantics give
> over inference.  I would be very reluctant to give that up.
You would not loose anything. As we discussed in the other part of the
thread:
  `attr` + `not_poison` (aka. `used`) => current UB semantics
All we get is a two step process which frontends can avoid by always
using the attributes together.

> To be clear, I'm completely open to proposals to change the current
> semantics for ease of implementation on the IPO/IPA side. I just think
it's
> important to distinguish between "current semantics are wrong"
and "current
> semantics are inconvenient".
Juneyoung lists nice examples of transformations that are "buggy"
according to the current semantics. I think one of the strongest
arguments for a change is his case 2. You can also write it as:

void @bar();
void @foo(nonnull %a, %offset) {
  %g = gep inbounds %a, %offset
  call void @bar(%g)             ; <- You cannot mark %g at the call
                                 ;    site as nonnull according to the
                                 ;    current lang-ref rules as a
                                 ;    violation of `inbounds` results
                                 ;    in a poison value and not UB.
                                 ;    Now, `call void @bar(nonnull %g)`
                                 ;    would introduce UB where there was
                                 ;    none. We still do that though,
                                 ;    e.g., in instcombine. The helper
                                 ;    `isKnownNonZero` will return true
                                 ;    as well, which actually means,
                                 ;    non-zero or poison.
}

Cheers,
  Johannes

> Philip
> 
> On 2/17/20 5:45 PM, Juneyoung Lee wrote:
> > Hello all,
> > 
> > LangRef says it is undefined behavior to pass null to a nonnull
argument
> > (`call f(nonnull null);`), but the semantics is too strong for a few
> > existing optimizations.
> > To support these, we can relax the semantics so `f(nonnull null)` is
> > equivalent to `f(poison)`, but (A) it again blocks another set of
> > optimizations, and (B) this makes the semantics of nonnull deviate
from
> > other attributes like dereferenceable.
> > I found that there was a similar discussion about this issue in the
past
> > as well, but seems it is not settled yet.
> > What should the semantics of nonnull be?
> > I listed a few optimizations that are relevant with this issue.
> > 
> > 
> > 1. Propagating nonnull attribute to callee's arg
> > (https://godbolt.org/z/-cVsVP )
> > 
> > g(i8* ptr) {
> > f(nonnull ptr);
> > }
> > =>
> > g(i8* nonnull ptr) {
> > f(nonnull ptr);
> > }
> > 
> > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull
> > null) should have raised UB, so ptr shouldn't be null.
> > However, this transformation is incorrect if f(nonnull null) is
> > equivalent to f(poison).
> > If f was an empty function, f(nonnull null) never raises UB regardless
> > of ptr. So we can't guarantee ptr != null at other uses of ptr.
> > 
> > 
> > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> > 
> > %ptr_inb = gep inbounds %any_ptr, 1
> > f(%ptr_inb)
> > =>
> > %ptr_inb = .. (identical)
> > f(nonnull %ptr_inb)
> > 
> > This optimization is incorrect if `f(nonnull null)` is UB. The reason
is
> > as follows.
> > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> > whereas the optimized one is `f(nonnull poison)`.
> > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So,
> > the transformation introduced UB.
> > This optimization is correct if both `f(nonnull null)` and `f(nonnull
> > poison)` are equivalent to `f(poison)`.
> > 
> > 
> > 3. https://reviews.llvm.org/D69477
> > 
> > f(nonnull ptr);
> > use(ptr);
> > =>
> > llvm.assume(ptr != null);
> > use(ptr);
> > f(nonnull ptr);
> > 
> > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,
the
> > added llvm.assume(ptr != null) raises UB whereas the source may not
> > raise UB at all. (e.g. assume that f() was an empty function)
> > If f(nonnull null) is UB, this is correct.
> > 
> > 
> > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> > 
> > f(nonnull ptr); // f’s argument is dead
> > =>
> > f(nonnull undef);
> > 
> > This is incorrect if f(nonnull null) is UB. To make this correct,
> > nonnull should be dropped. This becomes harder to fix if nonnull was
> > attached at the signature of a function (not at the callee site).
> > It is correct if f(nonnull null) is f(poison).
> > 
> > Actually the D70749's thread had an end-to-end miscompilation
example
> > due to the interaction between this DAE and the optimization 3
> > (insertion of llvm.assume).
> > 
> > Thanks,
> > Juneyoung Lee
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/31df7e1d/attachment.sig>

Juneyoung Lee via llvm-dev

2020-Feb-19 06:59 UTC

head link

[llvm-dev] The semantics of nonnull attribute

Hello Philip,

I agree that the current (UB) semantics is simpler in terms of
interprocedural analysis, .

My concern was that the semantics of nonnull was too strong (too undefined)
for optimizations that attach nonnull, such as the InstCombine
optimization. It seems there is a class of such transformations, such as
deriving nonnull from inttoptr(or x, 1).
If nonnull raises UB, attaching nonnull from this is problematic, as these
operations don't raise UB but propagates poison.

Or this can be supported by adding 'nonnull_or_poison',
Would this make sense? It wouldn't touch the existing semantics of nonnull.
Also, if an argument is not_poison and nonnull_or_poison, they can be
merged into nonnull.

Juneyoung Lee

On Wed, Feb 19, 2020 at 6:49 AM Philip Reames <listmail at
philipreames.com>
wrote:
> I'm really not sure I agree there's a problem with the current
semantics
> at all.
>
> The root of your argument appears to be that passing poison to a
> function argument is "harmless" if that argument is unused in the
> callee.  This seems reasonable to me, but it brings with it a
> restriction on any IPO pass which is non-obvious.  It essentially
> requires that the IPA argument inference framework prove both that a
> property holds, and that if that property didn't hold UB would be
> triggered.  (I think this is similar in spirit to the "used"
discussion
> downthread.)   In general, it is much easier to prove the former
> property than the later.
>
> I would simply state that per the current semantics, any IPO/IPA
> transform which doesn't do so is buggy.  That's not pretty, but
it's
> valid and I believe matches (most of?) the current implementation.
>
> Conversely, the power for the frontend to annotate such facts on the
> function boundary is one of the key advantages higher level semantics
> give over inference.  I would be very reluctant to give that up.
>
> To be clear, I'm completely open to proposals to change the current
> semantics for ease of implementation on the IPO/IPA side. I just think
> it's important to distinguish between "current semantics are
wrong" and
> "current semantics are inconvenient".
>
> Philip
>
> On 2/17/20 5:45 PM, Juneyoung Lee wrote:
> > Hello all,
> >
> > LangRef says it is undefined behavior to pass null to a nonnull
> > argument (`call f(nonnull null);`), but the semantics is too strong
> > for a few existing optimizations.
> > To support these, we can relax the semantics so `f(nonnull null)` is
> > equivalent to `f(poison)`, but (A) it again blocks another set of
> > optimizations, and (B) this makes the semantics of nonnull deviate
> > from other attributes like dereferenceable.
> > I found that there was a similar discussion about this issue in the
> > past as well, but seems it is not settled yet.
> > What should the semantics of nonnull be?
> > I listed a few optimizations that are relevant with this issue.
> >
> >
> > 1. Propagating nonnull attribute to callee's arg
> > (https://godbolt.org/z/-cVsVP )
> >
> > g(i8* ptr) {
> > f(nonnull ptr);
> > }
> > =>
> > g(i8* nonnull ptr) {
> > f(nonnull ptr);
> > }
> >
> > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull
> > null) should have raised UB, so ptr shouldn't be null.
> > However, this transformation is incorrect if f(nonnull null) is
> > equivalent to f(poison).
> > If f was an empty function, f(nonnull null) never raises UB regardless
> > of ptr. So we can't guarantee ptr != null at other uses of ptr.
> >
> >
> > 2. InstCombine (https://godbolt.org/z/HDQ7rD ):
> >
> > %ptr_inb = gep inbounds %any_ptr, 1
> > f(%ptr_inb)
> > =>
> > %ptr_inb = .. (identical)
> > f(nonnull %ptr_inb)
> >
> > This optimization is incorrect if `f(nonnull null)` is UB. The reason
> > is as follows.
> > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)`
> > whereas the optimized one is `f(nonnull poison)`.
> > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So,
> > the transformation introduced UB.
> > This optimization is correct if both `f(nonnull null)` and `f(nonnull
> > poison)` are equivalent to `f(poison)`.
> >
> >
> > 3. https://reviews.llvm.org/D69477
> >
> > f(nonnull ptr);
> > use(ptr);
> > =>
> > llvm.assume(ptr != null);
> > use(ptr);
> > f(nonnull ptr);
> >
> > If f(nonnull null) is f(poison), this is incorrect. If ptr was null,
> > the added llvm.assume(ptr != null) raises UB whereas the source may
> > not raise UB at all. (e.g. assume that f() was an empty function)
> > If f(nonnull null) is UB, this is correct.
> >
> >
> > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 )
> >
> > f(nonnull ptr); // f’s argument is dead
> > =>
> > f(nonnull undef);
> >
> > This is incorrect if f(nonnull null) is UB. To make this correct,
> > nonnull should be dropped. This becomes harder to fix if nonnull was
> > attached at the signature of a function (not at the callee site).
> > It is correct if f(nonnull null) is f(poison).
> >
> > Actually the D70749's thread had an end-to-end miscompilation
example
> > due to the interaction between this DAE and the optimization 3
> > (insertion of llvm.assume).
> >
> > Thanks,
> > Juneyoung Lee
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200219/1ce74277/attachment.html>

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Feb 2020 - The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

[llvm-dev] The semantics of nonnull attribute

Reasonably Related Threads