Juneyoung Lee via llvm-dev
2020-Feb-18 01:45 UTC
[llvm-dev] The semantics of nonnull attribute
Hello all, LangRef says it is undefined behavior to pass null to a nonnull argument (`call f(nonnull null);`), but the semantics is too strong for a few existing optimizations. To support these, we can relax the semantics so `f(nonnull null)` is equivalent to `f(poison)`, but (A) it again blocks another set of optimizations, and (B) this makes the semantics of nonnull deviate from other attributes like dereferenceable. I found that there was a similar discussion about this issue in the past as well, but seems it is not settled yet. What should the semantics of nonnull be? I listed a few optimizations that are relevant with this issue. 1. Propagating nonnull attribute to callee's arg ( https://godbolt.org/z/-cVsVP ) g(i8* ptr) { f(nonnull ptr); } => g(i8* nonnull ptr) { f(nonnull ptr); } This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null) should have raised UB, so ptr shouldn't be null. However, this transformation is incorrect if f(nonnull null) is equivalent to f(poison). If f was an empty function, f(nonnull null) never raises UB regardless of ptr. So we can't guarantee ptr != null at other uses of ptr. 2. InstCombine (https://godbolt.org/z/HDQ7rD ): %ptr_inb = gep inbounds %any_ptr, 1 f(%ptr_inb) => %ptr_inb = .. (identical) f(nonnull %ptr_inb) This optimization is incorrect if `f(nonnull null)` is UB. The reason is as follows. If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` whereas the optimized one is `f(nonnull poison)`. `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the transformation introduced UB. This optimization is correct if both `f(nonnull null)` and `f(nonnull poison)` are equivalent to `f(poison)`. 3. https://reviews.llvm.org/D69477 f(nonnull ptr); use(ptr); => llvm.assume(ptr != null); use(ptr); f(nonnull ptr); If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the added llvm.assume(ptr != null) raises UB whereas the source may not raise UB at all. (e.g. assume that f() was an empty function) If f(nonnull null) is UB, this is correct. 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) f(nonnull ptr); // f’s argument is dead => f(nonnull undef); This is incorrect if f(nonnull null) is UB. To make this correct, nonnull should be dropped. This becomes harder to fix if nonnull was attached at the signature of a function (not at the callee site). It is correct if f(nonnull null) is f(poison). Actually the D70749's thread had an end-to-end miscompilation example due to the interaction between this DAE and the optimization 3 (insertion of llvm.assume). Thanks, Juneyoung Lee -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/87d18a9f/attachment.html>
Johannes Doerfert via llvm-dev
2020-Feb-18 02:35 UTC
[llvm-dev] The semantics of nonnull attribute
Hi Juneyoung, thank you for starting this discussion again, the last one [0] did not really end in a solution but the issue remains and will only become worse, e.g., enable the Attributor and you might see all of the problems below exposed in a single pass [1-4]. As I argued in [0], UB is problematic and I am strongly in favor of poison. In some sense I see UB as a very "strong" guarantee and poison as a fairly "week" one. We will have a hard time deriving the strong one and it might cause us to break things along the way, I mean your examples show that we already miscompile codes. I agree that poison is on its own not "strong enough" for certain optimization, as you pointed out below, but we could reasonably augment it with another attribute, let's call it `used`, that basically bridges the gap: void @foo(nonnull); void @bar(nonnull used); foo(NULL) // <- argument is poison inside of @foo bar(NULL) // <- UB at the call site of @bar A value is `used` if it would inevitably cause UB if it was poison at this program point. I hope this makes some sense if not I can explain in more detail. Btw. I think this also integrates well with other attribute problems we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in this example were "not used" (in some naive implementation of memcpy at least). Maybe I have overlooked a problem with this solution so let me know what you think. Cheers, Johannes [0] https://reviews.llvm.org/D70749 [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: https://godbolt.org/z/TAkSC6 [3] 3. below: https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. below: https://godbolt.org/z/Qkkc_E https://godbolt.org/z/uXAfp_ ... On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all, > > LangRef says it is undefined behavior to pass null to a nonnull argument > (`call f(nonnull null);`), but the semantics is too strong for a few > existing optimizations. > To support these, we can relax the semantics so `f(nonnull null)` is > equivalent to `f(poison)`, but (A) it again blocks another set of > optimizations, and (B) this makes the semantics of nonnull deviate from > other attributes like dereferenceable. > I found that there was a similar discussion about this issue in the past as > well, but seems it is not settled yet. > What should the semantics of nonnull be? > I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg ( > https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null) > should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is equivalent > to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless of > ptr. So we can't guarantee ptr != null at other uses of ptr.For the given code snipped it is not even invalid to propagate nonnull in the poison case but your point is not wrong, if we have f(poison) we cannot derive anything from it if the argument might be unused.> 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason is as > follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the > transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the > added llvm.assume(ptr != null) raises UB whereas the source may not raise > UB at all. (e.g. assume that f() was an empty function) > If f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead > => > f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, nonnull > should be dropped. This becomes harder to fix if nonnull was attached at > the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example due > to the interaction between this DAE and the optimization 3 (insertion of > llvm.assume). > > Thanks, > Juneyoung Lee> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200217/e4bba1d6/attachment.sig>
Hal Finkel via llvm-dev
2020-Feb-18 02:37 UTC
[llvm-dev] The semantics of nonnull attribute
On 2/17/20 7:45 PM, Juneyoung Lee wrote:> Hello all, > > LangRef says it is undefined behavior to pass null to a nonnull > argument (`call f(nonnull null);`), but the semantics is too strong > for a few existing optimizations. > To support these, we can relax the semantics so `f(nonnull null)` is > equivalent to `f(poison)`, but (A) it again blocks another set of > optimizations, and (B) this makes the semantics of nonnull deviate > from other attributes like dereferenceable. > I found that there was a similar discussion about this issue in the > past as well, but seems it is not settled yet. > What should the semantics of nonnull be?I had been operating under the idea that it was UB. However, it probably should be poison. Do we care if the pointer is nonnull if it's never used? It's not clear to me that we do. It's not as though we somehow optimize the parameter passing itself based on the nonnull property. -Hal> I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg > (https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > null) should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is > equivalent to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless > of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason > is as follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, > the transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > the added llvm.assume(ptr != null) raises UB whereas the source may > not raise UB at all. (e.g. assume that f() was an empty function) > If f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead > => > f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, > nonnull should be dropped. This becomes harder to fix if nonnull was > attached at the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example > due to the interaction between this DAE and the optimization 3 > (insertion of llvm.assume). > > Thanks, > Juneyoung Lee-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Nuno Lopes via llvm-dev
2020-Feb-18 14:40 UTC
[llvm-dev] The semantics of nonnull attribute
Not sure the semantics of "used" you propose is sufficient. AFAIU the proposal, "used" could only be used in cases where the function will always trigger UB if poison is passed as argument. The semantics of attributes is usually the other way around, since function calls need to have UB as strong as the worst behavior of the function. If a function may for some reason trigger UB with a given set of arguments, then the function call has to trigger UB as well. Imagine something like: Call f(null, 1) int f(nonnull %ptr, int c) { if (c) deref %ptr } You can't use the "used" attribute, since UB is only conditional. So we would say the return value of f(null, 1) is poison. But if you inline the function you get UB. Inlining has to continue to be correct without exceptions. Maybe I misunderstood the semantics of used, but if not I hope the example above is sufficient to show it isn't sufficient. Thanks, Nuno -----Original Message----- From: Johannes Doerfert <jdoerfert at anl.gov> Sent: 18 de fevereiro de 2020 02:35 Subject: Re: [llvm-dev] The semantics of nonnull attribute Hi Juneyoung, thank you for starting this discussion again, the last one [0] did not really end in a solution but the issue remains and will only become worse, e.g., enable the Attributor and you might see all of the problems below exposed in a single pass [1-4]. As I argued in [0], UB is problematic and I am strongly in favor of poison. In some sense I see UB as a very "strong" guarantee and poison as a fairly "week" one. We will have a hard time deriving the strong one and it might cause us to break things along the way, I mean your examples show that we already miscompile codes. I agree that poison is on its own not "strong enough" for certain optimization, as you pointed out below, but we could reasonably augment it with another attribute, let's call it `used`, that basically bridges the gap: void @foo(nonnull); void @bar(nonnull used); foo(NULL) // <- argument is poison inside of @foo bar(NULL) // <- UB at the call site of @bar A value is `used` if it would inevitably cause UB if it was poison at this program point. I hope this makes some sense if not I can explain in more detail. Btw. I think this also integrates well with other attribute problems we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in this example were "not used" (in some naive implementation of memcpy at least). Maybe I have overlooked a problem with this solution so let me know what you think. Cheers, Johannes [0] https://reviews.llvm.org/D70749 [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: https://godbolt.org/z/TAkSC6 [3] 3. below: https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. below: https://godbolt.org/z/Qkkc_E https://godbolt.org/z/uXAfp_ ... On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all, > > LangRef says it is undefined behavior to pass null to a nonnull argument > (`call f(nonnull null);`), but the semantics is too strong for a few > existing optimizations. > To support these, we can relax the semantics so `f(nonnull null)` is > equivalent to `f(poison)`, but (A) it again blocks another set of > optimizations, and (B) this makes the semantics of nonnull deviate from > other attributes like dereferenceable. > I found that there was a similar discussion about this issue in the past as > well, but seems it is not settled yet. > What should the semantics of nonnull be? > I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg ( > https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null) > should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is equivalent > to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless of > ptr. So we can't guarantee ptr != null at other uses of ptr.For the given code snipped it is not even invalid to propagate nonnull in the poison case but your point is not wrong, if we have f(poison) we cannot derive anything from it if the argument might be unused.> 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason is as > follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the > transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the > added llvm.assume(ptr != null) raises UB whereas the source may not raise > UB at all. (e.g. assume that f() was an empty function) > If f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead > => > f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, nonnull > should be dropped. This becomes harder to fix if nonnull was attached at > the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example due > to the interaction between this DAE and the optimization 3 (insertion of > llvm.assume). > > Thanks, > Juneyoung Lee
Johannes Doerfert via llvm-dev
2020-Feb-18 15:20 UTC
[llvm-dev] The semantics of nonnull attribute
On 02/18, Juneyoung Lee via llvm-dev wrote:> Hello all, > > LangRef says it is undefined behavior to pass null to a nonnull argument > (`call f(nonnull null);`), but the semantics is too strong for a few > existing optimizations. > To support these, we can relax the semantics so `f(nonnull null)` is > equivalent to `f(poison)`, but (A) it again blocks another set of > optimizations, and (B) this makes the semantics of nonnull deviate from > other attributes like dereferenceable.I forgot to explicit say that in my first email but whatever we decide for `nonnull` needs to be applied to basically all other attributes as well. `nonnul` is no way special (IMHO). Cheers, Johannes> I found that there was a similar discussion about this issue in the past as > well, but seems it is not settled yet. > What should the semantics of nonnull be? > I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg ( > https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull null) > should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is equivalent > to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless of > ptr. So we can't guarantee ptr != null at other uses of ptr. > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason is as > follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, the > transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the > added llvm.assume(ptr != null) raises UB whereas the source may not raise > UB at all. (e.g. assume that f() was an empty function) > If f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead > => > f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, nonnull > should be dropped. This becomes harder to fix if nonnull was attached at > the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example due > to the interaction between this DAE and the optimization 3 (insertion of > llvm.assume). > > Thanks, > Juneyoung Lee> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/71846a38/attachment.sig>
Nuno Lopes via llvm-dev
2020-Feb-18 18:22 UTC
[llvm-dev] The semantics of nonnull attribute
> I forgot to explicit say that in my first email but whatever we decide for `nonnull` needs to be applied to basically all other attributes as well. `nonnul` is no way special (IMHO).I think that not all attributes are the same. For example, "dereferenceable(n)" is quite strong. It allows e.g.: f(dereferenceable(4) %p) { loop() { %v = load %p use(%v) } } => f(dereferenceable(4) %p) { %v = load %p loop() { use(%v) } } For this transformation to be correct, the function call must trigger UB if the pointer passed as argument is not dereferenceable. Otherwise, upon inlining, it would become UB. Other attributes are not as severe. Ideally we would have a consistent solution so we don't need to remember which are the nice attributes. Nuno> I found that there was a similar discussion about this issue in the > past as well, but seems it is not settled yet. > What should the semantics of nonnull be? > I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg ( > https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > null) should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is > equivalent to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless > of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason > is as follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, > the transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > the added llvm.assume(ptr != null) raises UB whereas the source may > not raise UB at all. (e.g. assume that f() was an empty function) If > f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead => f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, > nonnull should be dropped. This becomes harder to fix if nonnull was > attached at the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example > due to the interaction between this DAE and the optimization 3 > (insertion of llvm.assume). > > Thanks, > Juneyoung Lee> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov
Philip Reames via llvm-dev
2020-Feb-18 21:49 UTC
[llvm-dev] The semantics of nonnull attribute
I'm really not sure I agree there's a problem with the current semantics at all. The root of your argument appears to be that passing poison to a function argument is "harmless" if that argument is unused in the callee. This seems reasonable to me, but it brings with it a restriction on any IPO pass which is non-obvious. It essentially requires that the IPA argument inference framework prove both that a property holds, and that if that property didn't hold UB would be triggered. (I think this is similar in spirit to the "used" discussion downthread.) In general, it is much easier to prove the former property than the later. I would simply state that per the current semantics, any IPO/IPA transform which doesn't do so is buggy. That's not pretty, but it's valid and I believe matches (most of?) the current implementation. Conversely, the power for the frontend to annotate such facts on the function boundary is one of the key advantages higher level semantics give over inference. I would be very reluctant to give that up. To be clear, I'm completely open to proposals to change the current semantics for ease of implementation on the IPO/IPA side. I just think it's important to distinguish between "current semantics are wrong" and "current semantics are inconvenient". Philip On 2/17/20 5:45 PM, Juneyoung Lee wrote:> Hello all, > > LangRef says it is undefined behavior to pass null to a nonnull > argument (`call f(nonnull null);`), but the semantics is too strong > for a few existing optimizations. > To support these, we can relax the semantics so `f(nonnull null)` is > equivalent to `f(poison)`, but (A) it again blocks another set of > optimizations, and (B) this makes the semantics of nonnull deviate > from other attributes like dereferenceable. > I found that there was a similar discussion about this issue in the > past as well, but seems it is not settled yet. > What should the semantics of nonnull be? > I listed a few optimizations that are relevant with this issue. > > > 1. Propagating nonnull attribute to callee's arg > (https://godbolt.org/z/-cVsVP ) > > g(i8* ptr) { > f(nonnull ptr); > } > => > g(i8* nonnull ptr) { > f(nonnull ptr); > } > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > null) should have raised UB, so ptr shouldn't be null. > However, this transformation is incorrect if f(nonnull null) is > equivalent to f(poison). > If f was an empty function, f(nonnull null) never raises UB regardless > of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > %ptr_inb = gep inbounds %any_ptr, 1 > f(%ptr_inb) > => > %ptr_inb = .. (identical) > f(nonnull %ptr_inb) > > This optimization is incorrect if `f(nonnull null)` is UB. The reason > is as follows. > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > whereas the optimized one is `f(nonnull poison)`. > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, > the transformation introduced UB. > This optimization is correct if both `f(nonnull null)` and `f(nonnull > poison)` are equivalent to `f(poison)`. > > > 3. https://reviews.llvm.org/D69477 > > f(nonnull ptr); > use(ptr); > => > llvm.assume(ptr != null); > use(ptr); > f(nonnull ptr); > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > the added llvm.assume(ptr != null) raises UB whereas the source may > not raise UB at all. (e.g. assume that f() was an empty function) > If f(nonnull null) is UB, this is correct. > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > f(nonnull ptr); // f’s argument is dead > => > f(nonnull undef); > > This is incorrect if f(nonnull null) is UB. To make this correct, > nonnull should be dropped. This becomes harder to fix if nonnull was > attached at the signature of a function (not at the callee site). > It is correct if f(nonnull null) is f(poison). > > Actually the D70749's thread had an end-to-end miscompilation example > due to the interaction between this DAE and the optimization 3 > (insertion of llvm.assume). > > Thanks, > Juneyoung Lee
Johannes Doerfert via llvm-dev
2020-Feb-18 23:04 UTC
[llvm-dev] The semantics of nonnull attribute
On 02/18, Philip Reames via llvm-dev wrote:> I'm really not sure I agree there's a problem with the current semantics at > all. > > The root of your argument appears to be that passing poison to a function > argument is "harmless" if that argument is unused in the callee. This seems > reasonable to me, but it brings with it a restriction on any IPO pass which > is non-obvious. It essentially requires that the IPA argument inference > framework prove both that a property holds, and that if that property didn't > hold UB would be triggered. (I think this is similar in spirit to the > "used" discussion downthread.) In general, it is much easier to prove the > former property than the later. > > I would simply state that per the current semantics, any IPO/IPA transform > which doesn't do so is buggy. That's not pretty, but it's valid and I > believe matches (most of?) the current implementation.There is a good example (below) that shows how we misinterpret this already and I anticipate more as we improve our attribute deduction capabilities.> Conversely, the power for the frontend to annotate such facts on the > function boundary is one of the key advantages higher level semantics give > over inference. I would be very reluctant to give that up.You would not loose anything. As we discussed in the other part of the thread: `attr` + `not_poison` (aka. `used`) => current UB semantics All we get is a two step process which frontends can avoid by always using the attributes together.> To be clear, I'm completely open to proposals to change the current > semantics for ease of implementation on the IPO/IPA side. I just think it's > important to distinguish between "current semantics are wrong" and "current > semantics are inconvenient".Juneyoung lists nice examples of transformations that are "buggy" according to the current semantics. I think one of the strongest arguments for a change is his case 2. You can also write it as: void @bar(); void @foo(nonnull %a, %offset) { %g = gep inbounds %a, %offset call void @bar(%g) ; <- You cannot mark %g at the call ; site as nonnull according to the ; current lang-ref rules as a ; violation of `inbounds` results ; in a poison value and not UB. ; Now, `call void @bar(nonnull %g)` ; would introduce UB where there was ; none. We still do that though, ; e.g., in instcombine. The helper ; `isKnownNonZero` will return true ; as well, which actually means, ; non-zero or poison. } Cheers, Johannes> Philip > > On 2/17/20 5:45 PM, Juneyoung Lee wrote: > > Hello all, > > > > LangRef says it is undefined behavior to pass null to a nonnull argument > > (`call f(nonnull null);`), but the semantics is too strong for a few > > existing optimizations. > > To support these, we can relax the semantics so `f(nonnull null)` is > > equivalent to `f(poison)`, but (A) it again blocks another set of > > optimizations, and (B) this makes the semantics of nonnull deviate from > > other attributes like dereferenceable. > > I found that there was a similar discussion about this issue in the past > > as well, but seems it is not settled yet. > > What should the semantics of nonnull be? > > I listed a few optimizations that are relevant with this issue. > > > > > > 1. Propagating nonnull attribute to callee's arg > > (https://godbolt.org/z/-cVsVP ) > > > > g(i8* ptr) { > > f(nonnull ptr); > > } > > => > > g(i8* nonnull ptr) { > > f(nonnull ptr); > > } > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > null) should have raised UB, so ptr shouldn't be null. > > However, this transformation is incorrect if f(nonnull null) is > > equivalent to f(poison). > > If f was an empty function, f(nonnull null) never raises UB regardless > > of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > f(%ptr_inb) > > => > > %ptr_inb = .. (identical) > > f(nonnull %ptr_inb) > > > > This optimization is incorrect if `f(nonnull null)` is UB. The reason is > > as follows. > > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > > whereas the optimized one is `f(nonnull poison)`. > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, > > the transformation introduced UB. > > This optimization is correct if both `f(nonnull null)` and `f(nonnull > > poison)` are equivalent to `f(poison)`. > > > > > > 3. https://reviews.llvm.org/D69477 > > > > f(nonnull ptr); > > use(ptr); > > => > > llvm.assume(ptr != null); > > use(ptr); > > f(nonnull ptr); > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, the > > added llvm.assume(ptr != null) raises UB whereas the source may not > > raise UB at all. (e.g. assume that f() was an empty function) > > If f(nonnull null) is UB, this is correct. > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > f(nonnull ptr); // f’s argument is dead > > => > > f(nonnull undef); > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > nonnull should be dropped. This becomes harder to fix if nonnull was > > attached at the signature of a function (not at the callee site). > > It is correct if f(nonnull null) is f(poison). > > > > Actually the D70749's thread had an end-to-end miscompilation example > > due to the interaction between this DAE and the optimization 3 > > (insertion of llvm.assume). > > > > Thanks, > > Juneyoung Lee > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200218/31df7e1d/attachment.sig>
Juneyoung Lee via llvm-dev
2020-Feb-19 06:59 UTC
[llvm-dev] The semantics of nonnull attribute
Hello Philip, I agree that the current (UB) semantics is simpler in terms of interprocedural analysis, . My concern was that the semantics of nonnull was too strong (too undefined) for optimizations that attach nonnull, such as the InstCombine optimization. It seems there is a class of such transformations, such as deriving nonnull from inttoptr(or x, 1). If nonnull raises UB, attaching nonnull from this is problematic, as these operations don't raise UB but propagates poison. Or this can be supported by adding 'nonnull_or_poison', Would this make sense? It wouldn't touch the existing semantics of nonnull. Also, if an argument is not_poison and nonnull_or_poison, they can be merged into nonnull. Juneyoung Lee On Wed, Feb 19, 2020 at 6:49 AM Philip Reames <listmail at philipreames.com> wrote:> I'm really not sure I agree there's a problem with the current semantics > at all. > > The root of your argument appears to be that passing poison to a > function argument is "harmless" if that argument is unused in the > callee. This seems reasonable to me, but it brings with it a > restriction on any IPO pass which is non-obvious. It essentially > requires that the IPA argument inference framework prove both that a > property holds, and that if that property didn't hold UB would be > triggered. (I think this is similar in spirit to the "used" discussion > downthread.) In general, it is much easier to prove the former > property than the later. > > I would simply state that per the current semantics, any IPO/IPA > transform which doesn't do so is buggy. That's not pretty, but it's > valid and I believe matches (most of?) the current implementation. > > Conversely, the power for the frontend to annotate such facts on the > function boundary is one of the key advantages higher level semantics > give over inference. I would be very reluctant to give that up. > > To be clear, I'm completely open to proposals to change the current > semantics for ease of implementation on the IPO/IPA side. I just think > it's important to distinguish between "current semantics are wrong" and > "current semantics are inconvenient". > > Philip > > On 2/17/20 5:45 PM, Juneyoung Lee wrote: > > Hello all, > > > > LangRef says it is undefined behavior to pass null to a nonnull > > argument (`call f(nonnull null);`), but the semantics is too strong > > for a few existing optimizations. > > To support these, we can relax the semantics so `f(nonnull null)` is > > equivalent to `f(poison)`, but (A) it again blocks another set of > > optimizations, and (B) this makes the semantics of nonnull deviate > > from other attributes like dereferenceable. > > I found that there was a similar discussion about this issue in the > > past as well, but seems it is not settled yet. > > What should the semantics of nonnull be? > > I listed a few optimizations that are relevant with this issue. > > > > > > 1. Propagating nonnull attribute to callee's arg > > (https://godbolt.org/z/-cVsVP ) > > > > g(i8* ptr) { > > f(nonnull ptr); > > } > > => > > g(i8* nonnull ptr) { > > f(nonnull ptr); > > } > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > null) should have raised UB, so ptr shouldn't be null. > > However, this transformation is incorrect if f(nonnull null) is > > equivalent to f(poison). > > If f was an empty function, f(nonnull null) never raises UB regardless > > of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > f(%ptr_inb) > > => > > %ptr_inb = .. (identical) > > f(nonnull %ptr_inb) > > > > This optimization is incorrect if `f(nonnull null)` is UB. The reason > > is as follows. > > If `gep inbounds %any_ptr, 1` yields poison, the source is `f(poison)` > > whereas the optimized one is `f(nonnull poison)`. > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. So, > > the transformation introduced UB. > > This optimization is correct if both `f(nonnull null)` and `f(nonnull > > poison)` are equivalent to `f(poison)`. > > > > > > 3. https://reviews.llvm.org/D69477 > > > > f(nonnull ptr); > > use(ptr); > > => > > llvm.assume(ptr != null); > > use(ptr); > > f(nonnull ptr); > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > > the added llvm.assume(ptr != null) raises UB whereas the source may > > not raise UB at all. (e.g. assume that f() was an empty function) > > If f(nonnull null) is UB, this is correct. > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > f(nonnull ptr); // f’s argument is dead > > => > > f(nonnull undef); > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > nonnull should be dropped. This becomes harder to fix if nonnull was > > attached at the signature of a function (not at the callee site). > > It is correct if f(nonnull null) is f(poison). > > > > Actually the D70749's thread had an end-to-end miscompilation example > > due to the interaction between this DAE and the optimization 3 > > (insertion of llvm.assume). > > > > Thanks, > > Juneyoung Lee >-- Juneyoung Lee Software Foundation Lab, Seoul National University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200219/1ce74277/attachment.html>