Nuno Lopes via llvm-dev
2020-Feb-18 18:29 UTC
[llvm-dev] The semantics of nonnull attribute
Hi Johannes,>> Not sure the semantics of "used" you propose is sufficient. AFAIU the >> proposal, "used" could only be used in cases where the function will >> always trigger UB if poison is passed as argument. The semantics of >> attributes is usually the other way around, since function calls need >> to have UB as strong as the worst behavior of the function. If a >> function may for some reason trigger UB with a given set of arguments, >> then the function call has to trigger UB as well. > >"Used", as other attributes are "best effort". We can drop them any time, so having "insufficient information" is not a correctness problem but an optimization problem. I mean, even if the function always causes UB if a `null` argument is given we do not need to know/annotate/exploit that. In fact, we can, and do, throw away that information, e.g., by removing the instruction known to cause UB which effectively defines some semantic for the UB program (which is then "w/o UB"). Long story short, I don't think "as strong as the worst behavior" is plausible (nor possible TBH).I agree with the motivation, but not with the exact semantics. The reason is that it seems that removing "used" wouldn't be possible. This is because a function with "used" gives poison, and without gives UB. It should the other way around, such that optimizers can freely remove "used". To make it clear, what I understood was: - %v = call @fn(nonnull null) -- %v == poison - %v = call @fn(nonnull used null) -- UB Even if the function is just a load of the pointer argument. Nuno> Imagine something like: > Call f(null, 1) > > int f(nonnull %ptr, int c) { > if (c) > deref %ptr > } > > You can't use the "used" attribute, since UB is only conditional.Agreed, for the callee (f) you cannot annotate "used". You can place "used" on the call site though*. And you can place it on the callee (f) if it was internal or you copy it to make it internal. So the optimization potential is still there, even if it is a little trickier. * FWIW, one-level of selective context sensitive propagaton is planned for the Attributor.> we would say the return value of f(null, 1) is poison. But if you > inline the function you get UB. Inlining has to continue to be correct > without exceptions.The attributes will just never represent the possible UB exhaustively. I mean, inlining is still correct and the UB that it will expose, together with SCCP, was there all along. With one-level of context sensitivity we could have even known, without we just didn't know but that is fine too.> ...[ moved to the front ] I hope this makes some sense. Cheers, Johannes P.S. I think I'll write "used" deduction later so we can take a look at how this would look like.> Thanks, > Nuno > > > -----Original Message----- > From: Johannes Doerfert <jdoerfert at anl.gov> > Sent: 18 de fevereiro de 2020 02:35 > Subject: Re: [llvm-dev] The semantics of nonnull attribute > > Hi Juneyoung, > > thank you for starting this discussion again, the last one [0] did not > really end in a solution but the issue remains and will only become > worse, e.g., enable the Attributor and you might see all of the > problems below exposed in a single pass [1-4]. > > As I argued in [0], UB is problematic and I am strongly in favor of > poison. In some sense I see UB as a very "strong" guarantee and poison > as a fairly "week" one. We will have a hard time deriving the strong > one and it might cause us to break things along the way, I mean your > examples show that we already miscompile codes. > > I agree that poison is on its own not "strong enough" for certain > optimization, as you pointed out below, but we could reasonably > augment it with another attribute, let's call it `used`, that > basically bridges the gap: > > void @foo(nonnull); > void @bar(nonnull used); > > foo(NULL) // <- argument is poison inside of @foo > bar(NULL) // <- UB at the call site of @bar > > A value is `used` if it would inevitably cause UB if it was poison at > this program point. I hope this makes some sense if not I can explain > in more detail. > > Btw. I think this also integrates well with other attribute problems > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in > this example were "not used" (in some naive implementation of memcpy > at least). > > Maybe I have overlooked a problem with this solution so let me know > what you think. > > Cheers, > Johannes > > > [0] https://reviews.llvm.org/D70749 > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: > https://godbolt.org/z/TAkSC6 [3] 3. below: > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. > below: https://godbolt.org/z/Qkkc_E > https://godbolt.org/z/uXAfp_ > ... > > On 02/18, Juneyoung Lee via llvm-dev wrote: > > Hello all, > > > > LangRef says it is undefined behavior to pass null to a nonnull > > argument (`call f(nonnull null);`), but the semantics is too strong > > for a few existing optimizations. > > To support these, we can relax the semantics so `f(nonnull null)` is > > equivalent to `f(poison)`, but (A) it again blocks another set of > > optimizations, and (B) this makes the semantics of nonnull deviate > > from other attributes like dereferenceable. > > I found that there was a similar discussion about this issue in the > > past as well, but seems it is not settled yet. > > What should the semantics of nonnull be? > > I listed a few optimizations that are relevant with this issue. > > > > > > 1. Propagating nonnull attribute to callee's arg ( > > https://godbolt.org/z/-cVsVP ) > > > > g(i8* ptr) { > > f(nonnull ptr); > > } > > => > > g(i8* nonnull ptr) { > > f(nonnull ptr); > > } > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > null) should have raised UB, so ptr shouldn't be null. > > However, this transformation is incorrect if f(nonnull null) is > > equivalent to f(poison). > > If f was an empty function, f(nonnull null) never raises UB > > regardless of ptr. So we can't guarantee ptr != null at other uses of ptr. > > For the given code snipped it is not even invalid to propagate nonnull > in the poison case but your point is not wrong, if we have f(poison) > we cannot derive anything from it if the argument might be unused. > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > f(%ptr_inb) > > => > > %ptr_inb = .. (identical) > > f(nonnull %ptr_inb) > > > > This optimization is incorrect if `f(nonnull null)` is UB. The > > reason is as follows. > > If `gep inbounds %any_ptr, 1` yields poison, the source is > > `f(poison)` whereas the optimized one is `f(nonnull poison)`. > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. > > So, the transformation introduced UB. > > This optimization is correct if both `f(nonnull null)` and > > `f(nonnull poison)` are equivalent to `f(poison)`. > > > > > > 3. https://reviews.llvm.org/D69477 > > > > f(nonnull ptr); > > use(ptr); > > => > > llvm.assume(ptr != null); > > use(ptr); > > f(nonnull ptr); > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > > the added llvm.assume(ptr != null) raises UB whereas the source may > > not raise UB at all. (e.g. assume that f() was an empty function) If > > f(nonnull null) is UB, this is correct. > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > f(nonnull ptr); // f’s argument is dead => f(nonnull undef); > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > nonnull should be dropped. This becomes harder to fix if nonnull was > > attached at the signature of a function (not at the callee site). > > It is correct if f(nonnull null) is f(poison). > > > > Actually the D70749's thread had an end-to-end miscompilation > > example due to the interaction between this DAE and the optimization > > 3 (insertion of llvm.assume). > > > > Thanks, > > Juneyoung Lee >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov
Doerfert, Johannes via llvm-dev
2020-Feb-18 20:13 UTC
[llvm-dev] The semantics of nonnull attribute
I think we are actually in agreement. "Used" should be removable, it would potentially define UB but that is valid and happening anyway everywhere. Maybe my initial mail might have been confusing but I tried to show that with this snippet:> void @foo(nonnull); > void @bar(nonnull used); > > > foo(NULL) // <- argument is poison inside of @foo > > bar(NULL) // <- UB at the call site of @barIf we remove `used`, thus go from @bar to @foo, we get a poison argument instead of UB, which is fine. WDYT?> Even if the function is just a load of the pointer argument.The Attributor does infer `nonnull` and `dereferenceable` in this case already so `used` would follow the same logic (+ look for uses in branches etc) FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to keep the UB behavior can just add `used` as well. ________________________________________ From: Nuno Lopes <nuno.lopes at ist.utl.pt> Sent: Tuesday, February 18, 2020 12:29 To: Doerfert, Johannes Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; llvm-dev at redking.me.uk; 'Eli Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John Regehr' Subject: RE: [llvm-dev] The semantics of nonnull attribute Hi Johannes,>> Not sure the semantics of "used" you propose is sufficient. AFAIU the >> proposal, "used" could only be used in cases where the function will >> always trigger UB if poison is passed as argument. The semantics of >> attributes is usually the other way around, since function calls need >> to have UB as strong as the worst behavior of the function. If a >> function may for some reason trigger UB with a given set of arguments, >> then the function call has to trigger UB as well. > >"Used", as other attributes are "best effort". We can drop them any time, so having "insufficient information" is not a correctness problem but an optimization problem. I mean, even if the function always causes UB if a `null` argument is given we do not need to know/annotate/exploit that. In fact, we can, and do, throw away that information, e.g., by removing the instruction known to cause UB which effectively defines some semantic for the UB program (which is then "w/o UB"). Long story short, I don't think "as strong as the worst behavior" is plausible (nor possible TBH).I agree with the motivation, but not with the exact semantics. The reason is that it seems that removing "used" wouldn't be possible. This is because a function with "used" gives poison, and without gives UB. It should the other way around, such that optimizers can freely remove "used". To make it clear, what I understood was: - %v = call @fn(nonnull null) -- %v == poison - %v = call @fn(nonnull used null) -- UB Even if the function is just a load of the pointer argument. Nuno> Imagine something like: > Call f(null, 1) > > int f(nonnull %ptr, int c) { > if (c) > deref %ptr > } > > You can't use the "used" attribute, since UB is only conditional.Agreed, for the callee (f) you cannot annotate "used". You can place "used" on the call site though*. And you can place it on the callee (f) if it was internal or you copy it to make it internal. So the optimization potential is still there, even if it is a little trickier. * FWIW, one-level of selective context sensitive propagaton is planned for the Attributor.> we would say the return value of f(null, 1) is poison. But if you > inline the function you get UB. Inlining has to continue to be correct > without exceptions.The attributes will just never represent the possible UB exhaustively. I mean, inlining is still correct and the UB that it will expose, together with SCCP, was there all along. With one-level of context sensitivity we could have even known, without we just didn't know but that is fine too.> ...[ moved to the front ] I hope this makes some sense. Cheers, Johannes P.S. I think I'll write "used" deduction later so we can take a look at how this would look like.> Thanks, > Nuno > > > -----Original Message----- > From: Johannes Doerfert <jdoerfert at anl.gov> > Sent: 18 de fevereiro de 2020 02:35 > Subject: Re: [llvm-dev] The semantics of nonnull attribute > > Hi Juneyoung, > > thank you for starting this discussion again, the last one [0] did not > really end in a solution but the issue remains and will only become > worse, e.g., enable the Attributor and you might see all of the > problems below exposed in a single pass [1-4]. > > As I argued in [0], UB is problematic and I am strongly in favor of > poison. In some sense I see UB as a very "strong" guarantee and poison > as a fairly "week" one. We will have a hard time deriving the strong > one and it might cause us to break things along the way, I mean your > examples show that we already miscompile codes. > > I agree that poison is on its own not "strong enough" for certain > optimization, as you pointed out below, but we could reasonably > augment it with another attribute, let's call it `used`, that > basically bridges the gap: > > void @foo(nonnull); > void @bar(nonnull used); > > foo(NULL) // <- argument is poison inside of @foo > bar(NULL) // <- UB at the call site of @bar > > A value is `used` if it would inevitably cause UB if it was poison at > this program point. I hope this makes some sense if not I can explain > in more detail. > > Btw. I think this also integrates well with other attribute problems > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in > this example were "not used" (in some naive implementation of memcpy > at least). > > Maybe I have overlooked a problem with this solution so let me know > what you think. > > Cheers, > Johannes > > > [0] https://reviews.llvm.org/D70749 > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: > https://godbolt.org/z/TAkSC6 [3] 3. below: > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. > below: https://godbolt.org/z/Qkkc_E > https://godbolt.org/z/uXAfp_ > ... > > On 02/18, Juneyoung Lee via llvm-dev wrote: > > Hello all, > > > > LangRef says it is undefined behavior to pass null to a nonnull > > argument (`call f(nonnull null);`), but the semantics is too strong > > for a few existing optimizations. > > To support these, we can relax the semantics so `f(nonnull null)` is > > equivalent to `f(poison)`, but (A) it again blocks another set of > > optimizations, and (B) this makes the semantics of nonnull deviate > > from other attributes like dereferenceable. > > I found that there was a similar discussion about this issue in the > > past as well, but seems it is not settled yet. > > What should the semantics of nonnull be? > > I listed a few optimizations that are relevant with this issue. > > > > > > 1. Propagating nonnull attribute to callee's arg ( > > https://godbolt.org/z/-cVsVP ) > > > > g(i8* ptr) { > > f(nonnull ptr); > > } > > => > > g(i8* nonnull ptr) { > > f(nonnull ptr); > > } > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > null) should have raised UB, so ptr shouldn't be null. > > However, this transformation is incorrect if f(nonnull null) is > > equivalent to f(poison). > > If f was an empty function, f(nonnull null) never raises UB > > regardless of ptr. So we can't guarantee ptr != null at other uses of ptr. > > For the given code snipped it is not even invalid to propagate nonnull > in the poison case but your point is not wrong, if we have f(poison) > we cannot derive anything from it if the argument might be unused. > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > f(%ptr_inb) > > => > > %ptr_inb = .. (identical) > > f(nonnull %ptr_inb) > > > > This optimization is incorrect if `f(nonnull null)` is UB. The > > reason is as follows. > > If `gep inbounds %any_ptr, 1` yields poison, the source is > > `f(poison)` whereas the optimized one is `f(nonnull poison)`. > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. > > So, the transformation introduced UB. > > This optimization is correct if both `f(nonnull null)` and > > `f(nonnull poison)` are equivalent to `f(poison)`. > > > > > > 3. https://reviews.llvm.org/D69477 > > > > f(nonnull ptr); > > use(ptr); > > => > > llvm.assume(ptr != null); > > use(ptr); > > f(nonnull ptr); > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > > the added llvm.assume(ptr != null) raises UB whereas the source may > > not raise UB at all. (e.g. assume that f() was an empty function) If > > f(nonnull null) is UB, this is correct. > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > f(nonnull ptr); // f’s argument is dead => f(nonnull undef); > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > nonnull should be dropped. This becomes harder to fix if nonnull was > > attached at the signature of a function (not at the callee site). > > It is correct if f(nonnull null) is f(poison). > > > > Actually the D70749's thread had an end-to-end miscompilation > > example due to the interaction between this DAE and the optimization > > 3 (insertion of llvm.assume). > > > > Thanks, > > Juneyoung Lee >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov
Eli Friedman via llvm-dev
2020-Feb-18 21:37 UTC
[llvm-dev] The semantics of nonnull attribute
I think calling the attribute "used" is confusing. I'd suggest the following: "not_poison": If an argument is marked not_poison, and the argument is poison at runtime, the call is instant UB. Whether an argument is poison is checked after the rules for other attributes like "nonnull" and "align" are applied. This makes it clear that the IR semantics don't depend on the implementation of the called function. And it's the logical extension of the way we define the relationship between UB and poison for instructions. -Eli> -----Original Message----- > From: Doerfert, Johannes <jdoerfert at anl.gov> > Sent: Tuesday, February 18, 2020 12:14 PM > To: Nuno Lopes <nuno.lopes at ist.utl.pt> > Cc: 'Juneyoung Lee' <juneyoung.lee at sf.snu.ac.kr>; 'llvm-dev' <llvm- > dev at lists.llvm.org>; 'Roman Lebedev' <lebedev.ri at gmail.com>; llvm- > dev at redking.me.uk; Eli Friedman <efriedma at quicinc.com>; 'Philip Reames' > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>; 'Chung-Kil Hur' > <gil.hur at sf.snu.ac.kr>; 'John Regehr' <regehr at cs.utah.edu> > Subject: [EXT] Re: [llvm-dev] The semantics of nonnull attribute > > I think we are actually in agreement. "Used" should be removable, it would > potentially define UB but that is valid and happening anyway everywhere. > Maybe my initial mail might have been confusing but I tried to show that with > this snippet: > > > void @foo(nonnull); > > void @bar(nonnull used); > > > > > foo(NULL) // <- argument is poison inside of @foo > > > bar(NULL) // <- UB at the call site of @bar > > If we remove `used`, thus go from @bar to @foo, we get a poison argument > instead of UB, which is fine. > > WDYT? > > > > Even if the function is just a load of the pointer argument. > > The Attributor does infer `nonnull` and `dereferenceable` in this case already so > `used` would follow the same logic (+ look for uses in branches etc) > > > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to keep the > UB behavior can just add `used` as well. > > ________________________________________ > From: Nuno Lopes <nuno.lopes at ist.utl.pt> > Sent: Tuesday, February 18, 2020 12:29 > To: Doerfert, Johannes > Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; llvm-dev at redking.me.uk; 'Eli > Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John Regehr' > Subject: RE: [llvm-dev] The semantics of nonnull attribute > > Hi Johannes, > > >> Not sure the semantics of "used" you propose is sufficient. AFAIU the > >> proposal, "used" could only be used in cases where the function will > >> always trigger UB if poison is passed as argument. The semantics of > >> attributes is usually the other way around, since function calls need > >> to have UB as strong as the worst behavior of the function. If a > >> function may for some reason trigger UB with a given set of arguments, > >> then the function call has to trigger UB as well. > > > >"Used", as other attributes are "best effort". We can drop them any time, so > having "insufficient information" is not a correctness problem but an > optimization problem. I mean, even if the function always causes UB if a `null` > argument is given we do not need to know/annotate/exploit that. In fact, we > can, and do, throw away that information, e.g., by removing the instruction > known to cause UB which effectively defines some semantic for the UB program > (which is then "w/o UB"). Long story short, I don't think "as strong as the worst > behavior" is plausible (nor possible TBH). > > I agree with the motivation, but not with the exact semantics. The reason is that > it seems that removing "used" wouldn't be possible. This is because a function > with "used" gives poison, and without gives UB. It should the other way around, > such that optimizers can freely remove "used". > To make it clear, what I understood was: > - %v = call @fn(nonnull null) -- %v == poison > - %v = call @fn(nonnull used null) -- UB > > Even if the function is just a load of the pointer argument. > > Nuno > > > > > Imagine something like: > > Call f(null, 1) > > > > int f(nonnull %ptr, int c) { > > if (c) > > deref %ptr > > } > > > > You can't use the "used" attribute, since UB is only conditional. > > Agreed, for the callee (f) you cannot annotate "used". You can place "used" on > the call site though*. And you can place it on the callee (f) if it was internal or > you copy it to make it internal. So the optimization potential is still there, even if > it is a little trickier. > > * FWIW, one-level of selective context sensitive propagaton is planned > for the Attributor. > > > > we would say the return value of f(null, 1) is poison. But if you > > inline the function you get UB. Inlining has to continue to be correct > > without exceptions. > > The attributes will just never represent the possible UB exhaustively. > I mean, inlining is still correct and the UB that it will expose, together with SCCP, > was there all along. With one-level of context sensitivity we could have even > known, without we just didn't know but that is fine too. > > > ... > [ moved to the front ] > > > I hope this makes some sense. > > Cheers, > Johannes > > P.S. > I think I'll write "used" deduction later so we can take a look at how > this would look like. > > > > Thanks, > > Nuno > > > > > > -----Original Message----- > > From: Johannes Doerfert <jdoerfert at anl.gov> > > Sent: 18 de fevereiro de 2020 02:35 > > Subject: Re: [llvm-dev] The semantics of nonnull attribute > > > > Hi Juneyoung, > > > > thank you for starting this discussion again, the last one [0] did not > > really end in a solution but the issue remains and will only become > > worse, e.g., enable the Attributor and you might see all of the > > problems below exposed in a single pass [1-4]. > > > > As I argued in [0], UB is problematic and I am strongly in favor of > > poison. In some sense I see UB as a very "strong" guarantee and poison > > as a fairly "week" one. We will have a hard time deriving the strong > > one and it might cause us to break things along the way, I mean your > > examples show that we already miscompile codes. > > > > I agree that poison is on its own not "strong enough" for certain > > optimization, as you pointed out below, but we could reasonably > > augment it with another attribute, let's call it `used`, that > > basically bridges the gap: > > > > void @foo(nonnull); > > void @bar(nonnull used); > > > > foo(NULL) // <- argument is poison inside of @foo > > bar(NULL) // <- UB at the call site of @bar > > > > A value is `used` if it would inevitably cause UB if it was poison at > > this program point. I hope this makes some sense if not I can explain > > in more detail. > > > > Btw. I think this also integrates well with other attribute problems > > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in > > this example were "not used" (in some naive implementation of memcpy > > at least). > > > > Maybe I have overlooked a problem with this solution so let me know > > what you think. > > > > Cheers, > > Johannes > > > > > > [0] https://reviews.llvm.org/D70749 > > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: > > https://godbolt.org/z/TAkSC6 [3] 3. below: > > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. > > below: https://godbolt.org/z/Qkkc_E > > https://godbolt.org/z/uXAfp_ > > ... > > > > On 02/18, Juneyoung Lee via llvm-dev wrote: > > > Hello all, > > > > > > LangRef says it is undefined behavior to pass null to a nonnull > > > argument (`call f(nonnull null);`), but the semantics is too strong > > > for a few existing optimizations. > > > To support these, we can relax the semantics so `f(nonnull null)` is > > > equivalent to `f(poison)`, but (A) it again blocks another set of > > > optimizations, and (B) this makes the semantics of nonnull deviate > > > from other attributes like dereferenceable. > > > I found that there was a similar discussion about this issue in the > > > past as well, but seems it is not settled yet. > > > What should the semantics of nonnull be? > > > I listed a few optimizations that are relevant with this issue. > > > > > > > > > 1. Propagating nonnull attribute to callee's arg ( > > > https://godbolt.org/z/-cVsVP ) > > > > > > g(i8* ptr) { > > > f(nonnull ptr); > > > } > > > => > > > g(i8* nonnull ptr) { > > > f(nonnull ptr); > > > } > > > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > > null) should have raised UB, so ptr shouldn't be null. > > > However, this transformation is incorrect if f(nonnull null) is > > > equivalent to f(poison). > > > If f was an empty function, f(nonnull null) never raises UB > > > regardless of ptr. So we can't guarantee ptr != null at other uses of ptr. > > > > For the given code snipped it is not even invalid to propagate nonnull > > in the poison case but your point is not wrong, if we have f(poison) > > we cannot derive anything from it if the argument might be unused. > > > > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > > f(%ptr_inb) > > > => > > > %ptr_inb = .. (identical) > > > f(nonnull %ptr_inb) > > > > > > This optimization is incorrect if `f(nonnull null)` is UB. The > > > reason is as follows. > > > If `gep inbounds %any_ptr, 1` yields poison, the source is > > > `f(poison)` whereas the optimized one is `f(nonnull poison)`. > > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. > > > So, the transformation introduced UB. > > > This optimization is correct if both `f(nonnull null)` and > > > `f(nonnull poison)` are equivalent to `f(poison)`. > > > > > > > > > 3. https://reviews.llvm.org/D69477 > > > > > > f(nonnull ptr); > > > use(ptr); > > > => > > > llvm.assume(ptr != null); > > > use(ptr); > > > f(nonnull ptr); > > > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > > > the added llvm.assume(ptr != null) raises UB whereas the source may > > > not raise UB at all. (e.g. assume that f() was an empty function) If > > > f(nonnull null) is UB, this is correct. > > > > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > > > f(nonnull ptr); // f's argument is dead => f(nonnull undef); > > > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > > nonnull should be dropped. This becomes harder to fix if nonnull was > > > attached at the signature of a function (not at the callee site). > > > It is correct if f(nonnull null) is f(poison). > > > > > > Actually the D70749's thread had an end-to-end miscompilation > > > example due to the interaction between this DAE and the optimization > > > 3 (insertion of llvm.assume). > > > > > > Thanks, > > > Juneyoung Lee > > > > -- > > Johannes Doerfert > Researcher > > Argonne National Laboratory > Lemont, IL 60439, USA > > jdoerfert at anl.gov
Nuno Lopes via llvm-dev
2020-Feb-18 22:38 UTC
[llvm-dev] The semantics of nonnull attribute
> I think we are actually in agreement. "Used" should be removable, it would > potentially define UB but that is valid and happening anyway everywhere. > Maybe my initial mail might have been confusing but I tried to show that > with this snippet: > >> void @foo(nonnull); >> void @bar(nonnull used); >> >> > foo(NULL) // <- argument is poison inside of @foo >> > bar(NULL) // <- UB at the call site of @bar > > If we remove `used`, thus go from @bar to @foo, we get a poison argument > instead of UB, which is fine.Ah, now I understand the idea. Sorry, I was thinking about the return value of the function rather that the value of the argument. The guarantee that a callee gets is that a nonnull argument is either non-null or poison. Should be sufficient to fold comparisons with null. The disadvantage is that the caller cannot infer that a pointer passed as nonnull argument is indeed non-null. Unless you use this non_poison attribute. I don't think I have enough data/experience to have at any idea which of these 3 semantics is better (outright UB vs UB only if non_poison vs arg is poison in callee). If frontends add non_poison everywhere, then it's that same have any outright UB semantics. Deducing it might not be very easy either. Nuno> > > Even if the function is just a load of the pointer argument. > > The Attributor does infer `nonnull` and `dereferenceable` in this case > already so `used` would follow the same logic (+ look for uses in branches > etc) > > > FWIW, frontends that want `nonnull`, `align`, `dereferenceable`, etc. to > keep the UB behavior can just add `used` as well.________________________________________ From: Nuno Lopes <nuno.lopes at ist.utl.pt> Sent: Tuesday, February 18, 2020 12:29 To: Doerfert, Johannes Cc: 'Juneyoung Lee'; 'llvm-dev'; 'Roman Lebedev'; llvm-dev at redking.me.uk; 'Eli Friedman'; 'Philip Reames'; Finkel, Hal J.; 'Chung-Kil Hur'; 'John Regehr' Subject: RE: [llvm-dev] The semantics of nonnull attribute Hi Johannes,>> Not sure the semantics of "used" you propose is sufficient. AFAIU the >> proposal, "used" could only be used in cases where the function will >> always trigger UB if poison is passed as argument. The semantics of >> attributes is usually the other way around, since function calls need >> to have UB as strong as the worst behavior of the function. If a >> function may for some reason trigger UB with a given set of arguments, >> then the function call has to trigger UB as well. > >"Used", as other attributes are "best effort". We can drop them any time, >so having "insufficient information" is not a correctness problem but an >optimization problem. I mean, even if the function always causes UB if a >`null` argument is given we do not need to know/annotate/exploit that. In >fact, we can, and do, throw away that information, e.g., by removing the >instruction known to cause UB which effectively defines some semantic for >the UB program (which is then "w/o UB"). Long story short, I don't think >"as strong as the worst behavior" is plausible (nor possible TBH).I agree with the motivation, but not with the exact semantics. The reason is that it seems that removing "used" wouldn't be possible. This is because a function with "used" gives poison, and without gives UB. It should the other way around, such that optimizers can freely remove "used". To make it clear, what I understood was: - %v = call @fn(nonnull null) -- %v == poison - %v = call @fn(nonnull used null) -- UB Even if the function is just a load of the pointer argument. Nuno> Imagine something like: > Call f(null, 1) > > int f(nonnull %ptr, int c) { > if (c) > deref %ptr > } > > You can't use the "used" attribute, since UB is only conditional.Agreed, for the callee (f) you cannot annotate "used". You can place "used" on the call site though*. And you can place it on the callee (f) if it was internal or you copy it to make it internal. So the optimization potential is still there, even if it is a little trickier. * FWIW, one-level of selective context sensitive propagaton is planned for the Attributor.> we would say the return value of f(null, 1) is poison. But if you > inline the function you get UB. Inlining has to continue to be correct > without exceptions.The attributes will just never represent the possible UB exhaustively. I mean, inlining is still correct and the UB that it will expose, together with SCCP, was there all along. With one-level of context sensitivity we could have even known, without we just didn't know but that is fine too.> ...[ moved to the front ] I hope this makes some sense. Cheers, Johannes P.S. I think I'll write "used" deduction later so we can take a look at how this would look like.> Thanks, > Nuno > > > -----Original Message----- > From: Johannes Doerfert <jdoerfert at anl.gov> > Sent: 18 de fevereiro de 2020 02:35 > Subject: Re: [llvm-dev] The semantics of nonnull attribute > > Hi Juneyoung, > > thank you for starting this discussion again, the last one [0] did not > really end in a solution but the issue remains and will only become > worse, e.g., enable the Attributor and you might see all of the > problems below exposed in a single pass [1-4]. > > As I argued in [0], UB is problematic and I am strongly in favor of > poison. In some sense I see UB as a very "strong" guarantee and poison > as a fairly "week" one. We will have a hard time deriving the strong > one and it might cause us to break things along the way, I mean your > examples show that we already miscompile codes. > > I agree that poison is on its own not "strong enough" for certain > optimization, as you pointed out below, but we could reasonably > augment it with another attribute, let's call it `used`, that > basically bridges the gap: > > void @foo(nonnull); > void @bar(nonnull used); > > foo(NULL) // <- argument is poison inside of @foo > bar(NULL) // <- UB at the call site of @bar > > A value is `used` if it would inevitably cause UB if it was poison at > this program point. I hope this makes some sense if not I can explain > in more detail. > > Btw. I think this also integrates well with other attribute problems > we had in the past, e.g., memcpy(NULL, NULL, 0), since the pointers in > this example were "not used" (in some naive implementation of memcpy > at least). > > Maybe I have overlooked a problem with this solution so let me know > what you think. > > Cheers, > Johannes > > > [0] https://reviews.llvm.org/D70749 > [1] 1. below: https://godbolt.org/z/J4wFVw [2] 2. below: > https://godbolt.org/z/TAkSC6 [3] 3. below: > https://godbolt.org/z/UqYi6S (doesn't work yet, will though) [4] 4. > below: https://godbolt.org/z/Qkkc_E > https://godbolt.org/z/uXAfp_ > ... > > On 02/18, Juneyoung Lee via llvm-dev wrote: > > Hello all, > > > > LangRef says it is undefined behavior to pass null to a nonnull > > argument (`call f(nonnull null);`), but the semantics is too strong > > for a few existing optimizations. > > To support these, we can relax the semantics so `f(nonnull null)` is > > equivalent to `f(poison)`, but (A) it again blocks another set of > > optimizations, and (B) this makes the semantics of nonnull deviate > > from other attributes like dereferenceable. > > I found that there was a similar discussion about this issue in the > > past as well, but seems it is not settled yet. > > What should the semantics of nonnull be? > > I listed a few optimizations that are relevant with this issue. > > > > > > 1. Propagating nonnull attribute to callee's arg ( > > https://godbolt.org/z/-cVsVP ) > > > > g(i8* ptr) { > > f(nonnull ptr); > > } > > => > > g(i8* nonnull ptr) { > > f(nonnull ptr); > > } > > > > This is correct if f(nonnull null) is UB. If ptr == null, f(nonnull > > null) should have raised UB, so ptr shouldn't be null. > > However, this transformation is incorrect if f(nonnull null) is > > equivalent to f(poison). > > If f was an empty function, f(nonnull null) never raises UB > > regardless of ptr. So we can't guarantee ptr != null at other uses of > > ptr. > > For the given code snipped it is not even invalid to propagate nonnull > in the poison case but your point is not wrong, if we have f(poison) > we cannot derive anything from it if the argument might be unused. > > > > 2. InstCombine (https://godbolt.org/z/HDQ7rD ): > > > > %ptr_inb = gep inbounds %any_ptr, 1 > > f(%ptr_inb) > > => > > %ptr_inb = .. (identical) > > f(nonnull %ptr_inb) > > > > This optimization is incorrect if `f(nonnull null)` is UB. The > > reason is as follows. > > If `gep inbounds %any_ptr, 1` yields poison, the source is > > `f(poison)` whereas the optimized one is `f(nonnull poison)`. > > `f(nonnull poison)` should be UB because `f(nonnull null)` is UB. > > So, the transformation introduced UB. > > This optimization is correct if both `f(nonnull null)` and > > `f(nonnull poison)` are equivalent to `f(poison)`. > > > > > > 3. https://reviews.llvm.org/D69477 > > > > f(nonnull ptr); > > use(ptr); > > => > > llvm.assume(ptr != null); > > use(ptr); > > f(nonnull ptr); > > > > If f(nonnull null) is f(poison), this is incorrect. If ptr was null, > > the added llvm.assume(ptr != null) raises UB whereas the source may > > not raise UB at all. (e.g. assume that f() was an empty function) If > > f(nonnull null) is UB, this is correct. > > > > > > 4. Dead argument elimination (from https://reviews.llvm.org/D70749 ) > > > > f(nonnull ptr); // f’s argument is dead => f(nonnull undef); > > > > This is incorrect if f(nonnull null) is UB. To make this correct, > > nonnull should be dropped. This becomes harder to fix if nonnull was > > attached at the signature of a function (not at the callee site). > > It is correct if f(nonnull null) is f(poison). > > > > Actually the D70749's thread had an end-to-end miscompilation > > example due to the interaction between this DAE and the optimization > > 3 (insertion of llvm.assume). > > > > Thanks, > > Juneyoung Lee >-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov