On Thu, Jan 29, 2015 at 10:01 PM, Matthias Braun <matze at braunis.de> wrote:> But > (Poison > INT_MAX) <=> poison > contradicts > (X > INT_MAX) <=> false > > and I don't think you want to abandon the second rule just because x might be poison.Maybe we could define poison in such a way that it is safe to pretend it "is" false, as per our convenience. In this sense, could be defined to be very similar to undef. -- Sanjoy> > - Matthias > >> On Jan 29, 2015, at 9:43 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >> >> One way around this is to say that there are some special >> instructions, icmp, sext and zext which produce a value solely >> composed of poison bits if any of their input bits is poison. So >> `<poison> icmp X` is poison for any value of X, including INT_MAX. >> This is one way poison could be fundamentally different from undef. >> >> -- Sanjoy >> >>> On Thu, Jan 29, 2015 at 8:05 PM, Matthias Braun <matze at braunis.de> wrote: >>> Having though about this some more I think optimizing >>> >>> (x+1 > x) <=> true >>> >>> and at the same time modeling undefined behavior as a posion value is impossible. This is because we also want the following rule: >>> >>> (x > INT_MAX) <=> false >>> >>> Now if poison is a value, then the second replacement tells us (poison > INT_MAX) == false which contradicts the first rule. >>> >>> >>> >>> The only way out of this while still allowing (x+1>x)<=>true I can see at the moment is defining that add nsw does produce actual undefined behavior allowing us to “freely rewrite" the following > to true in the UB cases. Of course having add produce actual undefined behaviour greatly limits us in the way we can actually move the instruction around without changing program semantics. >>> >>> The only way allowing moving instructions and having add produce real UB I can see is that as soon as we start moving instructions around (specifically moving the add to a place it does not dominate or moving any other instruction over the add) we change the add to an instruction that does not produce real UB anymore; something like add swu = “add signed wrap gives undef”... >>> >>> - Matthias >>> >>>>> On Jan 29, 2015, at 11:29 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >>>>> >>>>> I've been discussing a model with David that might steer poison back towards >>>>> something that simply supports algebraic simplification. If we have a math >>>>> operation that cannot wrap, then it notionally produces as many bits of >>>>> undef as the operation could possibly produce. For example, "add nsw i8" can >>>>> produce an i9 undef, and "mul nsw i8" can produce an undefined 16 bit >>>>> bitpattern. This is strong enough to do things like "a + 1 > a --> true", >>>>> because on overflow of "a + 1" we can choose an poison value of "MAX_INT + >>>>> 1", even though that is not a valid i8 bit pattern. >>>>> >>>>> So, a short version would be that poison is like undef, except you get to >>>>> include the overflow bits of the computation in your undef value. >>>> >>>> I suspect it will be hard to justify reg2mem is meaning preserving in >>>> this scheme. But if this can be made to work, then I agree that this >>>> is the right thing to do -- an i32 poison effectively has the >>>> semantics of a wider type, and it makes sense to be explicit in that. >>>> >>>> -- Sanjoy >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Here's an idea for a slightly unusual framework for poison semantics: we do it in two steps -- 1. for every bit in the program, we define a second "shadow bit", is-poison. We define the semantics of LLVM IR using this is-poison relation. So, for instance, we could say if there is a bit 'b'in address 'a' such that if is-poison['b'], then "store X to 'a'" is undefined behavior. 2. we prove that there is no need to track is-poison for a well-defined program. IOW, any program whose evaluation will actually require us to do the book-keeping for is-poison will eventually run into undefined behavior. I think such an approach will explicitly address the issue that we're using an N bit value to track 2^N+1 possibilities. And this is also a sanity check -- since the is-poison relation is clearly made up and does not *really exist*, if we cannot prove (2) then we did something wrong in specifying step (1). -- Sanjoy On Thu, Jan 29, 2015 at 10:05 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> On Thu, Jan 29, 2015 at 10:01 PM, Matthias Braun <matze at braunis.de> wrote: >> But >> (Poison > INT_MAX) <=> poison >> contradicts >> (X > INT_MAX) <=> false >> >> and I don't think you want to abandon the second rule just because x might be poison. > > Maybe we could define poison in such a way that it is safe to pretend > it "is" false, as per our convenience. In this sense, could be > defined to be very similar to undef. > > -- Sanjoy > >> >> - Matthias >> >>> On Jan 29, 2015, at 9:43 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >>> >>> One way around this is to say that there are some special >>> instructions, icmp, sext and zext which produce a value solely >>> composed of poison bits if any of their input bits is poison. So >>> `<poison> icmp X` is poison for any value of X, including INT_MAX. >>> This is one way poison could be fundamentally different from undef. >>> >>> -- Sanjoy >>> >>>> On Thu, Jan 29, 2015 at 8:05 PM, Matthias Braun <matze at braunis.de> wrote: >>>> Having though about this some more I think optimizing >>>> >>>> (x+1 > x) <=> true >>>> >>>> and at the same time modeling undefined behavior as a posion value is impossible. This is because we also want the following rule: >>>> >>>> (x > INT_MAX) <=> false >>>> >>>> Now if poison is a value, then the second replacement tells us (poison > INT_MAX) == false which contradicts the first rule. >>>> >>>> >>>> >>>> The only way out of this while still allowing (x+1>x)<=>true I can see at the moment is defining that add nsw does produce actual undefined behavior allowing us to “freely rewrite" the following > to true in the UB cases. Of course having add produce actual undefined behaviour greatly limits us in the way we can actually move the instruction around without changing program semantics. >>>> >>>> The only way allowing moving instructions and having add produce real UB I can see is that as soon as we start moving instructions around (specifically moving the add to a place it does not dominate or moving any other instruction over the add) we change the add to an instruction that does not produce real UB anymore; something like add swu = “add signed wrap gives undef”... >>>> >>>> - Matthias >>>> >>>>>> On Jan 29, 2015, at 11:29 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >>>>>> >>>>>> I've been discussing a model with David that might steer poison back towards >>>>>> something that simply supports algebraic simplification. If we have a math >>>>>> operation that cannot wrap, then it notionally produces as many bits of >>>>>> undef as the operation could possibly produce. For example, "add nsw i8" can >>>>>> produce an i9 undef, and "mul nsw i8" can produce an undefined 16 bit >>>>>> bitpattern. This is strong enough to do things like "a + 1 > a --> true", >>>>>> because on overflow of "a + 1" we can choose an poison value of "MAX_INT + >>>>>> 1", even though that is not a valid i8 bit pattern. >>>>>> >>>>>> So, a short version would be that poison is like undef, except you get to >>>>>> include the overflow bits of the computation in your undef value. >>>>> >>>>> I suspect it will be hard to justify reg2mem is meaning preserving in >>>>> this scheme. But if this can be made to work, then I agree that this >>>>> is the right thing to do -- an i32 poison effectively has the >>>>> semantics of a wider type, and it makes sense to be explicit in that. >>>>> >>>>> -- Sanjoy >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
This is actually how i tend to think of poison -- it is undef, plus a shadow bit indicating the bits are poison. I think the important this is defining how the various operations propagate poison, why, and why not. On Thu, Jan 29, 2015 at 10:14 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> Here's an idea for a slightly unusual framework for poison semantics: > we do it in two steps -- > > > 1. for every bit in the program, we define a second "shadow bit", > is-poison. We define the semantics of LLVM IR using this is-poison > relation. So, for instance, we could say if there is a bit 'b'in > address 'a' such that if is-poison['b'], then "store X to 'a'" is > undefined behavior. > > 2. we prove that there is no need to track is-poison for a > well-defined program. IOW, any program whose evaluation will actually > require us to do the book-keeping for is-poison will eventually run > into undefined behavior. > > I think such an approach will explicitly address the issue that we're > using an N bit value to track 2^N+1 possibilities. And this is also a > sanity check -- since the is-poison relation is clearly made up and > does not *really exist*, if we cannot prove (2) then we did something > wrong in specifying step (1). > > -- Sanjoy > > > On Thu, Jan 29, 2015 at 10:05 PM, Sanjoy Das > <sanjoy at playingwithpointers.com> wrote: > > On Thu, Jan 29, 2015 at 10:01 PM, Matthias Braun <matze at braunis.de> > wrote: > >> But > >> (Poison > INT_MAX) <=> poison > >> contradicts > >> (X > INT_MAX) <=> false > >> > >> and I don't think you want to abandon the second rule just because x > might be poison. > > > > Maybe we could define poison in such a way that it is safe to pretend > > it "is" false, as per our convenience. In this sense, could be > > defined to be very similar to undef. > > > > -- Sanjoy > > > >> > >> - Matthias > >> > >>> On Jan 29, 2015, at 9:43 PM, Sanjoy Das < > sanjoy at playingwithpointers.com> wrote: > >>> > >>> One way around this is to say that there are some special > >>> instructions, icmp, sext and zext which produce a value solely > >>> composed of poison bits if any of their input bits is poison. So > >>> `<poison> icmp X` is poison for any value of X, including INT_MAX. > >>> This is one way poison could be fundamentally different from undef. > >>> > >>> -- Sanjoy > >>> > >>>> On Thu, Jan 29, 2015 at 8:05 PM, Matthias Braun <matze at braunis.de> > wrote: > >>>> Having though about this some more I think optimizing > >>>> > >>>> (x+1 > x) <=> true > >>>> > >>>> and at the same time modeling undefined behavior as a posion value is > impossible. This is because we also want the following rule: > >>>> > >>>> (x > INT_MAX) <=> false > >>>> > >>>> Now if poison is a value, then the second replacement tells us > (poison > INT_MAX) == false which contradicts the first rule. > >>>> > >>>> > >>>> > >>>> The only way out of this while still allowing (x+1>x)<=>true I can > see at the moment is defining that add nsw does produce actual undefined > behavior allowing us to “freely rewrite" the following > to true in the UB > cases. Of course having add produce actual undefined behaviour greatly > limits us in the way we can actually move the instruction around without > changing program semantics. > >>>> > >>>> The only way allowing moving instructions and having add produce real > UB I can see is that as soon as we start moving instructions around > (specifically moving the add to a place it does not dominate or moving any > other instruction over the add) we change the add to an instruction that > does not produce real UB anymore; something like add swu = “add signed wrap > gives undef”... > >>>> > >>>> - Matthias > >>>> > >>>>>> On Jan 29, 2015, at 11:29 AM, Sanjoy Das < > sanjoy at playingwithpointers.com> wrote: > >>>>>> > >>>>>> I've been discussing a model with David that might steer poison > back towards > >>>>>> something that simply supports algebraic simplification. If we have > a math > >>>>>> operation that cannot wrap, then it notionally produces as many > bits of > >>>>>> undef as the operation could possibly produce. For example, "add > nsw i8" can > >>>>>> produce an i9 undef, and "mul nsw i8" can produce an undefined 16 > bit > >>>>>> bitpattern. This is strong enough to do things like "a + 1 > a > --> true", > >>>>>> because on overflow of "a + 1" we can choose an poison value of > "MAX_INT + > >>>>>> 1", even though that is not a valid i8 bit pattern. > >>>>>> > >>>>>> So, a short version would be that poison is like undef, except you > get to > >>>>>> include the overflow bits of the computation in your undef value. > >>>>> > >>>>> I suspect it will be hard to justify reg2mem is meaning preserving > in > >>>>> this scheme. But if this can be made to work, then I agree that this > >>>>> is the right thing to do -- an i32 poison effectively has the > >>>>> semantics of a wider type, and it makes sense to be explicit in that. > >>>>> > >>>>> -- Sanjoy > >>>>> _______________________________________________ > >>>>> LLVM Developers mailing list > >>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/3363f529/attachment.html>