On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:> > Ah, yes. You are right, we cannot always assume that %y would be zero in > > the second case. > > This wouldn't be the first time we've lost information that we could use > to > > optimize a program by transforming it. > > > > Do you think this result would be problematic? It seems consistent with > the > > RFC and LLVM's current behavior. > > > > The problem is not that we're losing information, the problem is that > we're changing the behavior of a well-defined program. > > I'll try to put the whole argument in one place: > > We start with > > %x = add nuw i32 %m, %n > %y = zext i32 %x to i64 > %s = lshr i64 %y, 32 > %addr = gep %some_global, %s > store i32 42, i32* %addr > > In the above program, for all values of %x, %s is 0. This means the > program is well-defined when %x is poison (since you don't need to > look at %x to determine the value of %addr, in the same sense as you > don't need to look at X to determine the value of "and X, 0"); and it > stores 42 to &(%some_global)[0]. Specifically, the above program is > well defined for "%m = %n = 2^32-1". > > Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw > (zext X) (zext Y)" then we get > > %m.wide = zext i32 %m to i64 > %n.wide = zext i32 %n to i64 > %z = add nuw i64 %m.wide, %n.wide > %s = lshr i64 %y, 32 > %addr = gep %some_global, %s > store i32 42, i32* %addr > > The new program does *not* have the same behavior as the old program > for "%m = %n = 2^32-1". We have changed the behavior of a > well-defined program by doing the "zext (add nuw X Y)" => "add nuw > (zext X) (zext Y)" transform. >After some pondering and combing through LLVM's implementation, I think we must conclude that zexting a value with any poison bits creates poison in every new bit. Considering the following program: %zext = zext i32 %x to i64 %icmp = icmp i64 %zext, i64 1 we'd like to transform this to: %icmp = icmp i32 %x, i32 1 Is it reasonable to say that '%icmp' in the before case is not poison but '%icmp' in the after case is poison? LLVM assumes it can remove casts with impunity, I think this is a useful property to maintain.> > -- Sanjoy >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150201/e77adea9/attachment.html>
On Sun, Feb 1, 2015 at 1:57 AM, David Majnemer <david.majnemer at gmail.com> wrote:> > On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das < > sanjoy at playingwithpointers.com> wrote: > >> > Ah, yes. You are right, we cannot always assume that %y would be zero >> in >> > the second case. >> > This wouldn't be the first time we've lost information that we could >> use to >> > optimize a program by transforming it. >> > >> > Do you think this result would be problematic? It seems consistent >> with the >> > RFC and LLVM's current behavior. >> > >> >> The problem is not that we're losing information, the problem is that >> we're changing the behavior of a well-defined program. >> >> I'll try to put the whole argument in one place: >> >> We start with >> >> %x = add nuw i32 %m, %n >> %y = zext i32 %x to i64 >> %s = lshr i64 %y, 32 >> %addr = gep %some_global, %s >> store i32 42, i32* %addr >> >> In the above program, for all values of %x, %s is 0. This means the >> program is well-defined when %x is poison (since you don't need to >> look at %x to determine the value of %addr, in the same sense as you >> don't need to look at X to determine the value of "and X, 0"); and it >> stores 42 to &(%some_global)[0]. Specifically, the above program is >> well defined for "%m = %n = 2^32-1". >> >> Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw >> (zext X) (zext Y)" then we get >> >> %m.wide = zext i32 %m to i64 >> %n.wide = zext i32 %n to i64 >> %z = add nuw i64 %m.wide, %n.wide >> %s = lshr i64 %y, 32 >> %addr = gep %some_global, %s >> store i32 42, i32* %addr >> >> The new program does *not* have the same behavior as the old program >> for "%m = %n = 2^32-1". We have changed the behavior of a >> well-defined program by doing the "zext (add nuw X Y)" => "add nuw >> (zext X) (zext Y)" transform. >> > > After some pondering and combing through LLVM's implementation, I think we > must conclude that zexting a value with any poison bits creates poison in > every new bit. > > Considering the following program: > > %zext = zext i32 %x to i64 > %icmp = icmp i64 %zext, i64 1 > > we'd like to transform this to: > > %icmp = icmp i32 %x, i32 1 > > Is it reasonable to say that '%icmp' in the before case is not poison but > '%icmp' in the after case is poison? LLVM assumes it can remove casts with > impunity, I think this is a useful property to maintain. >FWIW, I agree with your statement. Here is the line of reasoning that I find troubling. If we accept the above, we have a surprising result (using small bit-width integers to make it easier to read) %zext = zext i1 %x to i2 %and = and i2 %zext, 1 We cannot replace %and with %zext because the %and might be removing poison. Perhaps this restriction is OK though. I just find it somewhat troubling. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150201/dbeb25d7/attachment.html>
I don't know how things work at the moment, but it seems to me that you can do lots of sensible things, and avoid lots of silly things, if you keep track of four possible values for each bit: - undef (the default) - poison - known to be 0 - known to be 1 This makes both David's and Chandler's examples work nicely if you assume: - ZEXT makes all the new bits known 0 - SEXT makes all the new bits the same as the high bit - AND clears unknown and poison bits to known 0 if the other input is known 0 - OR sets unknown and poison bits to known 1 if the other input is known 1 Also things such as ZEXTing a poison i32 to i64 and then right shifting by 32 will result in all known 0 bits. On Sun, Feb 1, 2015 at 11:19 PM, Chandler Carruth <chandlerc at google.com> wrote:> > On Sun, Feb 1, 2015 at 1:57 AM, David Majnemer <david.majnemer at gmail.com> > wrote: > >> >> On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das < >> sanjoy at playingwithpointers.com> wrote: >> >>> > Ah, yes. You are right, we cannot always assume that %y would be zero >>> in >>> > the second case. >>> > This wouldn't be the first time we've lost information that we could >>> use to >>> > optimize a program by transforming it. >>> > >>> > Do you think this result would be problematic? It seems consistent >>> with the >>> > RFC and LLVM's current behavior. >>> > >>> >>> The problem is not that we're losing information, the problem is that >>> we're changing the behavior of a well-defined program. >>> >>> I'll try to put the whole argument in one place: >>> >>> We start with >>> >>> %x = add nuw i32 %m, %n >>> %y = zext i32 %x to i64 >>> %s = lshr i64 %y, 32 >>> %addr = gep %some_global, %s >>> store i32 42, i32* %addr >>> >>> In the above program, for all values of %x, %s is 0. This means the >>> program is well-defined when %x is poison (since you don't need to >>> look at %x to determine the value of %addr, in the same sense as you >>> don't need to look at X to determine the value of "and X, 0"); and it >>> stores 42 to &(%some_global)[0]. Specifically, the above program is >>> well defined for "%m = %n = 2^32-1". >>> >>> Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw >>> (zext X) (zext Y)" then we get >>> >>> %m.wide = zext i32 %m to i64 >>> %n.wide = zext i32 %n to i64 >>> %z = add nuw i64 %m.wide, %n.wide >>> %s = lshr i64 %y, 32 >>> %addr = gep %some_global, %s >>> store i32 42, i32* %addr >>> >>> The new program does *not* have the same behavior as the old program >>> for "%m = %n = 2^32-1". We have changed the behavior of a >>> well-defined program by doing the "zext (add nuw X Y)" => "add nuw >>> (zext X) (zext Y)" transform. >>> >> >> After some pondering and combing through LLVM's implementation, I think >> we must conclude that zexting a value with any poison bits creates poison >> in every new bit. >> >> Considering the following program: >> >> %zext = zext i32 %x to i64 >> %icmp = icmp i64 %zext, i64 1 >> >> we'd like to transform this to: >> >> %icmp = icmp i32 %x, i32 1 >> >> Is it reasonable to say that '%icmp' in the before case is not poison but >> '%icmp' in the after case is poison? LLVM assumes it can remove casts with >> impunity, I think this is a useful property to maintain. >> > > FWIW, I agree with your statement. > > Here is the line of reasoning that I find troubling. > > If we accept the above, we have a surprising result (using small bit-width > integers to make it easier to read) > > %zext = zext i1 %x to i2 > %and = and i2 %zext, 1 > > We cannot replace %and with %zext because the %and might be removing > poison. > > Perhaps this restriction is OK though. I just find it somewhat troubling. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150201/323e5a03/attachment.html>