Nuno Lopes via llvm-dev
2016-Dec-07 23:27 UTC
[llvm-dev] Killing undef and spreading poison
>> 1) a bit in the output is poison if flipping any set of poison bits >> in the >> input may yield different values for the output bit. > > This is the definition I'd expect. > >> For example (bitwise notation): >> ppp * 000 == 000 (since flipping any of the poison bits cannot yield >> a >> result other than zero) >> 00p + 00p == 0pp >> ppp << 2 == p00 >> icmp ugt ppp, uint_max == p (special case) >> >> 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is >> poison if >> bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). >> This one unfortunately breaks some algebraic rewrites like: x * -1 >> -> 0 >> - x >> >> >> I've implemented both semantics in Alive (branches: newsema2 and >> bitwise-poison2, respectively) and then we run Alive over a large set >> of >> optimizations to see which were ok and which became wrong. >> >> Many optimizations became wrong with either of the bitwise semantics. > > To be clear, "wrong" here means that the value resulting from the > transformation might have more poison bits for some inputs than the > original value, correct?The vast majority of the examples were of that case, yes. But we also have a few cases where the transformed expression computes a different value (I don't have a record of these handy, sorry).>> So, my suggestion is to go for value-wise poison, but maybe with a >> few >> tweaks from what I've presented (based on the feedback we received): >> - Have 2 load instructions: one that does a value-wise access (if >> any bit >> is poison in memory, the whole value becomes poison), and another >> version >> that freezes each bit individually (which is just syntactic sugar for >> a >> freeze(load <n x i1>) followed by a bitcast). The latter load is very >> close >> to the load instruction we have today. >> >> Adding this extra instruction has some benefits: it simplifies IR >> auto-upgrade, and frontends can move to the new instruction >> incrementally. >> Also, it seems that C++ codegen could use the freezing load in a few >> places. > > Can we be specific on what "few places" we're talking about? I assume this > means at least: > > 1. Accesses to bit fields (or other places where Clang generates loads > larger than the underlying data type) > 2. memcpy-like operations > 3. Accesses though character pointers? (because character pointers can be > used to access data of any type)1. Bit fields definitely. Sometimes clang needs to widen loads because of the ABI, but these can probably be changed to vector loads instead. 2. Clang introduces memcpys to copy objects (which may have uninitialized padding). My suggestion (contrary to my previous opinion) is to make memcpy a bit-wise copy to simplify things. 3. For char ptrs, not sure that's needed. You would be loading an i8, which is the type you are asking for, so it doesn't seem necessary. 4. I'm not a clang expert by any means, but I briefly spoke with Richard Smith during the dev meeting and he told me that there are some cases involving initialization of unions in C++ that may also need this freezing load. I don't know the exact details.>> On the negative side is that the freezing load is harder to move >> around and > > Why exactly are these harder to move around?Because they have an implicit freeze, which is also non-trivial to move. For example, these freezing loads cannot easily be sank into loops, like: %v = load %p while (...) { use(%v) } => while (...) { %v = load %p; use(%v) } The load inside the loop may return a different value for each uninitialized/poison bit in each loop iteration, while in the original code all iterations would see the same value for %v. So this load should be reserved for the cases when it's actually needed only. For example, in clang, all loads from the program itself should use the value-wise-poison-load, not this one. This load is similar as if we were to make undef yield the same value for all uses. Since our load today returns undef for uninitialized memory, the current load would become pretty much equivalent to the load we are proposing. (we already established that having undef is not a good idea; this is just an analogy that will hopefully give some intuition and help understanding the concept of this new load) Nuno
Hal Finkel via llvm-dev
2016-Dec-07 23:39 UTC
[llvm-dev] Killing undef and spreading poison
----- Original Message -----> From: "Nuno Lopes" <nuno.lopes at ist.utl.pt> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Sanjoy Das" <sanjoy at playingwithpointers.com>, "David Majnemer" <david.majnemer at gmail.com>, "John Regehr" > <regehr at cs.utah.edu>, "Chung-Kil Hur" <gil.hur at sf.snu.ac.kr>, "Juneyoung Lee" <juneyoung.lee at sf.snu.ac.kr>, "Dan > Gohman" <dan433584 at gmail.com>, "Philip Reames" <listmail at philipreames.com>, clattner at apple.com, "Chandler Carruth" > <chandlerc at google.com>, llvm-dev at lists.llvm.org > Sent: Wednesday, December 7, 2016 5:27:31 PM > Subject: Re: Killing undef and spreading poison > > >> 1) a bit in the output is poison if flipping any set of poison > >> bits > >> in the > >> input may yield different values for the output bit. > > > > This is the definition I'd expect. > > > >> For example (bitwise notation): > >> ppp * 000 == 000 (since flipping any of the poison bits cannot > >> yield > >> a > >> result other than zero) > >> 00p + 00p == 0pp > >> ppp << 2 == p00 > >> icmp ugt ppp, uint_max == p (special case) > >> > >> 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is > >> poison if > >> bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). > >> This one unfortunately breaks some algebraic rewrites like: x * -1 > >> -> 0 > >> - x > >> > >> > >> I've implemented both semantics in Alive (branches: newsema2 and > >> bitwise-poison2, respectively) and then we run Alive over a large > >> set > >> of > >> optimizations to see which were ok and which became wrong. > >> > >> Many optimizations became wrong with either of the bitwise > >> semantics. > > > > To be clear, "wrong" here means that the value resulting from the > > transformation might have more poison bits for some inputs than the > > original value, correct? > > The vast majority of the examples were of that case, yes. But we > also have > a few cases where the transformed expression computes a different > value (I > don't have a record of these handy, sorry).Those are just bugs, right?> > > >> So, my suggestion is to go for value-wise poison, but maybe with a > >> few > >> tweaks from what I've presented (based on the feedback we > >> received): > >> - Have 2 load instructions: one that does a value-wise access (if > >> any bit > >> is poison in memory, the whole value becomes poison), and another > >> version > >> that freezes each bit individually (which is just syntactic sugar > >> for > >> a > >> freeze(load <n x i1>) followed by a bitcast). The latter load is > >> very > >> close > >> to the load instruction we have today. > >> > >> Adding this extra instruction has some benefits: it simplifies IR > >> auto-upgrade, and frontends can move to the new instruction > >> incrementally. > >> Also, it seems that C++ codegen could use the freezing load in a > >> few > >> places. > > > > Can we be specific on what "few places" we're talking about? I > > assume this > > means at least: > > > > 1. Accesses to bit fields (or other places where Clang generates > > loads > > larger than the underlying data type) > > 2. memcpy-like operations > > 3. Accesses though character pointers? (because character pointers > > can be > > used to access data of any type) > > 1. Bit fields definitely. Sometimes clang needs to widen loads > because of > the ABI, but these can probably be changed to vector loads instead. > 2. Clang introduces memcpys to copy objects (which may have > uninitialized > padding). My suggestion (contrary to my previous opinion) is to make > memcpy > a bit-wise copy to simplify things.And then how exactly do we turn small memcpys into "regular" loads/stores? Thanks again, Hal> 3. For char ptrs, not sure that's needed. You would be loading an i8, > which > is the type you are asking for, so it doesn't seem necessary. > 4. I'm not a clang expert by any means, but I briefly spoke with > Richard > Smith during the dev meeting and he told me that there are some cases > involving initialization of unions in C++ that may also need this > freezing > load. I don't know the exact details. > > > >> On the negative side is that the freezing load is harder to move > >> around and > > > > Why exactly are these harder to move around? > > Because they have an implicit freeze, which is also non-trivial to > move. > For example, these freezing loads cannot easily be sank into loops, > like: > %v = load %p > while (...) { use(%v) } > => > while (...) { %v = load %p; use(%v) } > > The load inside the loop may return a different value for each > uninitialized/poison bit in each loop iteration, while in the > original code > all iterations would see the same value for %v. > > So this load should be reserved for the cases when it's actually > needed > only. For example, in clang, all loads from the program itself > should use > the value-wise-poison-load, not this one. > This load is similar as if we were to make undef yield the same value > for > all uses. Since our load today returns undef for uninitialized > memory, the > current load would become pretty much equivalent to the load we are > proposing. > (we already established that having undef is not a good idea; this is > just > an analogy that will hopefully give some intuition and help > understanding > the concept of this new load) > > Nuno > >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Nuno Lopes via llvm-dev
2016-Dec-08 22:16 UTC
[llvm-dev] Killing undef and spreading poison
>> >> I've implemented both semantics in Alive (branches: newsema2 and >> >> bitwise-poison2, respectively) and then we run Alive over a large >> >> set >> >> of >> >> optimizations to see which were ok and which became wrong. >> >> >> >> Many optimizations became wrong with either of the bitwise >> >> semantics. >> > >> > To be clear, "wrong" here means that the value resulting from the >> > transformation might have more poison bits for some inputs than the >> > original value, correct? >> >> The vast majority of the examples were of that case, yes. But we >> also have >> a few cases where the transformed expression computes a different >> value (I >> don't have a record of these handy, sorry). > > Those are just bugs, right?Well, increasing the number of poison bits would also be considered a bug. I guess the main issue is that some InstCombine transformations that we consider correct today are not under the bitwise poison semantics. Some of these transformations start producing wrong values because poison is not propagating enough for us to be able to discard the inputs that trigger different output values as "uninteresting". Bitwise poison semantics has too many opportunities to lose poison (e.g., shifting out poison bits, no carry in arithmetic instructions, and/or with zero/one bits, etc).>> >> Adding this extra instruction has some benefits: it simplifies IR >> >> auto-upgrade, and frontends can move to the new instruction >> >> incrementally. >> >> Also, it seems that C++ codegen could use the freezing load in a >> >> few >> >> places. >> > >> > Can we be specific on what "few places" we're talking about? I >> > assume this >> > means at least: >> > >> > 1. Accesses to bit fields (or other places where Clang generates >> > loads >> > larger than the underlying data type) >> > 2. memcpy-like operations >> > 3. Accesses though character pointers? (because character pointers >> > can be >> > used to access data of any type) >> >> 1. Bit fields definitely. Sometimes clang needs to widen loads >> because of >> the ABI, but these can probably be changed to vector loads instead. >> 2. Clang introduces memcpys to copy objects (which may have >> uninitialized >> padding). My suggestion (contrary to my previous opinion) is to make >> memcpy >> a bit-wise copy to simplify things. > > And then how exactly do we turn small memcpys into "regular" loads/stores?Bit vector load/stores (i.e., <n x i1>). I was really happy to see that the codegen of i1 vectors has gotten some attention recently btw! Nuno
John Regehr via llvm-dev
2016-Dec-27 18:22 UTC
[llvm-dev] Killing undef and spreading poison
Related work: http://plv.mpi-sws.org/llvmcs/paper.pdf Hope everyone had a great Christmas! John On 12/7/16 4:27 PM, Nuno Lopes wrote:>>> 1) a bit in the output is poison if flipping any set of poison bits >>> in the >>> input may yield different values for the output bit. >> >> This is the definition I'd expect. >> >>> For example (bitwise notation): >>> ppp * 000 == 000 (since flipping any of the poison bits cannot yield >>> a >>> result other than zero) >>> 00p + 00p == 0pp >>> ppp << 2 == p00 >>> icmp ugt ppp, uint_max == p (special case) >>> >>> 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is >>> poison if >>> bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). >>> This one unfortunately breaks some algebraic rewrites like: x * -1 >>> -> 0 >>> - x >>> >>> >>> I've implemented both semantics in Alive (branches: newsema2 and >>> bitwise-poison2, respectively) and then we run Alive over a large set >>> of >>> optimizations to see which were ok and which became wrong. >>> >>> Many optimizations became wrong with either of the bitwise semantics. >> >> To be clear, "wrong" here means that the value resulting from the >> transformation might have more poison bits for some inputs than the >> original value, correct? > > The vast majority of the examples were of that case, yes. But we also > have a few cases where the transformed expression computes a different > value (I don't have a record of these handy, sorry). > > >>> So, my suggestion is to go for value-wise poison, but maybe with a >>> few >>> tweaks from what I've presented (based on the feedback we received): >>> - Have 2 load instructions: one that does a value-wise access (if >>> any bit >>> is poison in memory, the whole value becomes poison), and another >>> version >>> that freezes each bit individually (which is just syntactic sugar for >>> a >>> freeze(load <n x i1>) followed by a bitcast). The latter load is very >>> close >>> to the load instruction we have today. >>> >>> Adding this extra instruction has some benefits: it simplifies IR >>> auto-upgrade, and frontends can move to the new instruction >>> incrementally. >>> Also, it seems that C++ codegen could use the freezing load in a few >>> places. >> >> Can we be specific on what "few places" we're talking about? I assume >> this means at least: >> >> 1. Accesses to bit fields (or other places where Clang generates >> loads larger than the underlying data type) >> 2. memcpy-like operations >> 3. Accesses though character pointers? (because character pointers >> can be used to access data of any type) > > 1. Bit fields definitely. Sometimes clang needs to widen loads because > of the ABI, but these can probably be changed to vector loads instead. > 2. Clang introduces memcpys to copy objects (which may have > uninitialized padding). My suggestion (contrary to my previous opinion) > is to make memcpy a bit-wise copy to simplify things. > 3. For char ptrs, not sure that's needed. You would be loading an i8, > which is the type you are asking for, so it doesn't seem necessary. > 4. I'm not a clang expert by any means, but I briefly spoke with Richard > Smith during the dev meeting and he told me that there are some cases > involving initialization of unions in C++ that may also need this > freezing load. I don't know the exact details. > > >>> On the negative side is that the freezing load is harder to move >>> around and >> >> Why exactly are these harder to move around? > > Because they have an implicit freeze, which is also non-trivial to move. > For example, these freezing loads cannot easily be sank into loops, like: > %v = load %p > while (...) { use(%v) } > => > while (...) { %v = load %p; use(%v) } > > The load inside the loop may return a different value for each > uninitialized/poison bit in each loop iteration, while in the original > code all iterations would see the same value for %v. > > So this load should be reserved for the cases when it's actually needed > only. For example, in clang, all loads from the program itself should > use the value-wise-poison-load, not this one. > This load is similar as if we were to make undef yield the same value > for all uses. Since our load today returns undef for uninitialized > memory, the current load would become pretty much equivalent to the load > we are proposing. > (we already established that having undef is not a good idea; this is > just an analogy that will hopefully give some intuition and help > understanding the concept of this new load) > > Nuno