Nuno Lopes via llvm-dev
2016-Dec-06 18:38 UTC
[llvm-dev] RFC: Killing undef and spreading poison
Hi, Thanks everybody that showed up in our talk at the LLVM dev meeting and to those that provided feedback so far. The slides are already online: http://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.pdf The main question that some people raised was whether we could have bitwise poison instead of value-wise poison, since that semantics seems to be more natural as values continue to be just a bag of bits. During the talk I didn't have a good answer to this question. We've now studied this option more thoroughly. Apologies for the delay, but I think now we have a good handle on the subject. Ok, so bitwise poison is not a well-defined concept in itself; we still to define the semantics of the individual operations themselves. We studied two different options: 1) a bit in the output is poison if flipping any set of poison bits in the input may yield different values for the output bit. For example (bitwise notation): ppp * 000 == 000 (since flipping any of the poison bits cannot yield a result other than zero) 00p + 00p == 0pp ppp << 2 == p00 icmp ugt ppp, uint_max == p (special case) 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is poison if bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). This one unfortunately breaks some algebraic rewrites like: x * -1 -> 0 - x I've implemented both semantics in Alive (branches: newsema2 and bitwise-poison2, respectively) and then we run Alive over a large set of optimizations to see which were ok and which became wrong. Many optimizations became wrong with either of the bitwise semantics. But the important ones are: - We keep the problem of not being able to increase number of register uses in an expression. For example, these are wrong: x << 1 -> x + x and x * (1 + y) -> x + x*y (FMA-like transformation). - It becomes hard to make SROA correct and keep all the shift optimizations we have; one of these would be wrong. For example, InstCombine does the following optimization: (%x ashr exact C1) shl C2 -> %x shl (C2-C1) If we assume that shift outputs zero for the bits it introduces when shifting, then these kind of shift optimizations are all wrong (we would need those bits to be poison). On the other hand, SROA combines multiple values with bitwise arithmetic and therefore it requires shift to leave shifted bits as zero. This is a contradiction, so we can't have both (unless we use a ton of freeze instructions). I think these two problems with bitwise poison semantics are a deal breaker. This kind of semantics opens more problems than it solves. BTW, besides shift transformations, 'div exact' suffer from similar problems. E.g., this is wrong: (%X udiv exact %Y) * %Y -> %X. So, my suggestion is to go for value-wise poison, but maybe with a few tweaks from what I've presented (based on the feedback we received): - Have 2 load instructions: one that does a value-wise access (if any bit is poison in memory, the whole value becomes poison), and another version that freezes each bit individually (which is just syntactic sugar for a freeze(load <n x i1>) followed by a bitcast). The latter load is very close to the load instruction we have today. Adding this extra instruction has some benefits: it simplifies IR auto-upgrade, and frontends can move to the new instruction incrementally. Also, it seems that C++ codegen could use the freezing load in a few places. On the negative side is that the freezing load is harder to move around and therefore we would probably need a pass to convert these freezing-loads into valuewise-poison-loads whenever possible. It's true that this proposal is also increasing the abstraction level of the LLVM IR a little bit by explicitly saying that values are *not* a bag of bits. Therefore, low-level bit manipulation has to be done carefully. IMHO this slight change is ok, and lower-level optimization can be left to SDAG/MI, where it's safe to do them and it's probably where they belong. LLVM IR is typed, and values need to be treated as having a type in its own right. This is probably more of a change in mindset rather than in the LLVM code itself. At least I couldn't find anything concrete in LLVM that would break. I hope this convinces people that bitwise poison is not a good way to go. I propose we go forward with the value-wise poison with the modification I've described above. Please let us know your thoughts and/or if you have questions. It would be great to finally close all the issues related with undef and poison that we have today :) Thanks, Nuno -----Original Message----- From: Nuno Lopes Sent: 18 de outubro de 2016 13:07 To: 'llvm-dev at lists.llvm.org' Subject: RFC: Killing undef and spreading poison Hi, Over the past few years we've been trying to kill poison somehow. There have been a few proposals, but they've all failed to pass the bar and/or to gather significant support (my own proposals included). We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and myself) have a new proposal to kill undef instead and replace it with poison + a new 'freeze' instruction. We believe this proposal simplifies things at the IR level, allows us to fix long-standing bugs in LLVM, and is roughly performance-neutral for now (and can enable further optimizations in the future). Sorry for the longish email. We will give a talk about this proposal at the LLVM dev meeting in November, so hopefully we'll be able to discuss this further in person. We would appreciate any comments, feedback, suggestions, etc. If you have questions let me know as well (not everything below is super detailed, so please ask where things are not explicit enough). (I've CC'ed a few people that have been involved in discussions re semantics of LLVM IR in the past year or so. Apologies in advance if I forgot someone -- which probably I did. I've also CC'ed some CodeGen folks, from which we would appreciate input on how this proposal fits within the current and the future pipeline) Thanks, Nuno --------------------------------------------------------- Motivation for undef & poison ============================There were a few motivations behind the creation of undef and poison: 1) There was a desire to exploit certain undefined behaviors without hurting optimization opportunities. This led to the creation of weaker forms of undefined behavior (UB), while leaving "full" UB to operations that can trap the CPU, like division by zero. Weaker UB operations include, for example, signed integer overflow, say 'a + b'. If signed overflow was full UB, then we couldn't speculatively execute these operations without proving that they could not overflow, otherwise the compiler would be introducing UB in a perfectly fine program. E.g., the compiler couldn't hoist signed additions out of loops. In weaker forms of UB, the triggering of full UB is delayed so that the compiler can speculatively execute these operations and there's only full UB if the result is somehow "consumed". 2) Undef was created to represent uninitialized memory reads. For example, it's very useful in conjunction with PHI nodes: int a; if (foo) a_0 = 42; a_1 = phi(a_0, undef) By using undef, we don't have to materialize some arbitrary constant on a branch where a variable is not initialized. 3) And then people realized that undef wasn't enough. For example, we'd also like to be able to optimize expressions like "((a + 1) >s a)" to "true", which isn't possible with "undef" as defined in (1), since there's no value for `%x` that makes `%x > INT_MAX` true. This was important for loop analyses and certain InstCombine patterns, and so we needed a stronger version of UB which was still not full UB. And poison was born. For example, on 64-bit platforms, poison is very handy for widening induction variables to 64 bits. Problems with undef & poison ===========================The interactions between undef and poison are particularly complicated and they inhibit innocent-looking optimizations. For example, the following is wrong: %v = select %c, %x, undef => %v = %x If %x is poison and %c = false, then we've just replaced undef with poison which is bad, since poison is stronger than undef. So, this is an example of something we want to do but we can't at the moment (even if we still do it anyway, risking miscompilations!) Goal ===Our goal was to propose minimal changes to the IR semantics such that 1) we could keep most of the optimizations we do today, 2) would require minimal code changes, 3) compilation time, run time, and memory consumption would stay roughly the same, and 4) we could fix all the longstanding bugs due to undef/poison semantics and interactions. We believe this proposal fulfills the goals, and can even enable future optimizations. Proposal ======= 1) Kill undef 2) Create a new poison value (representable in IR, inheriting from llvm::Constant) 3) Create a new instruction, '%y = freeze %x', that stops propagation of poison. If the input is poison, then it returns an arbitrary, but fixed, value. (like old undef, but each use gets the *same* value), otherwise it just returns its input value. 4) All operations over poison return poison. For example: and %x, poison -> poison ; just like before and 0, poison -> poison ; just like before %y = freeze poison %z = and %y, 1 --- 000..0x ; just like old undef %w = xor %y, %y ; 0 -- not undef: both uses of %y have the same value Instruction-specific semantics: - br poison -> UB - select poison, %a, %b -> poison (some InstCombine patterns will need freeze, but they are wrong right now already!) - if %c is not poison, select %c, %a, %b is poison if %c = 1 and %a is poison, or %c = 0 and %b is poison (see discussion below) - bitcast between types of different bitwidth can return poison if when concatenating adjacent values one of these values is poison. For example: %x = bitcast <3 x i2> <2, poison, 2> to <2 x i3> -> %x = <poison, poison> %x = bitcast <6 x i2> <2, poison, 2, 2, 2, 2> to <4 x i3> -> %x = <poison, poison, 5, 2> Purpose of Freeze ================Poison is propagated aggressively throughout. However, there are cases where this breaks certain optimizations, and therefore freeze is used to selectively stop poison from being propagated. A use of freeze is to enable speculative execution. For example, loop switching does the following transformation: while (C) { if (C2) { A } else { B } } => if (C2) { while (C) A } else { while (C) B } Here we are evaluating C2 before C. If the original loop never executed then we had never evaluated C2, while now we do. So we need to make sure there's no UB for branching on C2. Freeze ensures that so we would actually have 'if (freeze(C2))' instead. Note that having branch on poison not trigger UB has its own problems. We believe this is a good tradeoff. Another use is, for example, to implement bit-fields: a.x = 2 becomes: %v = load %a %v2 = freeze %v ; %v could be uninitialized data (poison) %v3 = ... bitmasking... store %a, %v3 Performance ==========We measured compile time, running time, memory consumption, and object/IR size of a few benchmarks (-O3 vs -O3 w/ freeze). They should give a rough picture of the overhead involved. Benchmarks were run on 2 machines (server1 and server2), both x86_64 running Ubuntu 16.04. Summary: - Memory consumption: there's generally a 1% increase, with the max of 8% in oggenc - Running time on SPEC: mostly in the noise (about 1% up or down) - Compile time: mostly in the noise (about 1% up or down, but usually no difference) - LNT shows 0.5% average slowdown, but with wider swings both ways. We are aware of a few things that would need tweaking to recover perf like loop unrolling (likely because of SCEV not knowing what freeze is). You can see the raw data we have collected in the links below: Memory consumption: https://docs.google.com/spreadsheets/d/1ycJaMPLzh_4YV7RQmVaLR-vHzd7ZlYiV2iES G-lBRbk/edit#gid=0 SPEC running time: https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-Ex533fy9DezIV rbfeIns/edit#gid=0 Compilation time: https://docs.google.com/spreadsheets/d/1_xj6o_ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW WAR9O2c/edit#gid=0 Statistics on number of Freeze Instructions (up to 5% in total in gcc): https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6mLUmkRIF 1SmZ-6Q/edit#gid=0 LNT raw data: https://github.com/aqjune/freezescript/tree/master/resultcsv-mailinglist/LNT Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM Implementation =============A prototype is available at: - https://github.com/snu-sf/llvm-freeze/tree/x86jmp - https://github.com/snu-sf/clang-freeze This implementation includes: - Make clang emit freeze for bit-field codegen - Add freeze instruction to IR, plus a few instcombine patterns to simply freeze(freeze(x)), for example - Fix a few instcombine select patterns to introduce freeze when needed (these were wrong with current semantics already) - Add a freeze node to selection DAG and add codegen support. For codegen, it uses CopyFromReg+CopyToReg to lower the freeze node, and so it assumes that from that point on LLVM will not propagate undef anymore (to ensure that 'freeze undef' always returns the same value for all uses, which is not what happens with 'undef' on its own). This needs further discussion. - Fix loop unswitch to freeze condition - Change GVN's load widening to load as vector of bits+bitcast (gvload branch only; this was not included in the tests above) (this implementation still uses undef value, but that should be replaced with poison) Deployment =========A proposal for incremental deployment of the proposed changed: 1) Add freeze instruction + CodeGen support 2) Change clang to start emitting freeze for bit-field reads 3) Add auto-upgrade to convert undef to 'freeze poison' (undef and 'freeze poison' with one use are equivalent) 4) Fix InstCombine, Loop unswitching, etc to use freeze 5) Replace references to undef in the code with either poison or 'freeze poison' 7) Kill undef 8) Investigate any perf regressions 9) Run John's LLVM IR fuzzer with Alive to find leftover bugs Regarding perf, at least PR30615 needs fixing to enable efficient load widening at IR level (which seems it's still undecided whether GVN will continue doing it or not). Discussion on select ===================Select is a tricky instruction to define semantics for. There are multiple views of select's purpose that are contradictory: 1) Select should be equivalent to arithmetic (e.g., allow 'select %c, true, %x' -> 'or %c, %x' and arithmetic -> select) 2) br + phi -> select should be allowed (simplifyCFG) 3) Select -> br + phi should be allowed To have 1), we need to make select poison if either of its arguments is poison. This disallows 2), since there we need to have select being poison only if the dynamically-chosen value is poison. 2) and 3) are orthogonal and do not conflict, though. 3) is hard because it requires select to be stronger than branching (UB-wise), meaning that we would need select to be UB if the condition was poison. This blocks a bunch of instcombine patterns, like: Pre: C < 0 %r = udiv %a, C => %c = icmp ult %a, C %r = select %c, 0, 1 If %a is poison, then we would be introducing UB. Adding freeze(%c) would fix the problem, though. Since we really want to have 2) since that's simplifyCFG, we propose to make 'select %c, %a, %b' poison if any of the following holds: - %c is poison - %c = true and %a is poison - %c = false and %b is poison This disallows 3) and some transformations of 1). Since 3) is only performed in CodeGenPrepare, we can safely ignore it (no aggressive optimizations are run afterwards). For 1), we will have to restrict InstCombine patterns for the cases when we are sure a given variable is non-poisonous (which is only when it came from a 'freeze' instruction or it's the result of a non-poison-producing instruction). This semantics also allows arithmetic -> select, but not select -> arithmetic (without use of freeze). Acknowledgements ===============I would personally like to thank David, John, and Sanjoy for embarking on this trip over a year ago and spending a lot of time on this project; Gil and his students (Juneyoung, Yoonseung, and Youngju) for prototyping and testing several ideas (and fixing them!); and everybody else that has contributed in the form of off-line and on-line discussions. Future work ==========This proposal only handles integers. It's a good step forward, but we are still missing pointers and floats; we are aware of problems (err, possibilities for improvement) there. We will work on these next. They are orthogonal problems, so what we are proposing today won't require further changes (hopefully).
Hal Finkel via llvm-dev
2016-Dec-07 12:25 UTC
[llvm-dev] RFC: Killing undef and spreading poison
----- Original Message -----> From: "Nuno Lopes" <nuno.lopes at ist.utl.pt> > To: llvm-dev at lists.llvm.org > Cc: "Sanjoy Das" <sanjoy at playingwithpointers.com>, "David Majnemer" <david.majnemer at gmail.com>, "John Regehr" > <regehr at cs.utah.edu>, "Chung-Kil Hur" <gil.hur at sf.snu.ac.kr>, "Juneyoung Lee" <juneyoung.lee at sf.snu.ac.kr>, > hfinkel at anl.gov, "Dan Gohman" <dan433584 at gmail.com>, "Philip Reames" <listmail at philipreames.com>, > clattner at apple.com, "Chandler Carruth" <chandlerc at google.com> > Sent: Tuesday, December 6, 2016 12:38:26 PM > Subject: RE: RFC: Killing undef and spreading poison > > Hi, > > Thanks everybody that showed up in our talk at the LLVM dev meeting > and to > those that provided feedback so far. > The slides are already online: > http://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.pdf > > The main question that some people raised was whether we could have > bitwise > poison instead of value-wise poison, since that semantics seems to be > more > natural as values continue to be just a bag of bits. > During the talk I didn't have a good answer to this question. We've > now > studied this option more thoroughly. Apologies for the delay, but I > think > now we have a good handle on the subject. > > Ok, so bitwise poison is not a well-defined concept in itself; we > still to > define the semantics of the individual operations themselves. We > studied two > different options: > > 1) a bit in the output is poison if flipping any set of poison bits > in the > input may yield different values for the output bit.This is the definition I'd expect.> For example (bitwise notation): > ppp * 000 == 000 (since flipping any of the poison bits cannot yield > a > result other than zero) > 00p + 00p == 0pp > ppp << 2 == p00 > icmp ugt ppp, uint_max == p (special case) > > 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is > poison if > bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). > This one unfortunately breaks some algebraic rewrites like: x * -1 > -> 0 > - x > > > I've implemented both semantics in Alive (branches: newsema2 and > bitwise-poison2, respectively) and then we run Alive over a large set > of > optimizations to see which were ok and which became wrong. > > Many optimizations became wrong with either of the bitwise semantics.To be clear, "wrong" here means that the value resulting from the transformation might have more poison bits for some inputs than the original value, correct?> But > the important ones are: > - We keep the problem of not being able to increase number of > register uses > in an expression. For example, these are wrong: x << 1 -> x + x > and x * > (1 + y) -> x + x*y (FMA-like transformation). > - It becomes hard to make SROA correct and keep all the shift > optimizations > we have; one of these would be wrong. For example, InstCombine does > the > following optimization: > (%x ashr exact C1) shl C2 -> %x shl (C2-C1) > If we assume that shift outputs zero for the bits it introduces when > shifting, then these kind of shift optimizations are all wrong (we > would > need those bits to be poison). On the other hand, SROA combines > multiple > values with bitwise arithmetic and therefore it requires shift to > leave > shifted bits as zero. This is a contradiction, so we can't have both > (unless we use a ton of freeze instructions). > > I think these two problems with bitwise poison semantics are a deal > breaker. > This kind of semantics opens more problems than it solves. > BTW, besides shift transformations, 'div exact' suffer from similar > problems. E.g., this is wrong: (%X udiv exact %Y) * %Y -> %X. > > > So, my suggestion is to go for value-wise poison, but maybe with a > few > tweaks from what I've presented (based on the feedback we received): > - Have 2 load instructions: one that does a value-wise access (if > any bit > is poison in memory, the whole value becomes poison), and another > version > that freezes each bit individually (which is just syntactic sugar for > a > freeze(load <n x i1>) followed by a bitcast). The latter load is very > close > to the load instruction we have today. > > Adding this extra instruction has some benefits: it simplifies IR > auto-upgrade, and frontends can move to the new instruction > incrementally. > Also, it seems that C++ codegen could use the freezing load in a few > places.Can we be specific on what "few places" we're talking about? I assume this means at least: 1. Accesses to bit fields (or other places where Clang generates loads larger than the underlying data type) 2. memcpy-like operations 3. Accesses though character pointers? (because character pointers can be used to access data of any type)> On the negative side is that the freezing load is harder to move > around andWhy exactly are these harder to move around? Thanks again, Hal> therefore we would probably need a pass to convert these > freezing-loads into > valuewise-poison-loads whenever possible. > > It's true that this proposal is also increasing the abstraction level > of the > LLVM IR a little bit by explicitly saying that values are *not* a bag > of > bits. Therefore, low-level bit manipulation has to be done carefully. > IMHO this slight change is ok, and lower-level optimization can be > left to > SDAG/MI, where it's safe to do them and it's probably where they > belong. > LLVM IR is typed, and values need to be treated as having a type in > its own > right. This is probably more of a change in mindset rather than in > the LLVM > code itself. At least I couldn't find anything concrete in LLVM that > would > break. > > I hope this convinces people that bitwise poison is not a good way to > go. I > propose we go forward with the value-wise poison with the > modification I've > described above. > Please let us know your thoughts and/or if you have questions. It > would be > great to finally close all the issues related with undef and poison > that we > have today :) > > Thanks, > Nuno > > > -----Original Message----- > From: Nuno Lopes > Sent: 18 de outubro de 2016 13:07 > To: 'llvm-dev at lists.llvm.org' > Subject: RFC: Killing undef and spreading poison > > Hi, > > Over the past few years we've been trying to kill poison somehow. > There have > been a few proposals, but they've all failed to pass the bar and/or > to > gather significant support (my own proposals included). > We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and > myself) > have a new proposal to kill undef instead and replace it with poison > + a new > 'freeze' instruction. We believe this proposal simplifies things at > the IR > level, allows us to fix long-standing bugs in LLVM, and is roughly > performance-neutral for now (and can enable further optimizations in > the > future). > > Sorry for the longish email. We will give a talk about this proposal > at the > LLVM dev meeting in November, so hopefully we'll be able to discuss > this > further in person. > > We would appreciate any comments, feedback, suggestions, etc. If you > have > questions let me know as well (not everything below is super > detailed, so > please ask where things are not explicit enough). > (I've CC'ed a few people that have been involved in discussions re > semantics > of LLVM IR in the past year or so. Apologies in advance if I forgot > someone > -- which probably I did. I've also CC'ed some CodeGen folks, from > which we > would appreciate input on how this proposal fits within the current > and the > future pipeline) > > Thanks, > Nuno > > > --------------------------------------------------------- > > > Motivation for undef & poison > ============================> There were a few motivations behind the creation of undef and poison: > 1) There was a desire to exploit certain undefined behaviors without > hurting optimization opportunities. This led to the creation of > weaker > forms of undefined behavior (UB), while leaving "full" UB to > operations that > can trap the CPU, like division by zero. Weaker UB operations > include, for > example, signed integer overflow, say 'a + b'. If signed overflow > was full > UB, then we couldn't speculatively execute these operations without > proving > that they could not overflow, otherwise the compiler would be > introducing UB > in a perfectly fine program. E.g., the compiler couldn't hoist signed > additions out of loops. In weaker forms of UB, the triggering of > full UB is > delayed so that the compiler can speculatively execute these > operations and > there's only full UB if the result is somehow "consumed". > > 2) Undef was created to represent uninitialized memory reads. For > example, > it's very useful in conjunction with PHI nodes: > int a; > if (foo) > a_0 = 42; > a_1 = phi(a_0, undef) > > By using undef, we don't have to materialize some arbitrary constant > on a > branch where a variable is not initialized. > > 3) And then people realized that undef wasn't enough. For example, > we'd > also like to be able to optimize expressions like "((a + 1) >s a)" to > "true", which isn't possible with "undef" as defined in (1), since > there's > no value for `%x` that makes `%x > INT_MAX` true. This was important > for > loop analyses and certain InstCombine patterns, and so we needed a > stronger > version of UB which was still not full UB. And poison was born. For > example, > on 64-bit platforms, poison is very handy for widening induction > variables > to 64 bits. > > > > Problems with undef & poison > ===========================> The interactions between undef and poison are particularly > complicated and > they inhibit innocent-looking optimizations. For example, the > following is > wrong: > %v = select %c, %x, undef > => > %v = %x > > If %x is poison and %c = false, then we've just replaced undef with > poison > which is bad, since poison is stronger than undef. So, this is an > example > of something we want to do but we can't at the moment (even if we > still do > it anyway, risking miscompilations!) > > > > Goal > ===> Our goal was to propose minimal changes to the IR semantics such that > 1) we > could keep most of the optimizations we do today, 2) would require > minimal > code changes, 3) compilation time, run time, and memory consumption > would > stay roughly the same, and 4) we could fix all the longstanding bugs > due to > undef/poison semantics and interactions. > We believe this proposal fulfills the goals, and can even enable > future > optimizations. > > > > Proposal > =======> 1) Kill undef > 2) Create a new poison value (representable in IR, inheriting from > llvm::Constant) > 3) Create a new instruction, '%y = freeze %x', that stops > propagation of > poison. If the input is poison, then it returns an arbitrary, but > fixed, > value. (like old undef, but each use gets the *same* value), > otherwise it > just returns its input value. > 4) All operations over poison return poison. > For example: > and %x, poison -> poison ; just like before > and 0, poison -> poison ; just like before > > %y = freeze poison > %z = and %y, 1 --- 000..0x ; just like old undef > %w = xor %y, %y ; 0 -- not undef: both uses of %y > have > the same value > > Instruction-specific semantics: > - br poison -> UB > - select poison, %a, %b -> poison (some InstCombine patterns will > need > freeze, but they are wrong right now already!) > - if %c is not poison, select %c, %a, %b is poison if %c = 1 and %a > is > poison, or %c = 0 and %b is poison (see discussion below) > - bitcast between types of different bitwidth can return poison if > when > concatenating adjacent values one of these values is poison. For > example: > %x = bitcast <3 x i2> <2, poison, 2> to <2 x i3> > -> > %x = <poison, poison> > > %x = bitcast <6 x i2> <2, poison, 2, 2, 2, 2> to <4 x i3> > -> > %x = <poison, poison, 5, 2> > > > > Purpose of Freeze > ================> Poison is propagated aggressively throughout. However, there are > cases where > this breaks certain optimizations, and therefore freeze is used to > selectively stop poison from being propagated. > > A use of freeze is to enable speculative execution. For example, > loop > switching does the following transformation: > while (C) { > if (C2) { > A > } else { > B > } > } > => > if (C2) { > while (C) > A > } else { > while (C) > B > } > > Here we are evaluating C2 before C. If the original loop never > executed > then we had never evaluated C2, while now we do. So we need to make > sure > there's no UB for branching on C2. Freeze ensures that so we would > actually > have 'if (freeze(C2))' instead. > Note that having branch on poison not trigger UB has its own > problems. We > believe this is a good tradeoff. > > > Another use is, for example, to implement bit-fields: > a.x = 2 > becomes: > %v = load %a > %v2 = freeze %v ; %v could be uninitialized data (poison) > %v3 = ... bitmasking... > store %a, %v3 > > > > Performance > ==========> We measured compile time, running time, memory consumption, and > object/IR > size of a few benchmarks (-O3 vs -O3 w/ freeze). They should give a > rough > picture of the overhead involved. > Benchmarks were run on 2 machines (server1 and server2), both x86_64 > running > Ubuntu 16.04. > > Summary: > - Memory consumption: there's generally a 1% increase, with the max > of 8% > in oggenc > - Running time on SPEC: mostly in the noise (about 1% up or down) > - Compile time: mostly in the noise (about 1% up or down, but > usually no > difference) > - LNT shows 0.5% average slowdown, but with wider swings both ways. > We are > aware of a few things that would need tweaking to recover perf like > loop > unrolling (likely because of SCEV not knowing what freeze is). > > You can see the raw data we have collected in the links below: > > Memory consumption: > https://docs.google.com/spreadsheets/d/1ycJaMPLzh_4YV7RQmVaLR-vHzd7ZlYiV2iES > G-lBRbk/edit#gid=0 > SPEC running time: > https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-Ex533fy9DezIV > rbfeIns/edit#gid=0 > Compilation time: > https://docs.google.com/spreadsheets/d/1_xj6o_ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW > WAR9O2c/edit#gid=0 > Statistics on number of Freeze Instructions (up to 5% in total in > gcc): > https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6mLUmkRIF > 1SmZ-6Q/edit#gid=0 > LNT raw data: > https://github.com/aqjune/freezescript/tree/master/resultcsv-mailinglist/LNT > > Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM > Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM > > > > Implementation > =============> A prototype is available at: > - https://github.com/snu-sf/llvm-freeze/tree/x86jmp > - https://github.com/snu-sf/clang-freeze > > This implementation includes: > - Make clang emit freeze for bit-field codegen > - Add freeze instruction to IR, plus a few instcombine patterns to > simply > freeze(freeze(x)), for example > - Fix a few instcombine select patterns to introduce freeze when > needed > (these were wrong with current semantics already) > - Add a freeze node to selection DAG and add codegen support. For > codegen, > it uses CopyFromReg+CopyToReg to lower the freeze node, and so it > assumes > that from that point on LLVM will not propagate undef anymore (to > ensure > that 'freeze undef' always returns the same value for all uses, which > is not > what happens with 'undef' on its own). This needs further discussion. > - Fix loop unswitch to freeze condition > - Change GVN's load widening to load as vector of bits+bitcast > (gvload > branch only; this was not included in the tests above) > > (this implementation still uses undef value, but that should be > replaced > with poison) > > > > Deployment > =========> A proposal for incremental deployment of the proposed changed: > 1) Add freeze instruction + CodeGen support > 2) Change clang to start emitting freeze for bit-field reads > 3) Add auto-upgrade to convert undef to 'freeze poison' (undef and > 'freeze > poison' with one use are equivalent) > 4) Fix InstCombine, Loop unswitching, etc to use freeze > 5) Replace references to undef in the code with either poison or > 'freeze > poison' > 7) Kill undef > 8) Investigate any perf regressions > 9) Run John's LLVM IR fuzzer with Alive to find leftover bugs > > Regarding perf, at least PR30615 needs fixing to enable efficient > load > widening at IR level (which seems it's still undecided whether GVN > will > continue doing it or not). > > > > Discussion on select > ===================> Select is a tricky instruction to define semantics for. There are > multiple > views of select's purpose that are contradictory: > 1) Select should be equivalent to arithmetic (e.g., allow 'select > %c, > true, %x' -> 'or %c, %x' and arithmetic -> select) > 2) br + phi -> select should be allowed (simplifyCFG) > 3) Select -> br + phi should be allowed > > To have 1), we need to make select poison if either of its arguments > is > poison. This disallows 2), since there we need to have select being > poison > only if the dynamically-chosen value is poison. > 2) and 3) are orthogonal and do not conflict, though. > > 3) is hard because it requires select to be stronger than branching > (UB-wise), meaning that we would need select to be UB if the > condition was > poison. This blocks a bunch of instcombine patterns, like: > Pre: C < 0 > %r = udiv %a, C > => > %c = icmp ult %a, C > %r = select %c, 0, 1 > > If %a is poison, then we would be introducing UB. Adding freeze(%c) > would > fix the problem, though. > > > Since we really want to have 2) since that's simplifyCFG, we propose > to make > 'select %c, %a, %b' poison if any of the following holds: > - %c is poison > - %c = true and %a is poison > - %c = false and %b is poison > > This disallows 3) and some transformations of 1). Since 3) is only > performed > in CodeGenPrepare, we can safely ignore it (no aggressive > optimizations are > run afterwards). For 1), we will have to restrict InstCombine > patterns for > the cases when we are sure a given variable is non-poisonous (which > is only > when it came from a 'freeze' instruction or it's the result of a > non-poison-producing instruction). > This semantics also allows arithmetic -> select, but not select -> > arithmetic (without use of freeze). > > > > Acknowledgements > ===============> I would personally like to thank David, John, and Sanjoy for > embarking on > this trip over a year ago and spending a lot of time on this project; > Gil > and his students (Juneyoung, Yoonseung, and Youngju) for prototyping > and > testing several ideas (and fixing them!); and everybody else that has > contributed in the form of off-line and on-line discussions. > > > > Future work > ==========> This proposal only handles integers. It's a good step forward, but > we are > still missing pointers and floats; we are aware of problems (err, > possibilities for improvement) there. We will work on these next. > They are > orthogonal problems, so what we are proposing today won't require > further > changes (hopefully). > >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Nuno Lopes via llvm-dev
2016-Dec-07 23:27 UTC
[llvm-dev] Killing undef and spreading poison
>> 1) a bit in the output is poison if flipping any set of poison bits >> in the >> input may yield different values for the output bit. > > This is the definition I'd expect. > >> For example (bitwise notation): >> ppp * 000 == 000 (since flipping any of the poison bits cannot yield >> a >> result other than zero) >> 00p + 00p == 0pp >> ppp << 2 == p00 >> icmp ugt ppp, uint_max == p (special case) >> >> 2) (thanks to Sanjoy): for an operation R = A op B, bit R[i] is >> poison if >> bit i of 'A op undef' may yield 0 and 1 (same for 'undef op B'). >> This one unfortunately breaks some algebraic rewrites like: x * -1 >> -> 0 >> - x >> >> >> I've implemented both semantics in Alive (branches: newsema2 and >> bitwise-poison2, respectively) and then we run Alive over a large set >> of >> optimizations to see which were ok and which became wrong. >> >> Many optimizations became wrong with either of the bitwise semantics. > > To be clear, "wrong" here means that the value resulting from the > transformation might have more poison bits for some inputs than the > original value, correct?The vast majority of the examples were of that case, yes. But we also have a few cases where the transformed expression computes a different value (I don't have a record of these handy, sorry).>> So, my suggestion is to go for value-wise poison, but maybe with a >> few >> tweaks from what I've presented (based on the feedback we received): >> - Have 2 load instructions: one that does a value-wise access (if >> any bit >> is poison in memory, the whole value becomes poison), and another >> version >> that freezes each bit individually (which is just syntactic sugar for >> a >> freeze(load <n x i1>) followed by a bitcast). The latter load is very >> close >> to the load instruction we have today. >> >> Adding this extra instruction has some benefits: it simplifies IR >> auto-upgrade, and frontends can move to the new instruction >> incrementally. >> Also, it seems that C++ codegen could use the freezing load in a few >> places. > > Can we be specific on what "few places" we're talking about? I assume this > means at least: > > 1. Accesses to bit fields (or other places where Clang generates loads > larger than the underlying data type) > 2. memcpy-like operations > 3. Accesses though character pointers? (because character pointers can be > used to access data of any type)1. Bit fields definitely. Sometimes clang needs to widen loads because of the ABI, but these can probably be changed to vector loads instead. 2. Clang introduces memcpys to copy objects (which may have uninitialized padding). My suggestion (contrary to my previous opinion) is to make memcpy a bit-wise copy to simplify things. 3. For char ptrs, not sure that's needed. You would be loading an i8, which is the type you are asking for, so it doesn't seem necessary. 4. I'm not a clang expert by any means, but I briefly spoke with Richard Smith during the dev meeting and he told me that there are some cases involving initialization of unions in C++ that may also need this freezing load. I don't know the exact details.>> On the negative side is that the freezing load is harder to move >> around and > > Why exactly are these harder to move around?Because they have an implicit freeze, which is also non-trivial to move. For example, these freezing loads cannot easily be sank into loops, like: %v = load %p while (...) { use(%v) } => while (...) { %v = load %p; use(%v) } The load inside the loop may return a different value for each uninitialized/poison bit in each loop iteration, while in the original code all iterations would see the same value for %v. So this load should be reserved for the cases when it's actually needed only. For example, in clang, all loads from the program itself should use the value-wise-poison-load, not this one. This load is similar as if we were to make undef yield the same value for all uses. Since our load today returns undef for uninitialized memory, the current load would become pretty much equivalent to the load we are proposing. (we already established that having undef is not a good idea; this is just an analogy that will hopefully give some intuition and help understanding the concept of this new load) Nuno