Peter Lawrence via llvm-dev
2017-Jun-13 17:27 UTC
[llvm-dev] A tagged architecture, the elephant in the undef / poison room
Here’s what seems to really be going on “undef” === models an uninitialized register, but “poison” === turns the entire IR into a tagged architecture Is this really the way to go ? It seems like a odd choice given that none of our current targets are tagged architectures, all of this tagged IR has to somehow be reduced back down to normal target machine instructions. This question is meant to be thought provoking, I’m not looking for quick rhetorical answers. Correct me if I am wrong, but it seems no one has outlined a proof that C programs can be translated into IR+”poison", optimized according to the definition of “poison”, and always be guaranteed to still be the same C program. Right now all we have are beliefs based on intuition that this can be done, but no actual proof. In other words, everyone believed that “undef” was the way to go until counter examples started showing up, and now everyone believes that “poison” is the way to go, which will be true until counter examples start showing up. We are jumping from one shaky precipice to another, we still are not yet on solid ground. Peter Lawrence.
Krzysztof Parzyszek via llvm-dev
2017-Jun-13 17:51 UTC
[llvm-dev] A tagged architecture, the elephant in the undef / poison room
On 6/13/2017 12:27 PM, Peter Lawrence via llvm-dev wrote:> Here’s what seems to really be going on > > “undef” === models an uninitialized register, but > > “poison” === turns the entire IR into a tagged architectureThe problem with undef was that while "cmp eq i32 %x, %x" was always true, if %x turned out to be undef, then we end up with "cmp eq i32 undef, undef", which may be folded either way. Poison is meant to avoid that issue by saying that once you establish a certain property of it, that property of it remains. Maybe there is an interpretation of it that matches a tagged architecture, but I don't see how that's significant. Do you have any particular concern regarding it? (Other than about the ability to represent a C program.) -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
John Regehr via llvm-dev
2017-Jun-15 23:15 UTC
[llvm-dev] A tagged architecture, the elephant in the undef / poison room
> It seems like a odd choice given that none of our current targets > are tagged architectures, all of this tagged IR has to somehow be > reduced back down to normal target machine instructions.I agree with Krzysztof that an analogy with tagged architectures is strained. One way to see this is to look at the interpreter mode of lli, which is a proper LLVM interpreter that doesn't need to tag values as poison or not. It completely ignores poison afaik.> Correct me if I am wrong, but it seems no one has outlined a proof > that C programs can be translated into IR+”poison", optimized according > to the definition of “poison”, and always be guaranteed to still > be the same C program. Right now all we have are beliefs > based on intuition that this can be done, but no actual proof.It isn't clear that producing such a proof is a good value proposition, since Clang isn't a major source of miscompilation bugs, as far as I know. The problems under discussion (e.g, in the freeze paper) are in the middle end, not the front end.> In other words, everyone believed that “undef” was the way to go until > counter examples started showing up, and now everyone believes > that “poison” is the way to go, which will be true until counter examples > start showing up. We are jumping from one shaky precipice to another, > we still are not yet on solid ground.I disagree with this characterization. Allow me to propose a different one. - In the beginning, there was no undef and no poison. This was a workable situation, a perfectly suitable basis for a correct compiler, but certain desirable optimizations could not be performed. - Undef was added to allow the IR to express don't-care values, with the result that writing some optimizations got a bit harder but also more optimizations could be performed. - Poison was added to allow the IR to express additional optimizations that cannot be justified by undef, with the result that writing some optimizations got a bit harder still, but again more optimizations could be performed. This is the current situation and there are three problems. (1) People weren't always sure which of poison or undef to use, this confusion persists and needs to get fixed. The common case (I think) is a comment or documentation saying undef where it should be poison. (2) Undef and poison interact in tricky ways. (3) Conflicting assumptions were sometimes made about how UB works, leading to problems such as ones described in the freeze paper. But let's be clear: UB problems are in corner cases and they don't affect very many developers or users. The sense of the community (as far as I can tell) is that a fundamental overhaul isn't called for. Various proposals have been made, including ours, about how to fix the UB model, but none has yet been adopted. Peter, you are obviously entitled to your views, but my take is that it probably isn't productive to start multiple UB-related threads in a short time period, nor is it productive to sort of throw stones at all aspects of the model. We can't debate everything at once and we're not going to throw out a huge base of optimizations built around the current model. I'll also repeat Daniel's request that you tone down the open-ended requests for proofs/counterexamples/whatever that would generate a lot of work for anyone trying to discuss these matters with you. Instead of producing abstract arguments, please provide concrete examples, ideally accompanied by code. John
Sanjoy Das via llvm-dev
2017-Jun-17 03:23 UTC
[llvm-dev] A tagged architecture, the elephant in the undef / poison room
Hi Peter, On Tue, Jun 13, 2017 at 10:27 AM, Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Here’s what seems to really be going on > > “undef” === models an uninitialized register, butNo, it specifically does not. You yourself have mentioned the reason many times -- every "read" of undef returns a new value, which is different from, say, %rax with garbage in it.> “poison” === turns the entire IR into a tagged architecture > > > Is this really the way to go ? > > It seems like a odd choice given that none of our current targets > are tagged architectures, all of this tagged IR has to somehow be > reduced back down to normal target machine instructions.No architecture (that I know of) supports PHI nodes either, yet they are an enormously useful concept. Comparing LLVM IR to machine ISAs is a mistake -- the mid level IR should be evaluated in terms of how well it can do what it is supposed to do, which is enable mid level optimizations. There's a reason why we don't use MCInsts all the way through. Thanks! -- Sanjoy
Peter Lawrence via llvm-dev
2017-Jun-19 15:36 UTC
[llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room
> On Jun 16, 2017, at 8:23 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > Hi Peter, > > On Tue, Jun 13, 2017 at 10:27 AM, Peter Lawrence via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Here’s what seems to really be going on >> >> “undef” === models an uninitialized register, but > > No, it specifically does not. You yourself have mentioned the reason > many times -- every "read" of undef returns a new value, which is > different from, say, %rax with garbage in it. >Sanjoy, You’ve hit the nail on the head !, but I don’t think we’ve correctly identified the root cause of the problem yet. Rather the problem is with how we incorrectly cse and copy-propagate it: x = undef if (x == x) —————> this is an illegal copy-propagation —————> if (undef == undef) If you don’t believe this is illegal then we end up with the absurdity of the function-inlining example [1], and the argument against the function-inlining example is so compelling that John decided to drop out of the argument, IE he gave up because it is indefensible. Apparently this copy-propagation has been justified by thinking of "undef" as an IR "constant” and that any optimization you can do to a “constant” can also be done to “undef” without further thought. Instead each ‘undef’ should be thought of as a live on entry register, IE an incoming argument physical register, and “x = undef” cannot be optimized any more than “x = PhysReg0”, in particular multiple uses of X does not mean multiple incoming argument registers, and separate instances of “undef” cannot be equated any more than distinct incoming argument registers. To the argument that this may create unnecessary register pressure I say that is a register allocator issue not an IR issue, the register allocator can and should figure this out and do the right thing. Peter Lawrence. [1. this function *always* executes statement S, F(a) { If (a == a) S; } but in llvm if you inline it and “a” happens to be “undef” then nothing can be said about whether statement S is executed. This is indefensible.]>> “poison” === turns the entire IR into a tagged architecture >> >> >> Is this really the way to go ? >> >> It seems like a odd choice given that none of our current targets >> are tagged architectures, all of this tagged IR has to somehow be >> reduced back down to normal target machine instructions. > > No architecture (that I know of) supports PHI nodes either, yet they > are an enormously useful concept. Comparing LLVM IR to machine ISAs > is a mistake -- the mid level IR should be evaluated in terms of how > well it can do what it is supposed to do, which is enable mid level > optimizations. There's a reason why we don't use MCInsts all the way > through. > > Thanks! > -- Sanjoy
Maybe Matching Threads
- the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room
- A tagged architecture, the elephant in the undef / poison room
- a tagged architecture, the elephant in the undef / poison room
- a tagged architecture, the elephant in the undef / poison room
- RFC: Killing undef and spreading poison