thr3ads.net - llvm dev - [llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room [Jun 2017]

If this information is useful, please help other people find it:
Share via:

Peter Lawrence via llvm-dev

2017-Jun-13 17:27 UTC

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

Here’s what seems to really be going on

	“undef”  ===  models an uninitialized register,  but

	“poison”  ===  turns the entire IR into a tagged architecture


Is this really the way to go ?

It seems like a odd choice given that none of our current targets
are tagged architectures, all of this tagged IR has to somehow be
reduced back down to normal target machine instructions.



This question is meant to be thought provoking,
I’m not looking for quick rhetorical answers.

Correct me if I am wrong, but it seems no one has outlined a proof
that C programs can be translated into IR+”poison", optimized according
to the definition of “poison”, and always be guaranteed to still
be the same C program. Right now all we have are beliefs
based on intuition that this can be done, but no actual proof.

In other words,  everyone believed that “undef” was the way to go until
counter examples started showing up, and now everyone believes
that “poison” is the way to go, which will be true until counter examples
start showing up. We are jumping from one shaky precipice to another,
we still are not yet on solid ground.


Peter Lawrence.

Krzysztof Parzyszek via llvm-dev

2017-Jun-13 17:51 UTC

head link

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

On 6/13/2017 12:27 PM, Peter Lawrence via llvm-dev
wrote:> Here’s what seems to really be going on
> 
> 	“undef”  ===  models an uninitialized register,  but
> 
> 	“poison”  ===  turns the entire IR into a tagged architecture
The problem with undef was that while "cmp eq i32 %x, %x" was always 
true, if %x turned out to be undef, then we end up with "cmp eq i32 
undef, undef", which may be folded either way.

Poison is meant to avoid that issue by saying that once you establish a 
certain property of it, that property of it remains.  Maybe there is an 
interpretation of it that matches a tagged architecture, but I don't see 
how that's significant.  Do you have any particular concern regarding 
it? (Other than about the ability to represent a C program.)

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

John Regehr via llvm-dev

2017-Jun-15 23:15 UTC

head link

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

> It seems like a odd choice given that none of our current targets
> are tagged architectures, all of this tagged IR has to somehow be
> reduced back down to normal target machine instructions.
I agree with Krzysztof that an analogy with tagged architectures is 
strained. One way to see this is to look at the interpreter mode of lli, 
which is a proper LLVM interpreter that doesn't need to tag values as 
poison or not. It completely ignores poison afaik.
> Correct me if I am wrong, but it seems no one has outlined a proof
> that C programs can be translated into IR+”poison", optimized
according
> to the definition of “poison”, and always be guaranteed to still
> be the same C program. Right now all we have are beliefs
> based on intuition that this can be done, but no actual proof.
It isn't clear that producing such a proof is a good value proposition, 
since Clang isn't a major source of miscompilation bugs, as far as I 
know. The problems under discussion (e.g, in the freeze paper) are in 
the middle end, not the front end.
> In other words,  everyone believed that “undef” was the way to go until
> counter examples started showing up, and now everyone believes
> that “poison” is the way to go, which will be true until counter examples
> start showing up. We are jumping from one shaky precipice to another,
> we still are not yet on solid ground.
I disagree with this characterization. Allow me to propose a different one.

- In the beginning, there was no undef and no poison. This was a 
workable situation, a perfectly suitable basis for a correct compiler, 
but certain desirable optimizations could not be performed.

- Undef was added to allow the IR to express don't-care values, with the 
result that writing some optimizations got a bit harder but also more 
optimizations could be performed.

- Poison was added to allow the IR to express additional optimizations 
that cannot be justified by undef, with the result that writing some 
optimizations got a bit harder still, but again more optimizations could 
be performed. This is the current situation and there are three 
problems. (1) People weren't always sure which of poison or undef to 
use, this confusion persists and needs to get fixed. The common case (I 
think) is a comment or documentation saying undef where it should be 
poison. (2) Undef and poison interact in tricky ways. (3) Conflicting 
assumptions were sometimes made about how UB works, leading to problems 
such as ones described in the freeze paper.

But let's be clear: UB problems are in corner cases and they don't 
affect very many developers or users. The sense of the community (as far 
as I can tell) is that a fundamental overhaul isn't called for. Various 
proposals have been made, including ours, about how to fix the UB model, 
but none has yet been adopted.

Peter, you are obviously entitled to your views, but my take is that it 
probably isn't productive to start multiple UB-related threads in a 
short time period, nor is it productive to sort of throw stones at all 
aspects of the model. We can't debate everything at once and we're not 
going to throw out a huge base of optimizations built around the current 
model. I'll also repeat Daniel's request that you tone down the 
open-ended requests for proofs/counterexamples/whatever that would 
generate a lot of work for anyone trying to discuss these matters with 
you. Instead of producing abstract arguments, please provide concrete 
examples, ideally accompanied by code.

John

Sanjoy Das via llvm-dev

2017-Jun-17 03:23 UTC

head link

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

Hi Peter,

On Tue, Jun 13, 2017 at 10:27 AM, Peter Lawrence via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Here’s what seems to really be going on
>
>         “undef”  ===  models an uninitialized register,  but
No, it specifically does not.  You yourself have mentioned the reason
many times -- every "read" of undef returns a new value, which is
different from, say, %rax with garbage in it.
>         “poison”  ===  turns the entire IR into a tagged architecture
>
>
> Is this really the way to go ?
>
> It seems like a odd choice given that none of our current targets
> are tagged architectures, all of this tagged IR has to somehow be
> reduced back down to normal target machine instructions.
No architecture (that I know of) supports PHI nodes either, yet they
are an enormously useful concept.  Comparing LLVM IR to machine ISAs
is a mistake -- the mid level IR should be evaluated in terms of how
well it can do what it is supposed to do, which is enable mid level
optimizations.  There's a reason why we don't use MCInsts all the way
through.

Thanks!
-- Sanjoy

Peter Lawrence via llvm-dev

2017-Jun-19 15:36 UTC

head link

[llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room

> On Jun 16, 2017, at 8:23 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> Hi Peter,
> 
> On Tue, Jun 13, 2017 at 10:27 AM, Peter Lawrence via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Here’s what seems to really be going on
>> 
>>        “undef”  ===  models an uninitialized register,  but
> 
> No, it specifically does not.  You yourself have mentioned the reason
> many times -- every "read" of undef returns a new value, which is
> different from, say, %rax with garbage in it.
> 

Sanjoy,
            You’ve hit the nail on the head !, but I don’t think we’ve correctly
identified the root cause of the problem yet.

Rather the problem is with how we incorrectly cse and copy-propagate it:
	x = undef
	if (x == x)
	—————>    this is an illegal copy-propagation   —————>
	if (undef == undef)

If you don’t believe this is illegal then we end up with the absurdity of the
function-inlining example [1], and the argument against the function-inlining
example is so compelling that John decided to drop out of the argument,
IE he gave up because it is indefensible.

Apparently this copy-propagation has been justified by thinking of
"undef"
as an IR "constant” and that any optimization you can do to a “constant”
can also be done to “undef” without further thought.

Instead each ‘undef’ should be thought of as a live on entry register, IE an
incoming argument physical register, and “x = undef” cannot be optimized
any more than “x = PhysReg0”, in particular multiple uses of X does not mean
multiple incoming argument registers, and separate instances of “undef”
cannot be equated any more than distinct incoming argument registers.

To the argument that this may create unnecessary register pressure I say
that is a register allocator issue not an IR issue, the register allocator can
and should figure this out and do the right thing.

Peter Lawrence.

[1. this function *always* executes statement S,
	F(a) {
	   If (a == a) S;
	}
   but in llvm if you inline it and “a” happens to be “undef” then nothing can
   be said about whether statement S is executed.  This is indefensible.]

>>        “poison”  ===  turns the entire IR into a tagged architecture
>> 
>> 
>> Is this really the way to go ?
>> 
>> It seems like a odd choice given that none of our current targets
>> are tagged architectures, all of this tagged IR has to somehow be
>> reduced back down to normal target machine instructions.
> 
> No architecture (that I know of) supports PHI nodes either, yet they
> are an enormously useful concept.  Comparing LLVM IR to machine ISAs
> is a mistake -- the mid level IR should be evaluated in terms of how
> well it can do what it is supposed to do, which is enable mid level
> optimizations.  There's a reason why we don't use MCInsts all the
way
> through.
> 
> Thanks!
> -- Sanjoy

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Jun 2017 - the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

[llvm-dev] A tagged architecture, the elephant in the undef / poison room

[llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room

Reasonably Related Threads