thr3ads.net - llvm dev - [llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room [Jun 2017]

Peter Lawrence via llvm-dev

2017-Jun-19 15:36 UTC

[llvm-dev] the root cause is CP, was: A tagged architecture, the elephant in the undef / poison room

> On Jun 16, 2017, at 8:23 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi Peter,
> 
> On Tue, Jun 13, 2017 at 10:27 AM, Peter Lawrence via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Here’s what seems to really be going on
>> 
>>        “undef”  ===  models an uninitialized register,  but
> 
> No, it specifically does not.  You yourself have mentioned the reason
> many times -- every "read" of undef returns a new value, which is
> different from, say, %rax with garbage in it.
> 

Sanjoy,
            You’ve hit the nail on the head, but I don’t think we’ve correctly
identified the root cause of the problem yet.

Rather, the problem is with how we incorrectly CSE and copy-propagate it:
	x = undef
	if (x == x)
	—————>    this is an illegal copy-propagation   —————>
	if (undef == undef)

If you don’t believe this is illegal then we end up with the absurdity of the
function-inlining example [1], and the argument against the function-inlining
example is so compelling that John decided to drop out of the argument,
IE he gave up because it is indefensible.

Apparently this copy-propagation has been justified by thinking of “undef”
as an IR “constant” and that any optimization you can do to a “constant”
can also be done to “undef” without further thought.

Instead each ‘undef’ should be thought of as a live on entry register, IE an
incoming argument physical register, and “x = undef” cannot be optimized
any more than “x = PhysReg0”, in particular multiple uses of x do not mean
multiple incoming argument registers, and separate instances of “undef”
cannot be equated any more than distinct incoming argument registers.

To the argument that this may create unnecessary register pressure I say
that is a register allocator issue not an IR issue, the register allocator can
and should figure this out and do the right thing.

Peter Lawrence.

[1. this function *always* executes statement S,
	F(a) {
	   if (a == a) S;
	}
   but in llvm if you inline it and “a” happens to be “undef” then nothing can
   be said about whether statement S is executed.  This is indefensible.]
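
The two readings of undef at stake in this example can be compared with a small model. This is only an illustrative sketch in Python, not LLVM code: the 2-bit value domain and the function names are assumptions invented for the example. Under the per-use reading (every use of undef may observe a different value) the inlined `a == a` can come out either way; under the per-definition reading (the value is fixed once, like an incoming register) it is always true.

```python
from itertools import product

# Assumption: a tiny 2-bit value domain, chosen only to keep enumeration small.
DOMAIN = range(4)

def outcomes_per_use():
    """Each use of undef independently takes any value in DOMAIN.

    Models `if (undef == undef)`: the two operands are chosen separately.
    """
    return {lhs == rhs for lhs, rhs in product(DOMAIN, repeat=2)}

def outcomes_per_definition():
    """The value is chosen once and every use sees the same value.

    Models `x = <some fixed but unknown value>; if (x == x)`.
    """
    return {x == x for x in DOMAIN}

if __name__ == "__main__":
    print("per-use outcomes:       ", sorted(outcomes_per_use()))
    print("per-definition outcomes:", sorted(outcomes_per_definition()))
```

Enumerating both sets gives `{True, False}` for the per-use model and `{True}` for the per-definition model, which is the gap the inlining example turns on.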

>>        “poison”  ===  turns the entire IR into a tagged architecture
>> 
>> 
>> Is this really the way to go ?
>> 
>> It seems like an odd choice given that none of our current targets
>> are tagged architectures, all of this tagged IR has to somehow be
>> reduced back down to normal target machine instructions.
> 
> No architecture (that I know of) supports PHI nodes either, yet they
> are an enormously useful concept.  Comparing LLVM IR to machine ISAs
> is a mistake -- the mid level IR should be evaluated in terms of how
> well it can do what it is supposed to do, which is enable mid level
> optimizations.  There's a reason why we don't use MCInsts all the way
> through.
> 
> Thanks!
> -- Sanjoy

Sanjoy Das via llvm-dev

2017-Jun-19 16:35 UTC

Hi Peter,

On Mon, Jun 19, 2017 at 8:36 AM, Peter Lawrence
<peterl95124 at sbcglobal.net> wrote:
>             You’ve hit the nail on the head, but I don’t think we’ve correctly
> identified the root cause of the problem yet.
>
> Rather, the problem is with how we incorrectly CSE and copy-propagate it:
>         x = undef
>         if (x == x)
>         —————>    this is an illegal copy-propagation   —————>
>         if (undef == undef)
>
> If you don’t believe this is illegal then we end up with the absurdity of the
> function-inlining example [1], and the argument against the function-inlining

I don't think [1] is absurd.  LLVM IR is not a programming language,
and it is okay for it to have semantics that would seem odd in a
programming language.

> example is so compelling that John decided to drop out of the argument,
> IE he gave up because it is indefensible.

I think he "gave up" because he has better things to do than argue
with you. :)

> Apparently this copy-propagation has been justified by thinking of “undef”
> as an IR “constant” and that any optimization you can do to a “constant”
> can also be done to “undef” without further thought.
>
> Instead each ‘undef’ should be thought of as a live on entry register, IE an

Since you're talking about "each" undef, I presume you want to have an
undef instruction?  As I've said before, this would be equivalent to
"freeze(poison)" in the new semantics.  It does not address the
problem of optimizing things like `a s< (a +nsw 1)` (so you'll need a
poison-like thing for that anyway).
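
To illustrate why `a s< (a +nsw 1)` needs something poison-like, here is a rough Python model. It is not LLVM's implementation: the 8-bit width, the helper names, and the use of `None` as a stand-in for poison are all assumptions made for the sketch. It lists the inputs for which `a < a + 1` fails under wrapping addition, and under an addition whose signed overflow has no defined result.

```python
# Assumption: 8-bit signed integers, small enough to enumerate exhaustively.
BITS = 8
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1
ALL_VALUES = range(INT_MIN, INT_MAX + 1)

def add_wrapping(a, b):
    """Signed two's-complement addition that wraps on overflow."""
    return (a + b - INT_MIN) % (1 << BITS) + INT_MIN

def add_nsw(a, b):
    """Addition whose signed overflow yields None (a stand-in for poison)."""
    total = a + b
    return total if INT_MIN <= total <= INT_MAX else None

def counterexamples_wrapping():
    """Inputs where `a < a + 1` is false when the add wraps."""
    return [a for a in ALL_VALUES if not a < add_wrapping(a, 1)]

def counterexamples_nsw():
    """Inputs with a defined (non-poison) sum where `a < a + 1` is false."""
    failing = []
    for a in ALL_VALUES:
        total = add_nsw(a, 1)
        if total is not None and not a < total:
            failing.append(a)
    return failing

if __name__ == "__main__":
    print("wrapping counterexamples:", counterexamples_wrapping())
    print("nsw counterexamples:     ", counterexamples_nsw())
```

With wrapping addition the comparison fails only for `a = 127`. With the poison-style addition there is no defined input for which it fails, which is what lets an optimizer fold the comparison to true.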

> incoming argument physical register, and “x = undef” cannot be optimized
> any more than “x = PhysReg0”, in particular multiple uses of x do not mean
> multiple incoming argument registers, and separate instances of “undef”
> cannot be equated any more than distinct incoming argument registers.
>
> To the argument that this may create unnecessary register pressure I say
> that is a register allocator issue not an IR issue, the register allocator can
> and should figure this out and do the right thing.

Sure, that's a consistent proposal.  However, practically, the onus is
on you to prove this is reasonable from a performance standpoint.

-- Sanjoy

Peter Lawrence via llvm-dev

2017-Jun-19 17:02 UTC

Sanjoy,
            The point is this, you have to take a stand one way or
the other on the function-inlining issue:


[1. this function *always* executes statement S,
	F(a) {
	   if (a == a) S;
	}
   but in llvm if you inline it and “a” happens to be “undef” then nothing can
   be said about whether statement S is executed. This is indefensible.]


My belief is this: that llvm exists for a utilitarian purpose,
and that llvm currently violates that utilitarian goal by violating
the users’ expectations in the function-inlining example.


So the question is, where do you stand?


Peter Lawrence.


> On Jun 19, 2017, at 9:35 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi Peter,
> 
> On Mon, Jun 19, 2017 at 8:36 AM, Peter Lawrence
> <peterl95124 at sbcglobal.net> wrote:
>>            You’ve hit the nail on the head, but I don’t think we’ve correctly
>> identified the root cause of the problem yet.
>> 
>> Rather, the problem is with how we incorrectly CSE and copy-propagate it:
>>        x = undef
>>        if (x == x)
>>        —————>    this is an illegal copy-propagation   —————>
>>        if (undef == undef)
>> 
>> If you don’t believe this is illegal then we end up with the absurdity of the
>> function-inlining example [1], and the argument against the function-inlining
> 
> I don't think [1] is absurd.  LLVM IR is not a programming language,
> and it is okay for it to have semantics that would seem odd in a
> programming language.
> 
>> example is so compelling that John decided to drop out of the argument,
>> IE he gave up because it is indefensible.
> 
> I think he "gave up" because he has better things to do than argue
> with you. :)
> 
>> Apparently this copy-propagation has been justified by thinking of “undef”
>> as an IR “constant” and that any optimization you can do to a “constant”
>> can also be done to “undef” without further thought.
>> 
>> Instead each ‘undef’ should be thought of as a live on entry register, IE an
> 
> Since you're talking about "each" undef, I presume you want to have an
> undef instruction?  As I've said before, this would be equivalent to
> "freeze(poison)" in the new semantics.  It does not address the
> problem of optimizing things like `a s< (a +nsw 1)` (so you'll need a
> poison-like thing for that anyway).
> 
>> incoming argument physical register, and “x = undef” cannot be optimized
>> any more than “x = PhysReg0”, in particular multiple uses of x do not mean
>> multiple incoming argument registers, and separate instances of “undef”
>> cannot be equated any more than distinct incoming argument registers.
>> 
>> To the argument that this may create unnecessary register pressure I say
>> that is a register allocator issue not an IR issue, the register allocator can
>> and should figure this out and do the right thing.
> 
> Sure, that's a consistent proposal.  However, practically, the onus is
> on you to prove this is reasonable from a performance standpoint.
> 
> -- Sanjoy

John Regehr via llvm-dev

2017-Jun-20 04:28 UTC

> ... the argument against the function-inlining
> example is so compelling that John decided to drop out of the argument,
> IE he gave up because it is indefensible.
Peter, it is extremely impolite to mischaracterize someone else's 
position.  Up to this point I assumed that you were operating in good 
faith but what I quoted above looks more like trolling.

John

Peter Lawrence via llvm-dev

2017-Jun-20 06:59 UTC

John,
         this is the issue at hand

[1. this function *always* executes statement S,
	F(a) {
	   if (a == a) S;
	}
   but in llvm if you inline it and “a” happens to be “undef” then nothing can
   be said about whether statement S is executed. This is indefensible.]

I am trying to force this issue because when I challenge folks to question
their assumptions regarding CSE’ing and CP’ing of “undef” all I get back
is a recitation of the usual catechism on the subject. I can’t get them
to think outside the box without this example of the consequences
of those assumptions. Unfortunately so far no one has responded to it.

The VP of engineering at my last company had a way of bringing
clarity to issues like this. He would ask: now that robotic surgery is
becoming a reality, do you want the compiler to do this if it is being
used to compile the robot software and you are scheduled for
surgery?

I also believe that the C standard is flawed in that it uses the
term “undefined behavior” in many places where it should only
use “unspecified”, and that reading the letter rather than the spirit
of the standard is doing a disservice to our users.  We should not
look to the standard for guidance; rather, the llvm community
needs to take a stand and guide the standard.

Will you help in challenging folks to question their assumptions?
Will you help guide the standard?
What say you?

Peter Lawrence.

> On Jun 19, 2017, at 9:28 PM, John Regehr <regehr at cs.utah.edu> wrote:
> 
>> ... the argument against the function-inlining
>> example is so compelling that John decided to drop out of the argument,
>> IE he gave up because it is indefensible.
> 
> Peter, it is extremely impolite to mischaracterize someone else's
> position.  Up to this point I assumed that you were operating in good
> faith but what I quoted above looks more like trolling.
> 
> John
> 