thr3ads.net - llvm dev - [LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev [Feb 2015]

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2015-Feb-24 04:45 UTC

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

----- Original Message -----> From: "Chandler Carruth" <chandlerc at google.com>
> To: "Katya Romanova" <Katya_Romanova at
playstation.sony.com>, "Nick Lewycky" <nlewycky at
google.com>
> Cc: llvmdev at cs.uiuc.edu, "Philip Reames" <listmail at
philipreames.com>, "Hal Finkel" <hfinkel at anl.gov>,
"Rafael
> Espindola" <rafael_espindola at playstation.sony.com>,
"Sanjoy Das" <sanjoy at playingwithpointers.com>, "David
Majnemer"
> <david.majnemer at gmail.com>
> Sent: Monday, February 23, 2015 10:32:30 PM
> Subject: Re: Jump Theading/GVN bug - moving discussion to llvm-dev
> 
> 
> So, first things first.
> 
> 
> Handling unreachable code is annoying. I agree. However, its
> important to characterize how annoying. Most code is not
> unreachable, and we're (obviously) fine just bailing on such crazy
> code paths. So in practice the common cost is keeping a local set to
> detect cycles in the graph where we aren't expecting them. We are
> essentially always traversing a linked list representing this graph.
> Because these linked list nodes don't have any particular reason to
> have high locality, we have a cache-hostile traversal where we have
> to do primarily local and cache friendly book-keeping at each step
> to catch duplication. I suspect that this has a very small (but
> non-zero) impact on compile time.
> 
> 
> Handling unreachable code is also error prone. We have had a long
> history of bugs here. So if it were free to get rid of unreachable
> code, that would be super awesome.
> 
> 
> 
> 
> The first problem to realize is that for a large chunk of LLVM, we
> can't actually get rid of handling unreachable code. Passes are
> quite likely to create unreachable code while running, and that will
> mean that all of the utilities will end up needing to be
> conservatively correct and handle unreachable code even when they
> don't need to. =/
> 
> 
> 
> 
> The second problem is that cleaning up unreachable code isn't free.
> The code which is most likely to create unreachable code is
> similarly likely to destroy the analysis information we would use to
> remove it cheaply. And even then, just walking lists of basic blocks
> as we go out of the function is going to dirty cache lines that we
> might not have any other reason to look at. I can genuinely imagine
> cases where batching this will be beneficial. Will it outstrip the
> cost of handling it? I don't know. But I think it will mitigate the
> gains, especially if the gains aren't as significant as we might
> naively think.
> 
> 
> 
> 
> The third problem I have is that this makes it much harder to
> constructively produce a correct IR transformation pass. When
> transforming the IR we must never forget about regions we have made
> unreachable. A single mistake here will cascade to a verification
> failure. This is the reverse of the problem we have today where your
> pass must *accept* unreachable IR. But I think the constructive
> reasoning is much easier. It makes it harder to have a bug through
> omission of a case.
> 
> 
> 
> 
> The fourth problem I have is related to the third problem. How do we
> gain confidence in the correctness of any IR transformation? We must
> construct test cases that we believe will exercise the system. It
> seems *incredibly* easier to construct test cases with interesting
> unreachable IR patterns that should all be handled, than to
> construct test cases where the pass will happen to form unreachable
> IR patterns in each way and ensure that none escape the pass. One is
> essentially covering input diversity, the other has to cover
> *output* diversity, and must prove the negative. We fundamentally
> must gain confidence that the pass *never* produces unreachable IR.
> This seems much harder than demonstrating that it does handle
> unreachable IR.
> 
> 
> 
> 
> The fifth problem I have is also related to the third and fourth
> problems. Precluding unreachable code seems likely to make tools
> like delta test case reduction dramatically less effective. Now,
> when cutting edges in the CFG to minimize a test case they must
> actually build the graph and prune all unreachable code.
> 
> All of these problems seem surmountable, but in aggregate, they make
> me strongly doubt that the benefit is worth the cost.
I don't disagree with this, but I think there are two (separable) issues
here:

 1. Should transformations produce unreachable code? On the face of it, they
must be able to, because the problem of determining dynamic reachability is
generally undecidable. Should passes produce trivially dead code? Well, probably
not if they avoid it. Even this is not locally decidable, and so you're
right, removing it will add extra expense (but also provide savings, so we'd
need to experiment).

 2. Should unreachable code be allowed to contain nonsense (like instructions
that depend on themselves)? To this, I believe the answer is no. We currently
permit this, and I think that a lot of the bugs regarding unreachable code some
from this. I've yet to hear a good argument why, for example, JumpThreading
must produce self-referential instructions in trivially-dead regions.

 -Hal
> 
> 
> -Chandler
> 
> 
> 
> On Mon, Feb 23, 2015 at 1:04 PM, Romanova, Katya <
> Katya_Romanova at playstation.sony.com > wrote:
> 
> 
> 
> 
> 
> 
> Hello,
> 
> I encountered a problem triggered by Jump-Threading optimization.
> This pass is creating an unreachable block with an instruction that
> is not well formed, which then causes the subsequent GVN pass to
> enter an infinite loop.
> 
> 
> 
> I have submitted a bug report and proposed fix to llvm-commits. This
> bug opened a can of worms. I was asked to move the discussion to
> llvm-dev to reach for a wider audience.
> 
> 
> 
> >> Can we move the general discussion to llvm-dev? This probably
> >> warrants a wider audience.
> 
> 
> 
> Here is the original post and a couple of replies from last week:
> 
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150216/261330.html
> 
> 
> 
> Here is a thread of replies from today:
> 
>
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150223/261463.html
> 
> 
> 
> Katya.
> 
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Chandler Carruth

2015-Feb-24 07:05 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

Will try to reply to the larger thread again soon, but a quick reply on a
bit of a tangent...

On Mon, Feb 23, 2015 at 8:45 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>  2. Should unreachable code be allowed to contain nonsense (like
> instructions that depend on themselves)? To this, I believe the answer is
> no. We currently permit this, and I think that a lot of the bugs regarding
> unreachable code some from this. I've yet to hear a good argument why,
for
> example, JumpThreading must produce self-referential instructions in
> trivially-dead regions.
>
I could easily see tightening the rules on what passes the verifier within
unreachable code to try to make it substantially cheaper, easier, and less
error prone to handle unreachable code. It seems likely that there are good
pragmatic compromises here that would actually cost nothing. Obvious
example is, as you highlight, self-referential non-phi instructions could
easily just be rejected, and asserted during construction by the IR builder.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150223/e565145f/attachment.html>

Daniel Berlin

2015-Feb-24 16:27 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

On Mon, Feb 23, 2015 at 11:05 PM, Chandler Carruth <chandlerc at
google.com>
wrote:
> Will try to reply to the larger thread again soon, but a quick reply on a
> bit of a tangent...
>
> On Mon, Feb 23, 2015 at 8:45 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>
>>  2. Should unreachable code be allowed to contain nonsense (like
>> instructions that depend on themselves)? To this, I believe the answer
is
>> no. We currently permit this, and I think that a lot of the bugs
regarding
>> unreachable code some from this. I've yet to hear a good argument
why, for
>> example, JumpThreading must produce self-referential instructions in
>> trivially-dead regions.
>>
>
> I could easily see tightening the rules on what passes the verifier within
> unreachable code to try to make it substantially cheaper, easier, and less
> error prone to handle unreachable code. It seems likely that there are good
> pragmatic compromises here that would actually cost nothing.
>

One such compromise (which you didn't seem fond of) was that
"unreachable
blocks have to contain nothing but a terminator".

Earlier, you said "The code which is most likely to create unreachable code
is similarly likely to destroy the analysis information we would use to
remove it cheaply. " and "When transforming the IR we must never
forget
about regions we have made unreachable. "

I would like to challenge this assertion with actual evidence:
Right now the state of the world is actually "the vast majority of llvm
passes try to clean up after themselves, try not to create weird
unreachable code".

GVN removes dead blocks that replacement of values/equality propagation
creates
SCCP deletes all instructions in dead blocks except terminators (because it
preserves CFG)
ADCE does the same (it never removes terminators)
DCE the same
IPSCCP actually removes dead blocks, because it does not preserve CFG
JumpThreading removes unreachable blocks on input but not output (and in
fact, as you'll see, it's the only pass that seems to really be a
problem)
SROA cleans up unreachable code it generates in it's utility class
SimplifyCFG removes unreachable blocks
CorrelatedValuePropagation does the same through a utility (it calls
constantfoldterminator, which in turn takes great pains to clean up the CFG
properly)
etc
I didn't look at the loop passes, but outside of jump threading, it appears
there is little work to be done to be in a state where things clean up
after themselves.

That is, *right* now, we know for at least a ton of the passes and their
analysis, it is not "destroying the analysis information we would use to
remove it" because they are in fact removing it.
They already "do not forget about regions they have made unreachable",
because they already clean up after themselves.

For giggle, i stopped two of them that i happened to know pretty well (GVN,
CorrelatedValuePropagation) from removing unreachable blocks, just leaving
them unreachable with instructions still in them.
While it does nothing to cause verifier crashes, it causes a *bunch* of
brand new crashes *in* passes in the standard pass ordering, which it
wouldn't if we were really living in a world where passes were really
handling unreachable code properly already.

Regardless of what the outcome is (and i'm pretty clearly on one side), we
shouldn't go half-way. Right now we are already in a world, where, because
we have some passes cleaning up and some not,we have passes that can't
handle unreachable code as input, and some that can.
So if you change the pass order, and those passes do something, you get
wonderful crashes.
Either we should be testing passes with unreachable code as input, or
expecting their not to be any.
(and the side-issue of "in a world where anything can be in an unreachable
block, all of this cleanup and carefulness is a waste of time")

Obvious example is, as you highlight, self-referential non-phi
instructions> could easily just be rejected, and asserted during construction by the IR
> builder.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/b9c50d08/attachment.html>

Smith, Kevin B

2015-Feb-24 18:26 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

FWIW, since I am not really much of a contributor to LLVM.

The points Chandler makes for why passes must be allowed to create dead code are
excellent.  Dead code elimination is a general compiler cleanup
pass, do be run once in a while to get rid of code that for whatever reasons is
dead.  Every pass should not have to police themselves not to produce dead code.
This strategy (of having independent passes to do well-defined specific things)
is pretty much (it seems to me) one of the founding principles of LLVM design,
and one of its very key strengths.  Dead code elimination should not be built
into all other passes.

Hal gets to the heart of the matter (in my opinion) when he writes:

Ø   2. Should unreachable code be allowed to contain nonsense (like instructions
that depend on themselves)? To this, I believe the answer is no. We currently
permit this, and I think that a lot of the bugs regarding unreachable code some
from this. I've yet to hear a good argument why, for example, JumpThreading
must produce self-referential instructions in trivially-dead regions.

To me this looks like an example of incorrect SSA IR.  SSA form is not supposed
to allow for the case of a use of a def where the def cannot reach.  And my
understanding was that “undef” LLVM IR was created exactly for specifying cases
where a def was required for proper SSA form, but where none was available.
Clearly ( to me at least), an instruction cannot use in its inputs a def that
occurs later in the same basic-block (and that includes the def the instruction
itself defines).
That would either need to be an undef, or a phi that merges an undef and the def
that occurs later.

Years ago (late 90s, early 2000s) the Intel compiler had quite a few problems
with SSA form that could be traced to somewhat undisciplined handling of
“undefined” defs, PHI removal, and this type of degenerate SSA.  We did as is
being suggested currently for LLVM, and made such IR illegal, and changed
our internal verifiers to catch those.  After a short time, this shook loose
some remaining bugs with mishandling of SSA form, and improved compiler
stability
in this regard.  I think that is an excellent direction for LLVM infrastructure
to go in.

Kevin B. Smith
Intel Compiler Team

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Chandler Carruth
Sent: Monday, February 23, 2015 11:05 PM
To: Hal Finkel
Cc: Katya Romanova; LLVM Developers Mailing List; Rafael Espindola
Subject: Re: [LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

Will try to reply to the larger thread again soon, but a quick reply on a bit of
a tangent...

On Mon, Feb 23, 2015 at 8:45 PM, Hal Finkel <hfinkel at
anl.gov<mailto:hfinkel at anl.gov>> wrote:
 2. Should unreachable code be allowed to contain nonsense (like instructions
that depend on themselves)? To this, I believe the answer is no. We currently
permit this, and I think that a lot of the bugs regarding unreachable code some
from this. I've yet to hear a good argument why, for example, JumpThreading
must produce self-referential instructions in trivially-dead regions.

I could easily see tightening the rules on what passes the verifier within
unreachable code to try to make it substantially cheaper, easier, and less error
prone to handle unreachable code. It seems likely that there are good pragmatic
compromises here that would actually cost nothing. Obvious example is, as you
highlight, self-referential non-phi instructions could easily just be rejected,
and asserted during construction by the IR builder.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/ab456ebc/attachment.html>

Philip Reames

2015-Feb-24 18:52 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

On 02/23/2015 08:45 PM, Hal Finkel wrote:> I don't disagree with this, but I think there are two (separable)
issues here:
>
>   1. Should transformations produce unreachable code? On the face of it,
they must be able to, because the problem of determining dynamic reachability is
generally undecidable. Should passes produce trivially dead code? Well, probably
not if they avoid it. Even this is not locally decidable, and so you're
right, removing it will add extra expense (but also provide savings, so we'd
need to experiment).
>
>   2. Should unreachable code be allowed to contain nonsense (like
instructions that depend on themselves)? To this, I believe the answer is no. We
currently permit this, and I think that a lot of the bugs regarding unreachable
code some from this. I've yet to hear a good argument why, for example,
JumpThreading must produce self-referential instructions in trivially-dead
regions.
>
>   -HalI agree with Hal here.  Fixing (2) does not necessarily involve changing 
(1).  Can someone sketch out how we'd ever get an instance of (2)?  I'd 
naively expect to at least see a PHI in the cycle.

Philip

Philip Reames

2015-Feb-24 18:57 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

On 02/24/2015 10:52 AM, Philip Reames wrote:>
> On 02/23/2015 08:45 PM, Hal Finkel wrote:
>> I don't disagree with this, but I think there are two (separable) 
>> issues here:
>>
>>   1. Should transformations produce unreachable code? On the face of 
>> it, they must be able to, because the problem of determining dynamic 
>> reachability is generally undecidable. Should passes produce 
>> trivially dead code? Well, probably not if they avoid it. Even this 
>> is not locally decidable, and so you're right, removing it will add
>> extra expense (but also provide savings, so we'd need to
experiment).
>>
>>   2. Should unreachable code be allowed to contain nonsense (like 
>> instructions that depend on themselves)? To this, I believe the 
>> answer is no. We currently permit this, and I think that a lot of the 
>> bugs regarding unreachable code some from this. I've yet to hear a 
>> good argument why, for example, JumpThreading must produce 
>> self-referential instructions in trivially-dead regions.
>>
>>   -Hal
> I agree with Hal here.  Fixing (2) does not necessarily involve 
> changing (1).  Can someone sketch out how we'd ever get an instance of 
> (2)?  I'd naively expect to at least see a PHI in the cycle.The only use of dominance information I can find in JT/LVI is for the 
handling for @llvm.assume in ValueTracking.  Adding a 
DT->isReachableFromEntry check to this path seems downright simple.

On the surface, I don't see why the LVI analysis would care about 
unreachable blocks.  It's only looking at the edges in the CFG and would 
treat an unreachable region the same as a reachable one.

Philip

Rafael Espíndola

2015-Feb-24 20:55 UTC

head link

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

On 24 February 2015 at 02:05, Chandler Carruth <chandlerc at google.com>
wrote:> Will try to reply to the larger thread again soon, but a quick reply on a
> bit of a tangent...
>
> On Mon, Feb 23, 2015 at 8:45 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>>
>>  2. Should unreachable code be allowed to contain nonsense (like
>> instructions that depend on themselves)? To this, I believe the answer
is
>> no. We currently permit this, and I think that a lot of the bugs
regarding
>> unreachable code some from this. I've yet to hear a good argument
why, for
>> example, JumpThreading must produce self-referential instructions in
>> trivially-dead regions.
>
>
> I could easily see tightening the rules on what passes the verifier within
> unreachable code to try to make it substantially cheaper, easier, and less
> error prone to handle unreachable code. It seems likely that there are good
> pragmatic compromises here that would actually cost nothing. Obvious
example
> is, as you highlight, self-referential non-phi instructions could easily
> just be rejected, and asserted during construction by the IR builder.
I am worried that the compromise might be worse than the two extremes
in this case.

I don't think that rejecting

  %a = getelementptr inbounds i8* %a, i64 1

but accepting

  %a = getelementptr inbounds i8* %b, i64 1
  %b = getelementptr inbounds i8* %a, i64 1

would get us much. It would probably just make unreachable handling
bugs harder to find and test.

I don't have a strong opinion so far, but it is interesting to note
that this particular bug does showcase arguments on both sides:

* The unreachable testcase that crashes gvn is simpler than the one
that causes unreachable code to be produced.
* Having the verifier reject unreachable code at the end of a pass
would probably have made it easier to identify and reduce the issue,
as the verifier would always detect the unreachable code but the code
has to be in a particular shape to crash gvn.

On the testcase reduction side of things, the only think I would
insist on is that bb1 still counts as reachable in

bb0:
br i1 false, label %bb1, label %bb2

Cheers,
Rafael

llvm dev - Feb 2015 - [LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev

[LLVMdev] Jump Theading/GVN bug - moving discussion to llvm-dev