thr3ads.net - llvm dev - [LLVMdev] GlobalsModRef (and thus LTO) is completely broken [Jul 2015]

If this information is useful, please help other people find it:
Share via:

Chandler Carruth

2015-Jul-14 05:37 UTC

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

On Mon, Jul 13, 2015 at 10:21 PM Chris Lattner <clattner at apple.com>
wrote:
>
> > On Jul 13, 2015, at 8:19 PM, Chandler Carruth <chandlerc at
gmail.com>
> wrote:
> >
> > Ok folks,
> >
> > I wrote up the general high-level thoughts I have about stateful AA in
a
> separate thread. But we need to sort out the completely and horribly broken
> aspects of GlobalsModRef today, and the practical steps forward. This email
> is totally about the practical stuff.
> >
> > Now, as to why I emailed this group of people and with this subject,
the
> only pass pipeline that includes GlobalsModRef, is the LTO pipeline.
>
> Ah, so it is just an LTO enabled benchmark hack then.
>
> > It is also relying on a precomputed set of global variables whose
> address is never used by an instruction other than some very small set
> (gep, bitcast) as "non-address-taken". It then runs
GetUnderlyingObject on
> the two pointers in alias queries, and if that finds one of these
> "non-address-taken" globals for one of the memory locations but
not the
> other, it concludes no-alias! This is broken for a number of reasons.
> >
> > a) If the two locations merely have a different *depth* of instruction
> stack, because GetUnderlyingObject has a recursion cap, one side can fail
> while the other succeeds, and we erroneously produce no-alias.
>
> Interesting.  I’m sure it is no consolation, but GlobalsModRef probably
> predated the recursion cap :-)
>
> > b) If instcombine or any other pass for any reason introduces on one
> path an instruction that GetUnderlyingObject can't look through
(select,
> phi, load, ....), we incorrectly conclude no-alias. This is what
> addEscapingUse was intended to solve, but we would literally have to call
> it from every pass because we can't rely on analysis invalidation!
> >
> > c) If any pass actually escapes a pointer from one function into
> another, we invalidate the underlying assumption of
'non-address-taken'
> that it relies upon.
>
> Yep, all of these are pretty nasty.
>
> > Now, as I argued in my general AA thread, I think we might be able to
> assume that (c) doesn't happen today. But both (a) and (b) seem like
active
> nightmares to try to fix. I can see hacky ways to avoid (a) where we detect
> *why* GetUnderlyingObject fails, but I don't see how to fix both (a)
and
> (b) (or to fix (a) well) without just disabling this specific aspect of
> GloblasModRef.
>
> Ok, but we need performance information to make sure this doesn’t cause a
> regression in practice for LTO builds.  For example, Spec 2K and 2K6 are a
> reasonable place to start.
>
> > 1) Fix obvious issues with GloblasModRef and switch it to ValueHandles
> > 2) Mail out a patch to disable this part of GlobalsModRef. I can put
it
> behind a flag or however folks would like it to work.
> > 3) Remove addEscapingUse() update API, which without #2 may regress
some
> LTO test case I don't have (because I don't have any other than
bootstrap)
>
> Sounds great if we can do this without causing a regression in practice.
> Are you aware of any miscompiles that might be attributed to this, or are
> these “theoretical" concerns?
>
I don't really have any way of benchmarking SPEC with LTO. If that's a
configuration that is super important to people, I'm hoping they'll let
me
know about any particular performance issues.

Ultimately, I'm happy making no performance impacting changes here. The
thing is, I need to remove a broken abstraction (addEscapingUse) that was
half-way added without any test case (literally, I have no test case that
fails if I delete it. The regression tests don't fail if I assert(0) in
it). =/ I don't want to do that if it causes miscompiles for folks that
*are* relying on LTO. So my plan is to essentially post the patch that will
fix the miscompiles but may regress performance. If the miscompiles show up
for those relying on LTO and this (somewhat unsound) pass, they are in the
best position to evaluate the cost/benefit of a performance reducing patch.

Honsetly, my hope is that it won't even impact performance. But I have no
realistic way to measure it in any useful way. And for *my* platforms, I
would be perfectly happy to trade some performance for correctness here.
That's one of the reasons we're not relying on LTO yet. But I understand
that others in the community have more strict performance needs and so I'm
doing what I can to give them options.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150714/2ca6df08/attachment.html>

Gerolf Hoflehner

2015-Jul-14 19:02 UTC

head link

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

> On Jul 13, 2015, at 10:37 PM, Chandler Carruth <chandlerc at
gmail.com> wrote:
> 
> On Mon, Jul 13, 2015 at 10:21 PM Chris Lattner <clattner at apple.com
<mailto:clattner at apple.com>> wrote:
> 
> > On Jul 13, 2015, at 8:19 PM, Chandler Carruth <chandlerc at
gmail.com <mailto:chandlerc at gmail.com>> wrote:
> >
> > Ok folks,
> >
> > I wrote up the general high-level thoughts I have about stateful AA in
a separate thread. But we need to sort out the completely and horribly broken
aspects of GlobalsModRef today, and the practical steps forward. This email is
totally about the practical stuff.
> >
> > Now, as to why I emailed this group of people and with this subject,
the only pass pipeline that includes GlobalsModRef, is the LTO pipeline.
> 
> Ah, so it is just an LTO enabled benchmark hack then.
> 
> > It is also relying on a precomputed set of global variables whose
address is never used by an instruction other than some very small set (gep,
bitcast) as "non-address-taken". It then runs GetUnderlyingObject on
the two pointers in alias queries, and if that finds one of these
"non-address-taken" globals for one of the memory locations but not
the other, it concludes no-alias! This is broken for a number of reasons.
> >
> > a) If the two locations merely have a different *depth* of instruction
stack, because GetUnderlyingObject has a recursion cap, one side can fail while
the other succeeds, and we erroneously produce no-alias.
> 
> Interesting.  I’m sure it is no consolation, but GlobalsModRef probably
predated the recursion cap :-)Why not return the conservative result when the cap is
hit?> 
> > b) If instcombine or any other pass for any reason introduces on one
path an instruction that GetUnderlyingObject can't look through (select,
phi, load, ....), we incorrectly conclude no-alias. This is what addEscapingUse
was intended to solve, but we would literally have to call it from every pass
because we can't rely on analysis invalidation!
Dito> >
> > c) If any pass actually escapes a pointer from one function into
another, we invalidate the underlying assumption of 'non-address-taken'
that it relies upon.
> Yep, all of these are pretty nasty.
> 
> > Now, as I argued in my general AA thread, I think we might be able to
assume that (c) doesn't happen today. But both (a) and (b) seem like active
nightmares to try to fix. I can see hacky ways to avoid (a) where we detect
*why* GetUnderlyingObject fails, but I don't see how to fix both (a) and (b)
(or to fix (a) well) without just disabling this specific aspect of
GloblasModRef.
> 
> Ok, but we need performance information to make sure this doesn’t cause a
regression in practice for LTO builds.  For example, Spec 2K and 2K6 are a
reasonable place to start.
> 
> > 1) Fix obvious issues with GloblasModRef and switch it to ValueHandles
> > 2) Mail out a patch to disable this part of GlobalsModRef. I can put
it behind a flag or however folks would like it to work.
> > 3) Remove addEscapingUse() update API, which without #2 may regress
some LTO test case I don't have (because I don't have any other than
bootstrap)
> 
> Sounds great if we can do this without causing a regression in practice. 
Are you aware of any miscompiles that might be attributed to this, or are these
“theoretical" concerns?
> 
> I don't really have any way of benchmarking SPEC with LTO. If
that's a configuration that is super important to people, I'm hoping
they'll let me know about any particular performance issues.
We’ll get that data for ARM.> 
> Ultimately, I'm happy making no performance impacting changes here. The
thing is, I need to remove a broken abstraction (addEscapingUse) that was
half-way added without any test case (literally, I have no test case that fails
if I delete it. The regression tests don't fail if I assert(0) in it). =/ I
don't want to do that if it causes miscompiles for folks that *are* relying
on LTO. So my plan is to essentially post the patch that will fix the
miscompiles but may regress performance. If the miscompiles show up for those
relying on LTO and this (somewhat unsound) pass, they are in the best position
to evaluate the cost/benefit of a performance reducing patch.
> 
> Honsetly, my hope is that it won't even impact performance. But I have
no realistic way to measure it in any useful way. And for *my* platforms, I
would be perfectly happy to trade some performance for correctness here.
That's one of the reasons we're not relying on LTO yet. But I understand
that others in the community have more strict performance needs and so I'm
doing what I can to give them options.I wouldn’t be willing to give up performance for hypothetical issues. Please
protect all your changes with options. For some of your concerns it is probably
hard to provide a test case that shows an/the actual
issue.> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>        
http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150714/cd02eb63/attachment.html>

Xinliang David Li

2015-Jul-14 19:40 UTC

head link

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

On Tue, Jul 14, 2015 at 12:02 PM, Gerolf Hoflehner <ghoflehner at
apple.com>
wrote:
>
> On Jul 13, 2015, at 10:37 PM, Chandler Carruth <chandlerc at
gmail.com>
> wrote:
>
> On Mon, Jul 13, 2015 at 10:21 PM Chris Lattner <clattner at
apple.com> wrote:
>
>>
>> > On Jul 13, 2015, at 8:19 PM, Chandler Carruth <chandlerc at
gmail.com>
>> wrote:
>> >
>> > Ok folks,
>> >
>> > I wrote up the general high-level thoughts I have about stateful
AA in
>> a separate thread. But we need to sort out the completely and horribly
>> broken aspects of GlobalsModRef today, and the practical steps forward.
>> This email is totally about the practical stuff.
>> >
>> > Now, as to why I emailed this group of people and with this
subject,
>> the only pass pipeline that includes GlobalsModRef, is the LTO
pipeline.
>>
>> Ah, so it is just an LTO enabled benchmark hack then.
>>
>> > It is also relying on a precomputed set of global variables whose
>> address is never used by an instruction other than some very small set
>> (gep, bitcast) as "non-address-taken". It then runs
GetUnderlyingObject on
>> the two pointers in alias queries, and if that finds one of these
>> "non-address-taken" globals for one of the memory locations
but not the
>> other, it concludes no-alias! This is broken for a number of reasons.
>> >
>> > a) If the two locations merely have a different *depth* of
instruction
>> stack, because GetUnderlyingObject has a recursion cap, one side can
fail
>> while the other succeeds, and we erroneously produce no-alias.
>>
>> Interesting.  I’m sure it is no consolation, but GlobalsModRef probably
>> predated the recursion cap :-)
>>
> Why not return the conservative result when the cap is hit?
>
>
>> > b) If instcombine or any other pass for any reason introduces on
one
>> path an instruction that GetUnderlyingObject can't look through
(select,
>> phi, load, ....), we incorrectly conclude no-alias. This is what
>> addEscapingUse was intended to solve, but we would literally have to
call
>> it from every pass because we can't rely on analysis invalidation!
>>
> Dito
>
That will make it too conservative. The underlying assumption is that if
you can not track the origin of a pointer, the pointer can not point to a
non-address taken global var.  The real question is whether a
transformation can turn a tractable access to a non-address taken global
into an unanalyzable form ?

David


>
> >
>> > c) If any pass actually escapes a pointer from one function into
>> another, we invalidate the underlying assumption of
'non-address-taken'
>> that it relies upon.
>
> Yep, all of these are pretty nasty.
>>
>> > Now, as I argued in my general AA thread, I think we might be able
to
>> assume that (c) doesn't happen today. But both (a) and (b) seem
like active
>> nightmares to try to fix. I can see hacky ways to avoid (a) where we
detect
>> *why* GetUnderlyingObject fails, but I don't see how to fix both
(a) and
>> (b) (or to fix (a) well) without just disabling this specific aspect of
>> GloblasModRef.
>>
>> Ok, but we need performance information to make sure this doesn’t cause
a
>> regression in practice for LTO builds.  For example, Spec 2K and 2K6
are a
>> reasonable place to start.
>>
>> > 1) Fix obvious issues with GloblasModRef and switch it to
ValueHandles
>> > 2) Mail out a patch to disable this part of GlobalsModRef. I can
put it
>> behind a flag or however folks would like it to work.
>> > 3) Remove addEscapingUse() update API, which without #2 may
regress
>> some LTO test case I don't have (because I don't have any other
than
>> bootstrap)
>>
>> Sounds great if we can do this without causing a regression in
practice.
>> Are you aware of any miscompiles that might be attributed to this, or
are
>> these “theoretical" concerns?
>>
>
> I don't really have any way of benchmarking SPEC with LTO. If
that's a
> configuration that is super important to people, I'm hoping they'll
let me
> know about any particular performance issues.
>
>
> We’ll get that data for ARM.
>
>
> Ultimately, I'm happy making no performance impacting changes here. The
> thing is, I need to remove a broken abstraction (addEscapingUse) that was
> half-way added without any test case (literally, I have no test case that
> fails if I delete it. The regression tests don't fail if I assert(0) in
> it). =/ I don't want to do that if it causes miscompiles for folks that
> *are* relying on LTO. So my plan is to essentially post the patch that will
> fix the miscompiles but may regress performance. If the miscompiles show up
> for those relying on LTO and this (somewhat unsound) pass, they are in the
> best position to evaluate the cost/benefit of a performance reducing patch.
>
>
> Honsetly, my hope is that it won't even impact performance. But I have
no
> realistic way to measure it in any useful way. And for *my* platforms, I
> would be perfectly happy to trade some performance for correctness here.
> That's one of the reasons we're not relying on LTO yet. But I
understand
> that others in the community have more strict performance needs and so
I'm
> doing what I can to give them options.
>
> I wouldn’t be willing to give up performance for hypothetical issues.
> Please protect all your changes with options. For some of your concerns it
> is probably hard to provide a test case that shows an/the actual issue.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150714/fe53bc29/attachment.html>

Chandler Carruth

2015-Jul-15 06:25 UTC

head link

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

Replying here, but several of the questions raised boil down to
"couldn't
you make the usage of GetUnderlyingObject conservatively correct?".
I'll
try and address that.

I think this *is* the right approach, but I think it is very hard to do
without effectively disabling this part of GlobalsModRef. That is, the easy
ways are likely to fire very frequently IMO.

The core idea is to detect a "no information" state coming out of
GetUnderlyingObject (likely be providing a custom version just for
GlobalsModRef and tailored to its use cases). This is particularly
effective at avoiding the problems with the recursion limit. But let's look
at what cases we *wouldn't* return that. Here are the cases I see when I
thought about this last night with Hal, roughly in descending likelihood I
would guess:

1) We detect some global or an alloca. In that case, even BasicAA would be
sufficient to provide no-alias. GMR shouldn't be relevant.

2) We detect a phi, select, or inttoptr, and stop there.

3) We detect a load and stop there.

4) We detect a return from a function.

5) We detect an argument to the function.

I strongly suspect the vast majority of queries hit #1. That's why BasicAA
is *so* effective. Both #4 and #5 I think are actually reasonable places
for GMR to potentially say "no-alias" and provide useful definitive
information. But I also suspect these are the least common.

So let's look at #2 and #3 because I think they're interesting. For
these,
I think it is extremely hard to return "no-alias". It seems extremely
easy
for a reasonable and innocuous change to the IR to introduce a phi or a
select into one side of the GetUnderlyingObject but not the other. If that
ever happens, we can't return "no-alias" for #2, or we need to add
really
expensive updates. It also seems reasonable (if not as likely) to want
adding a store and load to the IR to not trigger a miscompile. If it is
possible for a valid optimization pass to do reg2mem on an SSA value, then
that could happen to only one side of the paired GetUnderlyingObject and
 break GMR with #3. If that seems like an unreasonable thing to do,
consider loop re-rolling or other transformations which may need to take
things in SSA form at move them out of SSA form. Even if we then try
immediately to put it back *into* SSA form, before we do that we create a
point where GMR cannot correctly return no-alias.

So ultimately, I don't think we want to rely on GMR returning
"no-alias"
for either #2 or #3 because of the challenge of actually updating it in all
of the circumstances that could break them. That means that *only* #4 and
#5 are going to return "no-alias" usefully. And even then, function
inlining and function outlining both break #4 and #5, so you have to
preclude those transforms while GMR is active. And I have serious doubts
about these providing enough value to be worth the cost.

I think the better way to approach this is the other way around. Rather
than doing a backwards analysis to see if one location reaches and global
and the other location doesn't reach a global, I think it would be much
more effective to re-cast this as a forward analysis that determines all
the memory locations in a function that come from outside the function, and
use that to drive the no-alias responses.

On Tue, Jul 14, 2015 at 12:12 PM Gerolf Hoflehner <ghoflehner at
apple.com>
wrote:
> I wouldn’t be willing to give up performance for hypothetical issues.
> Please protect all your changes with options. For some of your concerns it
> is probably hard to provide a test case that shows an/the actual issue.
>
I certainly agree that it will be very hard to provide a test case and
extremely rare to see this in the wild for most of these issues. As long as
I can remove the problematic update API we currently have (which as its an
API change can't really be put behind flags), I'm happy to have flags
control whether or not GMR uses the unsound / stale information to try to
answer alias queries. Do you have any opinion about what the default value
of the flags should be?

I'll go ahead and prepare the patches, as it seems like we're all ending
up
in the same position, and just wondering about the precise tradeoffs we
want to settle on.

> _______________________________________________
> LLVM Developers mailing list
>
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150715/805ee4b8/attachment.html>

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Jul 2015 - [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

[LLVMdev] GlobalsModRef (and thus LTO) is completely broken

Possibly Parallel Threads