thr3ads.net - llvm dev - [llvm-dev] [canonicalization] GEP 0, 0 [Dec 2016]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin via llvm-dev

2016-Dec-24 06:17 UTC

[llvm-dev] [canonicalization] GEP 0, 0

On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at
google.com>
wrote:
> On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>>
>> 2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at
gmail.com>:
>>
>>
>>
>> On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
>> <piotr.padlewski at gmail.com> wrote:
>> >
>> >
>> > On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at
dberlin.org> wrote:
>> >
>> > Define soon?
>> > My guess is 1 year or less.
>> > (I've already seen patches to start converting most remaining
memdep
>> uses,
>> > like memcpy opt, licm, etc)
>> >
>> >
>> > That's good. Anyway I already have a patch that is doing
invariant group
>> > dependence across BBs, so I guess it make sense to push it
upstream to
>> push
>> > the bar higher.
>> > But I think we are getting a little bit of topic - should gep 0 be
>> > canonicalized to bitcast?
>> >
>>
>> Are memdep/memssa the only possible passes that could benefit from
>> such a canonicalization or you can think of other cases when this can
>> be useful?
>>
>>
>> We already canonicalize.  We canonicalize in the other direction:
>> https://github.com/llvm-mirror/llvm/blob/master/
>> lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024
>>
>>
>> Intresting. So what is the right solution here? I can easily add
handling
>> of gep 0 to GVN, or maybe the code that you mentioned should be in
SROA.
>> If SROA is the only user of this transformation and I there are quiet a
>> few passes that it hurts, then I would propose moving this logic to
SROA
>> and always canonicalize gep 0 to bitcast.
>>
>>
>> +1
>>
>> This canonicalization seems like it's unlikely to help other
passes, and
>> definitely makes handling more complicated elsewhere.
>>
>
> -1
> ;]
>
> I think it is very useful to be able to canonicalize on GEPs rather than
> bitcasts.
>
Based on what?

> I would teach things to understand GEPs with zero indices, it's pretty
> easy (we have tools to automate it).
>
So your position is that we should teach literally everything to understand
something new, instead of canonicalize in the direction literally
everything understands already?

>
> One reason why is that most things that want to reason about all-zero GEPs
> probably want to reason about all-constant GEPs as well.
>This is about the equivalence of gep 0,0 to bitcast. They aren't equivalent
to other types of geps, so reasoning about them seems very different and
orthogonal to me.
You are saying "well things *should* understand all constant GEP
anyway".
That may or may not be true, but it's, IMHO, pretty orthogonal to how we
choose to canonicalize a given operation.
You are also suggesting representing an operation in something
significantly more complicated than it's direct form.

Also, can you provide some data?
You say "most" and "probably" with no explicit examples.
Can you pull out an explicit example of how it would help existing passes,
both optimization, and analysis, vs canonicalizing to bitcast?

Because i'll be honest, i'm not seeing it.  It seems like a more
complicated form, for no good reason.

It would be like canonicalizing  add's to xor's.  You can do it, and
"most
things that reason about adds probably want to reason about xors".
However, i think we'd all agree that add is a much simpler and direct form,
and very directly represents the operation.
We should, IMHO, do the same here.  If something wants to see it as a GEP,
it can see it as a GEP.

Arguing that "well, passes should handle GEP anyway" is literally the
same
as arguing "well, passes should understand xor anyway".


>
>Certainly, I would expect MemDep and things like invariant.group to
survive> all-constant GEPs too.
>
> Note that even this ambiguity of which direction to canonicalize will go
> away with typeless pointers as we won't have to choose one or the
other.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161223/2ca60b67/attachment.html>

Chandler Carruth via llvm-dev

2016-Dec-24 08:39 UTC

head link

[llvm-dev] [canonicalization] GEP 0, 0

On Fri, Dec 23, 2016 at 10:17 PM Daniel Berlin <dberlin at dberlin.org>
wrote:

On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at
google.com>
wrote:

On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at gmail.com>:

On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
<piotr.padlewski at gmail.com> wrote:>
>
> On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at
dberlin.org> wrote:
>
> Define soon?
> My guess is 1 year or less.
> (I've already seen patches to start converting most remaining memdep
uses,
> like memcpy opt, licm, etc)
>
>
> That's good. Anyway I already have a patch that is doing invariant
group
> dependence across BBs, so I guess it make sense to push it upstream to
push> the bar higher.
> But I think we are getting a little bit of topic - should gep 0 be
> canonicalized to bitcast?
>
Are memdep/memssa the only possible passes that could benefit from
such a canonicalization or you can think of other cases when this can
be useful?

We already canonicalize. We canonicalize in the other direction:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024

Intresting. So what is the right solution here? I can easily add handling
of gep 0 to GVN, or maybe the code that you mentioned should be in SROA.
If SROA is the only user of this transformation and I there are quiet a few
passes that it hurts, then I would propose moving this logic to SROA and
always canonicalize gep 0 to bitcast.

This canonicalization seems like it's unlikely to help other passes, and
definitely makes handling more complicated elsewhere.

-1
;]

I think it is very useful to be able to canonicalize on GEPs rather than
bitcasts.

Based on what?

TBH, no hard data at all. Just my experience poking at passes that cared
about this and the advice I received from others when hacking on them.
Sorry if anything I wrote came off as some kind of firm, or unwavering
position, it was more that I'm not sure the proposed change is good. It
might be, but I'm not sure yet. See below for more details though.

I would teach things to understand GEPs with zero indices, it's pretty easy
(we have tools to automate it).

So your position is that we should teach literally everything to understand
something new, instead of canonicalize in the direction literally
everything understands already?

Not at all.

All-zero-GEPs aren't new for better or worse. We've been canonicalizing
toward them since before Chris refactored all of this code in 2007, so this
is a very long-standing pattern even if it is a bad one. And several parts
of LLVM of course handle them. They have to given that we're canonicalizing
that direction.

I learned of this years ago when Duncan Sands taught me about it to explain
why my SROA rewrite hit so many all-zero GEPs. Since then, it hasn't seemed
to me personally like a significant issue that we canonicalize towards
all-zero GEPs. And I've not heard many folks hit issues with it since then.
So my advice is typically "yep, you need to handle all-zero GEPs".
That's
essentially the default for when a pass doesn't handle canonical form.

It isn't an endorsement though, and it doesn't mean we can't or
shouldn't
change the canonical form! If we have new evidence that makes a change
warranted, we absolutely should. Some kinds of evidence I can remember in
the past motivating a change:
1) It's really hard and/or awkward to teach things about the canonical form.
2) The canonical form doesn't express as much information or is in any
other way a less useful/expressive representation.
3) Empirical evidence shows that a different canonical form despite being
perfectly equivalent on the prior two points, is less convenient.

The statement from Piotr that it is easy to teach GVN about this seems to
indicate we're not talking about the first two issues, but if we are,
that's a whole new discussion (and an interesting one).

It seems possible that #3 is true (I suspect you think #3 is true from your
email at least). While we should in general fix these kinds of issues when
we find them, it does seem somewhat less urgent. And since there are
conflicting experiences, we probably want at least a comprehensive survey
before we make a change.

In this particular case I suspect that we *used* to have issues related to
#2 -- SROA before my rewrite relied on GEPs to understand type structures
being decomposed. While I *think* all of the semantic issues here are gone,
a number of people in the last couple of years have still insisted that
bitcasting pointers blocks optimizations so we should at least investigate
this prior to making a change.

But this led me to the last paragraph in my email -- if all of this goes
away with typeless pointers, it's not clear that it's worth pursuing a
change for #3. Not saying we definitely shouldn't, just that we should
weigh that against removing types from pointers entirely so that these
issues don't come at all, regardless of how we spell the instruction.

One reason why is that most things that want to reason about all-zero GEPs
probably want to reason about all-constant GEPs as well.

This is about the equivalence of gep 0,0 to bitcast. They aren't equivalent
to other types of geps, so reasoning about them seems very different and
orthogonal to me.
You are saying "well things *should* understand all constant GEP
anyway".
That may or may not be true, but it's, IMHO, pretty orthogonal to how we
choose to canonicalize a given operation.
You are also suggesting representing an operation in something
significantly more complicated than it's direct form.

Sorry it seemed orthogonal. All I meant was that handling all zero GEPs
might be unusually low cost because of handling all constant GEPs. That
only really speaks to #3 above, and it really only lessens the cost. As you
say, it can't eliminate it as a GEP is different from a bitcast and so we
may end up handling both.

Also, can you provide some data?
You say "most" and "probably" with no explicit examples.
Can you pull out an explicit example of how it would help existing passes,
both optimization, and analysis, vs canonicalizing to bitcast?

Well, my intent here was not to say it would *help* existing passes but
only that several existing passes already handled all constant GEPs and the
all zero case largely fell out as a consequence.

The example I'm most familiar with is SROA of course. In all but one place
it uses code to handle all constant GEPs and doesn't need to special case
all zero GEPs. BasicAA appears to be similar. The vectorizers also seem to
have existing code handling constant GEPs that would handle zero GEPs for
free.

As for examples of where it would *help* in a fundamental way? No, I think
all of those are gone. It used to help SROA before the rewrite. It also
helped DependenceAnalysis before it was rewritten. The first was crippled
by relying on this but remaining conservatively correct. The second was
actually buggy because it relied on this without remaining conservatively
correct. LLVM has been moving away from this being a useful thing to
*fundamentally* rely on.

Examples of where it would help in a trivial way? Any pass that handles
all-constant-GEPs, but not bitcasts. We could easily teach such a pass to
handle bitcasts though. We could also teach the passes that handle bitcasts
to handle all-zero-GEPs. In fact, we've automated this for most passes with
stripPointerCasts and friends.

I don't see a fundamental advantage of one over the other, so we're left
with essentially an engineering tradeoff. If we weren't planning to remove
pointer types entirely, this would still be an important engineering
tradeoff, but I'm not sure which way it would go. Given that we're
planning
to remove pointer types entirely, I'd rather focus on that change rather
than changing canonicalization strategies, and patch passes to cope with
today's canonical form until we finish. But that is a fairly mild
"rather".
New information could quite easily change my mind. And it is just my two
cents of course.

Anyways, again, sorry if my previous email came off as a mandate, I just
meant to indicate that the issue was not clear cut to me, not that it was
some kind of definite thing one way or the other.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161224/5d890449/attachment-0001.html>

Piotr Padlewski via llvm-dev

2016-Dec-25 12:40 UTC

head link

[llvm-dev] [canonicalization] GEP 0, 0

2016-12-24 9:39 GMT+01:00 Chandler Carruth <chandlerc at google.com>:
> On Fri, Dec 23, 2016 at 10:17 PM Daniel Berlin <dberlin at
dberlin.org>
> wrote:
>
> On Fri, Dec 23, 2016 at 10:02 PM, Chandler Carruth <chandlerc at
google.com>
> wrote:
>
> On Fri, Dec 23, 2016 at 3:30 PM Daniel Berlin via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 2:31 PM, Piotr Padlewski via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
> 2016-12-23 22:17 GMT+01:00 David Majnemer <david.majnemer at
gmail.com>:
>
>
>
> On Fri, Dec 23, 2016 at 1:09 PM, Davide Italiano via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Fri, Dec 23, 2016 at 1:01 PM, Piotr Padlewski
> <piotr.padlewski at gmail.com> wrote:
> >
> >
> > On Dec 23, 2016 19:47, "Daniel Berlin" <dberlin at
dberlin.org> wrote:
> >
> > Define soon?
> > My guess is 1 year or less.
> > (I've already seen patches to start converting most remaining
memdep
> uses,
> > like memcpy opt, licm, etc)
> >
> >
> > That's good. Anyway I already have a patch that is doing invariant
group
> > dependence across BBs, so I guess it make sense to push it upstream to
> push
> > the bar higher.
> > But I think we are getting a little bit of topic - should gep 0 be
> > canonicalized to bitcast?
> >
>
> Are memdep/memssa the only possible passes that could benefit from
> such a canonicalization or you can think of other cases when this can
> be useful?
>
>
> We already canonicalize.  We canonicalize in the other direction:
> https://github.com/llvm-mirror/llvm/blob/master/
> lib/Transforms/InstCombine/InstCombineCasts.cpp#L2024
>
>
> Intresting. So what is the right solution here? I can easily add handling
> of gep 0 to GVN, or maybe the code that you mentioned should be in SROA.
> If SROA is the only user of this transformation and I there are quiet a
> few passes that it hurts, then I would propose moving this logic to SROA
> and always canonicalize gep 0 to bitcast.
>
>
> +1
>
> This canonicalization seems like it's unlikely to help other passes,
and
> definitely makes handling more complicated elsewhere.
>
>
> -1
> ;]
>
> I think it is very useful to be able to canonicalize on GEPs rather than
> bitcasts.
>
>
> Based on what?
>
>
> TBH, no hard data at all. Just my experience poking at passes that cared
> about this and the advice I received from others when hacking on them.
> Sorry if anything I wrote came off as some kind of firm, or unwavering
> position, it was more that I'm not sure the proposed change is good. It
> might be, but I'm not sure yet. See below for more details though.
>
>
>
>
>
> I would teach things to understand GEPs with zero indices, it's pretty
> easy (we have tools to automate it).
>
>
> So your position is that we should teach literally everything to
> understand something new, instead of canonicalize in the direction
> literally everything understands already?
>
>
> Not at all.
>
> All-zero-GEPs aren't new for better or worse. We've been
canonicalizing
> toward them since before Chris refactored all of this code in 2007, so this
> is a very long-standing pattern even if it is a bad one. And several parts
> of LLVM of course handle them. They have to given that we're
canonicalizing
> that direction.
>
> I learned of this years ago when Duncan Sands taught me about it to
> explain why my SROA rewrite hit so many all-zero GEPs. Since then, it
> hasn't seemed to me personally like a significant issue that we
> canonicalize towards all-zero GEPs. And I've not heard many folks hit
> issues with it since then. So my advice is typically "yep, you need to
> handle all-zero GEPs". That's essentially the default for when a
pass
> doesn't handle canonical form.
>
> It isn't an endorsement though, and it doesn't mean we can't or
shouldn't
> change the canonical form! If we have new evidence that makes a change
> warranted, we absolutely should. Some kinds of evidence I can remember in
> the past motivating a change:
> 1) It's really hard and/or awkward to teach things about the canonical
> form.
> 2) The canonical form doesn't express as much information or is in any
> other way a less useful/expressive representation.
> 3) Empirical evidence shows that a different canonical form despite being
> perfectly equivalent on the prior two points, is less convenient.
>
> The statement from Piotr that it is easy to teach GVN about this seems to
> indicate we're not talking about the first two issues, but if we are,
> that's a whole new discussion (and an interesting one).
>
> It seems possible that #3 is true (I suspect you think #3 is true from
> your email at least). While we should in general fix these kinds of issues
> when we find them, it does seem somewhat less urgent. And since there are
> conflicting experiences, we probably want at least a comprehensive survey
> before we make a change.
>
> In this particular case I suspect that we *used* to have issues related to
> #2 -- SROA before my rewrite relied on GEPs to understand type structures
> being decomposed. While I *think* all of the semantic issues here are gone,
> a number of people in the last couple of years have still insisted that
> bitcasting pointers blocks optimizations so we should at least investigate
> this prior to making a change.
>
> But this led me to the last paragraph in my email -- if all of this goes
> away with typeless pointers, it's not clear that it's worth
pursuing a
> change for #3. Not saying we definitely shouldn't, just that we should
> weigh that against removing types from pointers entirely so that these
> issues don't come at all, regardless of how we spell the instruction.
>
>
>
> One reason why is that most things that want to reason about all-zero GEPs
> probably want to reason about all-constant GEPs as well.
>
> This is about the equivalence of gep 0,0 to bitcast. They aren't
> equivalent to other types of geps, so reasoning about them seems very
> different and orthogonal to me.
> You are saying "well things *should* understand all constant GEP
anyway".
> That may or may not be true, but it's, IMHO, pretty orthogonal to how
we
> choose to canonicalize a given operation.
> You are also suggesting representing an operation in something
> significantly more complicated than it's direct form.
>
>
> Sorry it seemed orthogonal. All I meant was that handling all zero GEPs
> might be unusually low cost because of handling all constant GEPs. That
> only really speaks to #3 above, and it really only lessens the cost. As you
> say, it can't eliminate it as a GEP is different from a bitcast and so
we
> may end up handling both.
>
>
> Also, can you provide some data?
> You say "most" and "probably" with no explicit
examples.
> Can you pull out an explicit example of how it would help existing passes,
> both optimization, and analysis, vs canonicalizing to bitcast?
>
>
> Well, my intent here was not to say it would *help* existing passes but
> only that several existing passes already handled all constant GEPs and the
> all zero case largely fell out as a consequence.
>
> The example I'm most familiar with is SROA of course. In all but one
place
> it uses code to handle all constant GEPs and doesn't need to special
case
> all zero GEPs. BasicAA appears to be similar. The vectorizers also seem to
> have existing code handling constant GEPs that would handle zero GEPs for
> free.
>
> As for examples of where it would *help* in a fundamental way? No, I think
> all of those are gone. It used to help SROA before the rewrite. It also
> helped DependenceAnalysis before it was rewritten. The first was crippled
> by relying on this but remaining conservatively correct. The second was
> actually buggy because it relied on this without remaining conservatively
> correct. LLVM has been moving away from this being a useful thing to
> *fundamentally* rely on.
>
> Examples of where it would help in a trivial way? Any pass that handles
> all-constant-GEPs, but not bitcasts. We could easily teach such a pass to
> handle bitcasts though. We could also teach the passes that handle bitcasts
> to handle all-zero-GEPs. In fact, we've automated this for most passes
with
> stripPointerCasts and friends.
>
>
> I don't see a fundamental advantage of one over the other, so we're
left
> with essentially an engineering tradeoff. If we weren't planning to
remove
> pointer types entirely, this would still be an important engineering
> tradeoff, but I'm not sure which way it would go. Given that we're
planning
> to remove pointer types entirely, I'd rather focus on that change
rather
> than changing canonicalization strategies, and patch passes to cope with
> today's canonical form until we finish. But that is a fairly mild
"rather".
> New information could quite easily change my mind. And it is just my two
> cents of course.
>
>
> Anyways, again, sorry if my previous email came off as a mandate, I just
> meant to indicate that the issue was not clear cut to me, not that it was
> some kind of definite thing one way or the other.
>
Ok I get your point.
- If every gep 0, 0 would be bitcast, then we would still need code to
handle all constant geps that would handle this case, but we also need code
to handle bitcasts that will be gone some time
- if we stay with gep 0, 0 then places like GVN that doesn't handle it
right now should probably handle all constant geps anyway, so it doesn't
matter in which form it is.

I didn't investigate how hard it would be to handle geps 0 or constant geps
in GVN, but at least for handling of invariant groups it required 2 lines
of code. I did look a little bit into SROA and I don't see a clear solution
to handle bitcasts for SROA yet, so it also might be argument against
bitcasts (but I also not familiar with SROA and didn't spend enough time on
it).
When do we expect to deploy typeless pointers?

@dannyb are you fine with me writing this small hotfix for invariant group?

Piotr
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161225/b7acb307/attachment.html>

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Dec 2016 - [canonicalization] GEP 0, 0

[llvm-dev] [canonicalization] GEP 0, 0

[llvm-dev] [canonicalization] GEP 0, 0

[llvm-dev] [canonicalization] GEP 0, 0

Apparently Analagous Threads