thr3ads.net - llvm dev - [LLVMdev] RFC: Missing canonicalization in LLVM [Jan 2015]

If this information is useful, please help other people find it:
Share via:

Pete Cooper

2015-Jan-21 22:43 UTC

[LLVMdev] RFC: Missing canonicalization in LLVM

> On Jan 21, 2015, at 2:18 PM, Chandler Carruth <chandlerc at
gmail.com> wrote:
> 
> On Wed, Jan 21, 2015 at 2:16 PM, Chandler Carruth <chandlerc at
gmail.com <mailto:chandlerc at gmail.com>> wrote:
> So, we've run into some test cases which are pretty alarming.
> 
> When inlining code in various different paths we can end up with this IR:
> 
> define void @f(float* %value, i8* %b) {
> entry:
>   %0 = load float* %value, align 4
>   %1 = bitcast i8* %b to float*
>   store float %0, float* %1, align 1
>   ret void
> }
> 
> define void @g(float* %value, i8* %b) {
> entry:
>   %0 = bitcast float* %value to i32*
>   %1 = load i32* %0, align 4
>   %2 = bitcast i8* %b to i32*
>   store i32 %1, i32* %2, align 1
>   ret void
> }
> 
> Now, I don't really care one way or the other about these two IR
inputs, but it's pretty concerning that we get these two equivalent bits of
code and nothing canonicalizes to one or the other.
> 
> So, the naive first blush approach here would be to canonicalize on the
first -- it has fewer instructions after all -- but I don't think that's
the right approach for two reasons:
> 
> 1) It will be a *very* narrow canonicalization that only works with overly
specific sets of casted pointers.
> 2) It doesn't effectively move us toward the optimizer treating IR with
different pointee types for pointer types indistinguishably. Some day, I
continue to think we should get rid of the pointee types entirely.
> 
> To see why #1 and #2 are problematic, assume another round of inlining took
place and we suddenly had the following IR:
> 
> 
> And the missing IR example:
> 
> define void @f(i8* %value, i8* %b) {
> entry:
>   %0 = bitcast i8* %value to float*
>   %1 = load float* %0, align 4
>   %2 = bitcast i8* %b to float*
>   store float %0, float* %1, align 1
>   ret void
> }
> 
> define void @g(i8* %value, i8* %b) {
> entry:
>   %0 = bitcast i8* %value to i32*
>   %1 = load i32* %0, align 4
>   %2 = bitcast i8* %b to i32*
>   store i32 %1, i32* %2, align 1
>   ret void
> }
>  
> 
> AFAICT, this is the same and we still don't have a good
canonicalization story.
> 
> What seems like the obvious important and missing canonicalization is that
when we have a loaded value that is *only* used by storing it back into memory,
we don't canonicalize the type of that *value* (ignoring the pointer types)
to a single value type.
> 
> So, the only really suitable type for this kind of stuff is 'iN'
where N matches the number of bits loaded or stored.
> 
> I have this change implemented. It is trivial and unsurprising. However,
the effects of this are impossible to predict so I wanted to make sure it made
sense to others. Essentially, I expect random and hard to track down performance
fluctuations across the board. Some things may get better, others may get worse,
and they will probably all be bugs elsewhere in the stack.
> 
> So, thoughts?The first thing that springs to mind is that I don’t trust the backend to get
this right.  I don’t think it will understand when an i32 load/store would have
been preferable to a float one or vice versa.  I have no evidence of this, but
given how strongly typed tablegen is, I don’t think it can make a good choice
here.

So I think we probably need to teach the backend how to undo whatever canonical
form we choose if it has a reason to.  And the best long term solution is for
tablegen to have sized load/stores, not typed ones.

One (potentially expensive) way to choose the canonical form here is to look at
the users of the load and see what type works best.  If we load an i32, but bit
cast and do an fp operation on it, then a float load was best.  If we just load
it then store, then in theory either type works.

Pete> -Chandler
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/d1dc877a/attachment.html>

Chandler Carruth

2015-Jan-21 23:02 UTC

head link

[LLVMdev] RFC: Missing canonicalization in LLVM

On Wed, Jan 21, 2015 at 2:43 PM, Pete Cooper <peter_cooper at apple.com>
wrote:
> The first thing that springs to mind is that I don’t trust the backend to
> get this right.  I don’t think it will understand when an i32 load/store
> would have been preferable to a float one or vice versa.  I have no
> evidence of this, but given how strongly typed tablegen is, I don’t think
> it can make a good choice here.
>
I don't think tablegen is relevant to making a good choice here. This only
comes up when we have a load which is only ever stored. See below, I'll
come back to this after clarifying...

>
> So I think we probably need to teach the backend how to undo whatever
> canonical form we choose if it has a reason to.  And the best long term
> solution is for tablegen to have sized load/stores, not typed ones.
>
> One (potentially expensive) way to choose the canonical form here is to
> look at the users of the load and see what type works best.  If we load an
> i32, but bit cast and do an fp operation on it, then a float load was
> best.  If we just load it then store, then in theory either type works.
>
We actually already do this. =] I tought instcombine to do this recently.
The way this works is as follows:

If we find a load with a single bitcast of its value to some other type, we
try to load that type instead. We rely on the fact that if there is in fact
a single type that the load is used as, some other part of LLVM will fold
the redundant bitcasts. I can easily handle redundant bitcasts if you like.

If we find a store of a bitcasted value, we try to store the original value
instead.

This works really well *except* for the case when the only (transitive)
users of the loaded value are themselves stores. In that case, there is no
"truth" we can rely on from operational semantics. We need to just
pick a
consistent answer IMO. Integers seem like the right consistent answer, and
this isn't the only place LLVM does this -- we also lower a small memcpy as
an integer load and store.

As for the backend, I agree I don't trust them to do anything clever here,
but I think you may be missing how clever they would need to be. =D The
only way this matters is that, for example, you have a store-to-load
forwarding unit in your CPU that only works within a register class, and
thus a stored integer will fail to get forwarded in the CPU to a load of a
floating point value at the same address. If LLVM is ever able to forward
this in the IR, it should re-write all the types to match whatever
operation we have.

I don't think any backend today (or in the foreseeable future) would have
such smarts. But if CPUs are actually impacted by this kind of thing (and I
have no evidence that they are), we might be getting lucky some of the time.

Personally, I don't think this is compelling enough to delay making a
change because i have test cases where we are getting *unlucky* today, and
picking a canonical form will either directly fix them or make it much
easier to fix them.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/b61b7282/attachment.html>

Hal Finkel

2015-Jan-21 23:02 UTC

head link

[LLVMdev] RFC: Missing canonicalization in LLVM

----- Original Message -----> From: "Pete Cooper" <peter_cooper at apple.com>
> To: "Chandler Carruth" <chandlerc at gmail.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, January 21, 2015 4:43:47 PM
> Subject: Re: [LLVMdev] RFC: Missing canonicalization in LLVM
> 
> 
> On Jan 21, 2015, at 2:18 PM, Chandler Carruth < chandlerc at gmail.com
>
> wrote:
> 
> 
> 
> 
> On Wed, Jan 21, 2015 at 2:16 PM, Chandler Carruth <
> chandlerc at gmail.com > wrote:
> 
> 
> 
> So, we've run into some test cases which are pretty alarming.
> 
> 
> When inlining code in various different paths we can end up with this
> IR:
> 
> 
> 
> define void @f(float* %value, i8* %b) {
> 
> entry:
> %0 = load float* %value, align 4
> %1 = bitcast i8* %b to float*
> store float %0, float* %1, align 1
> ret void
> }
> 
> 
> define void @g(float* %value, i8* %b) {
> 
> entry:
> %0 = bitcast float* %value to i32*
> %1 = load i32* %0, align 4
> %2 = bitcast i8* %b to i32*
> store i32 %1, i32* %2, align 1
> ret void
> }
> 
> 
> Now, I don't really care one way or the other about these two IR
> inputs, but it's pretty concerning that we get these two equivalent
> bits of code and nothing canonicalizes to one or the other.
> 
> 
> So, the naive first blush approach here would be to canonicalize on
> the first -- it has fewer instructions after all -- but I don't
> think that's the right approach for two reasons:
> 
> 
> 1) It will be a *very* narrow canonicalization that only works with
> overly specific sets of casted pointers.
> 2) It doesn't effectively move us toward the optimizer treating IR
> with different pointee types for pointer types indistinguishably.
> Some day, I continue to think we should get rid of the pointee types
> entirely.
> 
> 
> To see why #1 and #2 are problematic, assume another round of
> inlining took place and we suddenly had the following IR:
> 
> 
> 
> 
> And the missing IR example:
> 
> 
> define void @f(i8* %value, i8* %b) {
> 
> entry:
> %0 = bitcast i8* %value to float*
> %1 = load float* %0, align 4
> %2 = bitcast i8* %b to float*
> store float %0, float* %1, align 1
> ret void
> }
> 
> 
> define void @g(i8* %value, i8* %b) {
> 
> entry:
> %0 = bitcast i8* %value to i32*
> %1 = load i32* %0, align 4
> %2 = bitcast i8* %b to i32*
> store i32 %1, i32* %2, align 1
> ret void
> }
> 
> 
> 
> 
> 
> 
> 
> AFAICT, this is the same and we still don't have a good
> canonicalization story.
> 
> 
> What seems like the obvious important and missing canonicalization is
> that when we have a loaded value that is *only* used by storing it
> back into memory, we don't canonicalize the type of that *value*
> (ignoring the pointer types) to a single value type.
> 
> 
> So, the only really suitable type for this kind of stuff is 'iN'
> where N matches the number of bits loaded or stored.
> 
> 
> I have this change implemented. It is trivial and unsurprising.
> However, the effects of this are impossible to predict so I wanted
> to make sure it made sense to others. Essentially, I expect random
> and hard to track down performance fluctuations across the board.
> Some things may get better, others may get worse, and they will
> probably all be bugs elsewhere in the stack.
> 
> 
> So, thoughts? The first thing that springs to mind is that I don’t
> trust the backend to get this right. I don’t think it will
> understand when an i32 load/store would have been preferable to a
> float one or vice versa. I have no evidence of this, but given how
> strongly typed tablegen is, I don’t think it can make a good choice
> here.
> 
> 
> So I think we probably need to teach the backend how to undo whatever
> canonical form we choose if it has a reason to. And the best long
> term solution is for tablegen to have sized load/stores, not typed
> ones.
You still need to make sure, in many cases, that you load into the correct
register file to avoid expensive cross-domain moves. So *something* need to look
at the eventual users, either the IR-level optimizer or the backend. I'd
prefer the IR-level optimizer if possible.

 -Hal
> 
> 
> One (potentially expensive) way to choose the canonical form here is
> to look at the users of the load and see what type works best. If we
> load an i32, but bit cast and do an fp operation on it, then a float
> load was best. If we just load it then store, then in theory either
> type works.
> 
> 
> Pete
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -Chandler
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Pete Cooper

2015-Jan-21 23:06 UTC

head link

[LLVMdev] RFC: Missing canonicalization in LLVM

> On Jan 21, 2015, at 3:02 PM, Chandler Carruth <chandlerc at
gmail.com> wrote:
> 
> 
> On Wed, Jan 21, 2015 at 2:43 PM, Pete Cooper <peter_cooper at apple.com
<mailto:peter_cooper at apple.com>> wrote:
> The first thing that springs to mind is that I don’t trust the backend to
get this right.  I don’t think it will understand when an i32 load/store would
have been preferable to a float one or vice versa.  I have no evidence of this,
but given how strongly typed tablegen is, I don’t think it can make a good
choice here.
> 
> I don't think tablegen is relevant to making a good choice here. This
only comes up when we have a load which is only ever stored. See below, I'll
come back to this after clarifying...
>  
> 
> So I think we probably need to teach the backend how to undo whatever
canonical form we choose if it has a reason to.  And the best long term solution
is for tablegen to have sized load/stores, not typed ones.
> 
> One (potentially expensive) way to choose the canonical form here is to
look at the users of the load and see what type works best.  If we load an i32,
but bit cast and do an fp operation on it, then a float load was best.  If we
just load it then store, then in theory either type works.
> 
> We actually already do this. =] I tought instcombine to do this recently.
The way this works is as follows:
> 
> If we find a load with a single bitcast of its value to some other type, we
try to load that type instead. We rely on the fact that if there is in fact a
single type that the load is used as, some other part of LLVM will fold the
redundant bitcasts. I can easily handle redundant bitcasts if you like.
> 
> If we find a store of a bitcasted value, we try to store the original value
instead.
> 
> This works really well *except* for the case when the only (transitive)
users of the loaded value are themselves stores. In that case, there is no
"truth" we can rely on from operational semantics. We need to just
pick a consistent answer IMO. Integers seem like the right consistent answer,
and this isn't the only place LLVM does this -- we also lower a small memcpy
as an integer load and store.Yeah, thinking about this more, integers do seem like the right answer.  If a
backend wanted to do something special then its up to them to handle it.  For
example, x86 might load balance issue ports by turning an i32 load/store in to
SSE insert/extract, although i cannot imagine any time this would actually be a
good idea.> 
> As for the backend, I agree I don't trust them to do anything clever
here, but I think you may be missing how clever they would need to be. =D The
only way this matters is that, for example, you have a store-to-load forwarding
unit in your CPU that only works within a register class, and thus a stored
integer will fail to get forwarded in the CPU to a load of a floating point
value at the same address. If LLVM is ever able to forward this in the IR, it
should re-write all the types to match whatever operation we have.
> 
> I don't think any backend today (or in the foreseeable future) would
have such smarts. But if CPUs are actually impacted by this kind of thing (and I
have no evidence that they are), we might be getting lucky some of the time.
> 
> Personally, I don't think this is compelling enough to delay making a
change because i have test cases where we are getting *unlucky* today, and
picking a canonical form will either directly fix them or make it much easier to
fix them.Sounds good to me.  Integers it is then.

Pete

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/0caf5044/attachment.html>

Chandler Carruth

2015-Jan-21 23:14 UTC

head link

[LLVMdev] RFC: Missing canonicalization in LLVM

FYI, I'm going to add this behind a flag as the implementation really is
trivial and this should be it quite a bit easier to discuss.

On Wed, Jan 21, 2015 at 3:02 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:
>
> On Wed, Jan 21, 2015 at 2:43 PM, Pete Cooper <peter_cooper at
apple.com>
> wrote:
>
>> The first thing that springs to mind is that I don’t trust the backend
to
>> get this right.  I don’t think it will understand when an i32
load/store
>> would have been preferable to a float one or vice versa.  I have no
>> evidence of this, but given how strongly typed tablegen is, I don’t
think
>> it can make a good choice here.
>>
>
> I don't think tablegen is relevant to making a good choice here. This
only
> comes up when we have a load which is only ever stored. See below, I'll
> come back to this after clarifying...
>
>
>>
>> So I think we probably need to teach the backend how to undo whatever
>> canonical form we choose if it has a reason to.  And the best long term
>> solution is for tablegen to have sized load/stores, not typed ones.
>>
>> One (potentially expensive) way to choose the canonical form here is to
>> look at the users of the load and see what type works best.  If we load
an
>> i32, but bit cast and do an fp operation on it, then a float load was
>> best.  If we just load it then store, then in theory either type works.
>>
>
> We actually already do this. =] I tought instcombine to do this recently.
> The way this works is as follows:
>
> If we find a load with a single bitcast of its value to some other type,
> we try to load that type instead. We rely on the fact that if there is in
> fact a single type that the load is used as, some other part of LLVM will
> fold the redundant bitcasts. I can easily handle redundant bitcasts if you
> like.
>
> If we find a store of a bitcasted value, we try to store the original
> value instead.
>
> This works really well *except* for the case when the only (transitive)
> users of the loaded value are themselves stores. In that case, there is no
> "truth" we can rely on from operational semantics. We need to
just pick a
> consistent answer IMO. Integers seem like the right consistent answer, and
> this isn't the only place LLVM does this -- we also lower a small
memcpy as
> an integer load and store.
>
> As for the backend, I agree I don't trust them to do anything clever
here,
> but I think you may be missing how clever they would need to be. =D The
> only way this matters is that, for example, you have a store-to-load
> forwarding unit in your CPU that only works within a register class, and
> thus a stored integer will fail to get forwarded in the CPU to a load of a
> floating point value at the same address. If LLVM is ever able to forward
> this in the IR, it should re-write all the types to match whatever
> operation we have.
>
> I don't think any backend today (or in the foreseeable future) would
have
> such smarts. But if CPUs are actually impacted by this kind of thing (and I
> have no evidence that they are), we might be getting lucky some of the
time.
>
> Personally, I don't think this is compelling enough to delay making a
> change because i have test cases where we are getting *unlucky* today, and
> picking a canonical form will either directly fix them or make it much
> easier to fix them.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/2467b45f/attachment.html>

Chandler Carruth

2015-Jan-21 23:34 UTC

head link

[LLVMdev] RFC: Missing canonicalization in LLVM

On Wed, Jan 21, 2015 at 3:02 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> You still need to make sure, in many cases, that you load into the correct
> register file to avoid expensive cross-domain moves. So *something* need to
> look at the eventual users, either the IR-level optimizer or the backend.
> I'd prefer the IR-level optimizer if possible.

See my response to Pete, but in general this only applies in a case where
there is no use of the load other than a store. I already taught
instcombine to canonicalize loads which are *used* as floating point values
correctly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150121/1f9f9564/attachment.html>

Possibly Parallel Threads

Search for more maybe matching threads

llvm dev - Jan 2015 - [LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

[LLVMdev] RFC: Missing canonicalization in LLVM

Possibly Parallel Threads