thr3ads.net - llvm dev - [llvm-dev] RFC: A change in InstCombine canonical form [Mar 2016]

If this information is useful, please help other people find it:
Share via:

Ehsan Amiri via llvm-dev

2016-Mar-16 13:41 UTC

[llvm-dev] RFC: A change in InstCombine canonical form

=== PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445)

IR contains code for loading a float from float * and storing it to a float
* address. After canonicalization of load in InstCombine [1], new bitcasts
are added to the IR (see bottom of the email for code samples). This
prevents select speculation in SROA to work. Also after SROA we have
bitcasts from int32 to float. (Whereas originally after instCombine,
bitcasts are only done on pointer types).

=== PROPOSED SOLUTION==
[1] implies that we need load canonicalization when we load a value only to
store it again. The reason is to avoid generating slightly different code
(due to different ways of adding bitcasts), in different situations. In all
examples presented in [1] there is a non-zero number of bitcasts. I think
when we load a value of type T from a T* address and store it as a type T
value to one or more T* address (and there is no other use or store), we
can redefine canonical form to mean there should not be any bitcasts. So we
still have a canonical form, but its definition is slightly different.

=== REASONS FOR / AGAINST==
Hal Finkel warns that while this may be useful for power pc, this may hurt
more than one other platform and become a very large project. Despite this
he is fine with bringing up the issue to the mailing list to get feedback,
mostly because this seems inline with our future direction of having a
unique type for all pointers.  (Hal please correct me if I misunderstood
your comment)

This is a much simpler fix compared to alternatives. (ignoring potential
regressions)

=== ALTERNATIVE SOLUTION ==
Fix select speculation in SROA to see through bitcasts. Handle remaining
bitcasts during code gen. Other alternative solutions are welcome.

Should I implement the proposed solution or is it too risky? I understand
that we may need to undo it if it breaks too many things. Comments are
welcome.


[1] http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
r226781  git commit id: b778cbc0c8



Code Samples (only relevant part is copied):

--------------------  Before Canonicalization (contains call to std::max):
--------------------
entry:
  %max_value = alloca float, align 4
  %1 = load float, float* %input, align 4, !tbaa !1
  store float %1, float* %max_value, align 4, !tbaa !1

for.body:
  %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
  %3 = load float, float* %call, align 4, !tbaa !1
  store float %3, float* %max_value, align 4, !tbaa !1

--------------------  After Canonicalization (contains call to
std::max):--------------------

entry:
  %max_value = alloca float, align 4
  %1 = bitcast float* %input to i32*
  %2 = load i32, i32* %1, align 4, !tbaa !1
  %3 = bitcast float* %max_value to i32*
  store i32 %2, i32* %3, align 4, !tbaa !1

for.body:
  %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
nonnull dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
  %5 = bitcast float* %call to i32*
  %6 = load i32, i32* %5, align 4, !tbaa !1
  %7 = bitcast float* %max_value to i32*
  store i32 %6, i32* %7, align 4, !tbaa !1

-------------------- After SROA (the call to std::max is inlined
now):--------------------
entry:
  %max_value.sroa.0 = alloca i32
  %0 = bitcast float* %input to i32*
  %1 = load i32, i32* %0, align 4, !tbaa !1
  store i32 %1, i32* %max_value.sroa.0

for.body:
  %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
%max_value.sroa.0
  %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
  %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32* %max_value.sroa.0
to float*
  %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
%max_value.sroa.0.0.max_value.sroa_cast8
  %5 = bitcast float* %__b.__a.i to i32*
  %6 = load i32, i32* %5, align 4, !tbaa !1
  store i32 %6, i32* %max_value.sroa.0

-------------------- After SROA when Canonicalization is turned
off--------------------
entry:
  %0 = load float, float* %input, align 4, !tbaa !1

for.cond:                                         ; preds = %for.body,
%entry
  %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ]

for.body:
  %1 = load float, float* %arrayidx1, align 4, !tbaa !1
  %cmp.i = fcmp olt float %max_value.0, %1
  %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
!tbaa !1
  %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true,
float %max_value.0
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160316/6ebf678e/attachment.html>

Mehdi Amini via llvm-dev

2016-Mar-16 15:34 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

Hi,

How do it interact with the "typeless pointers" work?

Thanks,

-- 
Mehdi
> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> === PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445
<https://llvm.org/bugs/show_bug.cgi?id=26445>)
> 
> IR contains code for loading a float from float * and storing it to a float
* address. After canonicalization of load in InstCombine [1], new bitcasts are
added to the IR (see bottom of the email for code samples). This prevents select
speculation in SROA to work. Also after SROA we have bitcasts from int32 to
float. (Whereas originally after instCombine, bitcasts are only done on pointer
types).
> 
> === PROPOSED SOLUTION==> 
> [1] implies that we need load canonicalization when we load a value only to
store it again. The reason is to avoid generating slightly different code (due
to different ways of adding bitcasts), in different situations. In all examples
presented in [1] there is a non-zero number of bitcasts. I think when we load a
value of type T from a T* address and store it as a type T value to one or more
T* address (and there is no other use or store), we can redefine canonical form
to mean there should not be any bitcasts. So we still have a canonical form, but
its definition is slightly different.
> 
> === REASONS FOR / AGAINST==> 
> Hal Finkel warns that while this may be useful for power pc, this may hurt
more than one other platform and become a very large project. Despite this he is
fine with bringing up the issue to the mailing list to get feedback, mostly
because this seems inline with our future direction of having a unique type for
all pointers.  (Hal please correct me if I misunderstood your comment)
> 
> This is a much simpler fix compared to alternatives. (ignoring potential
regressions)
> 
> === ALTERNATIVE SOLUTION ==> 
> Fix select speculation in SROA to see through bitcasts. Handle remaining
bitcasts during code gen. Other alternative solutions are welcome.
> 
> Should I implement the proposed solution or is it too risky? I understand
that we may need to undo it if it breaks too many things. Comments are welcome.
> 
> 
> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html> 
r226781  git commit id: b778cbc0c8
> 
> 
> 
> Code Samples (only relevant part is copied):
> 
> --------------------  Before Canonicalization (contains call to std::max):
--------------------
> entry:
>   %max_value = alloca float, align 4
>   %1 = load float, float* %input, align 4, !tbaa !1
>   store float %1, float* %max_value, align 4, !tbaa !1
> 
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %3 = load float, float* %call, align 4, !tbaa !1
>   store float %3, float* %max_value, align 4, !tbaa !1
> 
> --------------------  After Canonicalization (contains call to
std::max):--------------------
> 
> entry:
>   %max_value = alloca float, align 4
>   %1 = bitcast float* %input to i32*
>   %2 = load i32, i32* %1, align 4, !tbaa !1
>   %3 = bitcast float* %max_value to i32*
>   store i32 %2, i32* %3, align 4, !tbaa !1
> 
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
nonnull dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %5 = bitcast float* %call to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   %7 = bitcast float* %max_value to i32*
>   store i32 %6, i32* %7, align 4, !tbaa !1
> 
> -------------------- After SROA (the call to std::max is inlined
now):--------------------
> entry:
>   %max_value.sroa.0 = alloca i32
>   %0 = bitcast float* %input to i32*
>   %1 = load i32, i32* %0, align 4, !tbaa !1
>   store i32 %1, i32* %max_value.sroa.0
> 
> for.body: 
>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
%max_value.sroa.0
>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32* %max_value.sroa.0
to float*
>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
%max_value.sroa.0.0.max_value.sroa_cast8
>   %5 = bitcast float* %__b.__a.i to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   store i32 %6, i32* %max_value.sroa.0
> 
> -------------------- After SROA when Canonicalization is turned
off--------------------
> entry:
>   %0 = load float, float* %input, align 4, !tbaa !1
> 
> for.cond:                                         ; preds = %for.body,
%entry
>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ]
> 
> for.body:
>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>   %cmp.i = fcmp olt float %max_value.0, %1
>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
!tbaa !1
>   %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true,
float %max_value.0
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160316/4389d467/attachment.html>

Ehsan Amiri via llvm-dev

2016-Mar-16 16:35 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

Hal knows better. My understanding is that, there is a similarity in the
code pattern generated, in that there will be no intervening bitcasts
between load and store.

Having said that, I just double checked one of the test cases that was
committed with canonicalization work.  My proposed solution may result in
lowering of memcpy to non-integer load and store. (See test1 in
test/Transforms/InstCombine/struct-assign-tbaa.ll).  This might be a
blocker. (Even if we can fix it, the proposed solution is now more
complicated than what I thought).



On Wed, Mar 16, 2016 at 11:34 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> Hi,
>
> How do it interact with the "typeless pointers" work?
>
> Thanks,
>
> --
> Mehdi
>
> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> === PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445)
>
> IR contains code for loading a float from float * and storing it to a
> float * address. After canonicalization of load in InstCombine [1], new
> bitcasts are added to the IR (see bottom of the email for code samples).
> This prevents select speculation in SROA to work. Also after SROA we have
> bitcasts from int32 to float. (Whereas originally after instCombine,
> bitcasts are only done on pointer types).
>
> === PROPOSED SOLUTION==>
> [1] implies that we need load canonicalization when we load a value only
> to store it again. The reason is to avoid generating slightly different
> code (due to different ways of adding bitcasts), in different situations.
> In all examples presented in [1] there is a non-zero number of bitcasts. I
> think when we load a value of type T from a T* address and store it as a
> type T value to one or more T* address (and there is no other use or
> store), we can redefine canonical form to mean there should not be any
> bitcasts. So we still have a canonical form, but its definition is slightly
> different.
>
> === REASONS FOR / AGAINST==>
> Hal Finkel warns that while this may be useful for power pc, this may hurt
> more than one other platform and become a very large project. Despite this
> he is fine with bringing up the issue to the mailing list to get feedback,
> mostly because this seems inline with our future direction of having a
> unique type for all pointers.  (Hal please correct me if I misunderstood
> your comment)
>
> This is a much simpler fix compared to alternatives. (ignoring potential
> regressions)
>
> === ALTERNATIVE SOLUTION ==>
> Fix select speculation in SROA to see through bitcasts. Handle remaining
> bitcasts during code gen. Other alternative solutions are welcome.
>
> Should I implement the proposed solution or is it too risky? I understand
> that we may need to undo it if it breaks too many things. Comments are
> welcome.
>
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
> r226781  git commit id: b778cbc0c8
>
>
>
> Code Samples (only relevant part is copied):
>
> --------------------  Before Canonicalization (contains call to std::max):
> --------------------
> entry:
>   %max_value = alloca float, align 4
>   %1 = load float, float* %input, align 4, !tbaa !1
>   store float %1, float* %max_value, align 4, !tbaa !1
>
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
> dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %3 = load float, float* %call, align 4, !tbaa !1
>   store float %3, float* %max_value, align 4, !tbaa !1
>
> --------------------  After Canonicalization (contains call to
> std::max):--------------------
>
> entry:
>   %max_value = alloca float, align 4
>   %1 = bitcast float* %input to i32*
>   %2 = load i32, i32* %1, align 4, !tbaa !1
>   %3 = bitcast float* %max_value to i32*
>   store i32 %2, i32* %3, align 4, !tbaa !1
>
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
> nonnull dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>   %5 = bitcast float* %call to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   %7 = bitcast float* %max_value to i32*
>   store i32 %6, i32* %7, align 4, !tbaa !1
>
> -------------------- After SROA (the call to std::max is inlined
> now):--------------------
> entry:
>   %max_value.sroa.0 = alloca i32
>   %0 = bitcast float* %input to i32*
>   %1 = load i32, i32* %0, align 4, !tbaa !1
>   store i32 %1, i32* %max_value.sroa.0
>
> for.body:
>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
> %max_value.sroa.0
>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
> %max_value.sroa.0 to float*
>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
> %max_value.sroa.0.0.max_value.sroa_cast8
>   %5 = bitcast float* %__b.__a.i to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   store i32 %6, i32* %max_value.sroa.0
>
> -------------------- After SROA when Canonicalization is turned
> off--------------------
> entry:
>   %0 = load float, float* %input, align 4, !tbaa !1
>
> for.cond:                                         ; preds = %for.body,
> %entry
>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ]
>
> for.body:
>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>   %cmp.i = fcmp olt float %max_value.0, %1
>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
> !tbaa !1
>   %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true,
> float %max_value.0
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160316/e8f55458/attachment.html>

David Blaikie via llvm-dev

2016-Mar-16 17:12 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi,
>
> How do it interact with the "typeless pointers" work?
>
Right - the goal of the typeless pointer work is to fix all these bugs
related to "didn't look through bitcasts" in optimizations.
Sometimes
that's going to mean more work (because the code is leaning on the absence
of bitcasts & the presence of convenient (but not guaranteed) type
information to inform optimization decisions) but if we remove typed
pointers while keeping optimization quality in the cases we have today,
then we should've also fixed the cases that were broken because the type
information didn't end up aligning to produce the optimal output.

& I know I've been off the typeless pointer stuff for a few months
working
on llvm-dwp - but happy for any help (the next immediate piece is probably
figuring out teh right representation for byval and inalloca - there were
some code reviews sent out for that that I'll need to come back around to -
but also any optimizations people want to help rework/improve would be
great too & I can provide some techniques/tools to help people approach
those)

- Dave

>
> Thanks,
>
> --
> Mehdi
>
> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> === PROBLEM === (See this bug https://llvm.org/bugs/show_bug.cgi?id=26445)
>
> IR contains code for loading a float from float * and storing it to a
> float * address. After canonicalization of load in InstCombine [1], new
> bitcasts are added to the IR (see bottom of the email for code samples).
> This prevents select speculation in SROA to work. Also after SROA we have
> bitcasts from int32 to float. (Whereas originally after instCombine,
> bitcasts are only done on pointer types).
>
> === PROPOSED SOLUTION==>
> [1] implies that we need load canonicalization when we load a value only
> to store it again. The reason is to avoid generating slightly different
> code (due to different ways of adding bitcasts), in different situations.
> In all examples presented in [1] there is a non-zero number of bitcasts. I
> think when we load a value of type T from a T* address and store it as a
> type T value to one or more T* address (and there is no other use or
> store), we can redefine canonical form to mean there should not be any
> bitcasts. So we still have a canonical form, but its definition is slightly
> different.
>
> === REASONS FOR / AGAINST==>
> Hal Finkel warns that while this may be useful for power pc, this may hurt
> more than one other platform and become a very large project. Despite this
> he is fine with bringing up the issue to the mailing list to get feedback,
> mostly because this seems inline with our future direction of having a
> unique type for all pointers.  (Hal please correct me if I misunderstood
> your comment)
>
> This is a much simpler fix compared to alternatives. (ignoring potential
> regressions)
>
> === ALTERNATIVE SOLUTION ==>
> Fix select speculation in SROA to see through bitcasts. Handle remaining
> bitcasts during code gen. Other alternative solutions are welcome.
>
> Should I implement the proposed solution or is it too risky? I understand
> that we may need to undo it if it breaks too many things. Comments are
> welcome.
>
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
> r226781  git commit id: b778cbc0c8
>
>
>
> Code Samples (only relevant part is copied):
>
> --------------------  Before Canonicalization (contains call to std::max):
> --------------------
> entry:
>   %max_value = alloca float, align 4
>   %1 = load float, float* %input, align 4, !tbaa !1
>   store float %1, float* %max_value, align 4, !tbaa !1
>
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
> dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %3 = load float, float* %call, align 4, !tbaa !1
>   store float %3, float* %max_value, align 4, !tbaa !1
>
> --------------------  After Canonicalization (contains call to
> std::max):--------------------
>
> entry:
>   %max_value = alloca float, align 4
>   %1 = bitcast float* %input to i32*
>   %2 = load i32, i32* %1, align 4, !tbaa !1
>   %3 = bitcast float* %max_value to i32*
>   store i32 %2, i32* %3, align 4, !tbaa !1
>
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
> nonnull dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>   %5 = bitcast float* %call to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   %7 = bitcast float* %max_value to i32*
>   store i32 %6, i32* %7, align 4, !tbaa !1
>
> -------------------- After SROA (the call to std::max is inlined
> now):--------------------
> entry:
>   %max_value.sroa.0 = alloca i32
>   %0 = bitcast float* %input to i32*
>   %1 = load i32, i32* %0, align 4, !tbaa !1
>   store i32 %1, i32* %max_value.sroa.0
>
> for.body:
>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
> %max_value.sroa.0
>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
> %max_value.sroa.0 to float*
>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
> %max_value.sroa.0.0.max_value.sroa_cast8
>   %5 = bitcast float* %__b.__a.i to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   store i32 %6, i32* %max_value.sroa.0
>
> -------------------- After SROA when Canonicalization is turned
> off--------------------
> entry:
>   %0 = load float, float* %input, align 4, !tbaa !1
>
> for.cond:                                         ; preds = %for.body,
> %entry
>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ]
>
> for.body:
>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>   %cmp.i = fcmp olt float %max_value.0, %1
>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
> !tbaa !1
>   %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true,
> float %max_value.0
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160316/6e3f6f55/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Mar 2016 - RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

Possibly Parallel Threads