thr3ads.net - llvm dev - [llvm-dev] RFC: A change in InstCombine canonical form [Mar 2016]

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2016-Mar-16 18:13 UTC

[llvm-dev] RFC: A change in InstCombine canonical form

On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com>
wrote:
> David,
>
> Could you give us an update on the status of typeless pointer work? How
> much work is left and when you think it might be ready?
>
It's a bit of an onion peel, really - since it will eventually involve
generalizing/fixing every optimization that's currently leaning on typed
pointers to keep the performance while removing the crutch they're
currently leaning on. (in cases where bitcasts are literally just getting
in the way, those won't require cleaning up & should just become
"free
performance wins" once we remove them, though)

At the moment we can roundtrip every LLVM IR test case through bitcode and
textual IR (reading/writing both formats) while using only a narrow
whitelist of places that request the type of a pointer (things like the
verifier, the parser/printer where it actually needs the typed pointer to
verify it matches the explicit type, etc).

The next thing on the list is probably figuring out the byval/inalloca
representation (add an explicit pointee type? just make the number of bytes
explicit with no type information?).

Then start migrating optimizations over - doing the same sort of testing I
did for the IR/bitcode roundtripping - assert that the pointee type is not
accessed, whitelist places that need it until the bitcasts go away, fix
anything else... it'll still be a fair bit of work & I don't really
know
how much. It should parallelize pretty well (doing any of this work is
really helpful, each optimization is indepednent, etc) if anyone wants
to/is able to help.

- Dave

>
> Thanks
> Ehsan
>
>
> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>
>>
>>
>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi,
>>>
>>> How do it interact with the "typeless pointers" work?
>>>
>>
>> Right - the goal of the typeless pointer work is to fix all these bugs
>> related to "didn't look through bitcasts" in
optimizations. Sometimes
>> that's going to mean more work (because the code is leaning on the
absence
>> of bitcasts & the presence of convenient (but not guaranteed) type
>> information to inform optimization decisions) but if we remove typed
>> pointers while keeping optimization quality in the cases we have today,
>> then we should've also fixed the cases that were broken because the
type
>> information didn't end up aligning to produce the optimal output.
>>
>> & I know I've been off the typeless pointer stuff for a few
months
>> working on llvm-dwp - but happy for any help (the next immediate piece
is
>> probably figuring out teh right representation for byval and inalloca -
>> there were some code reviews sent out for that that I'll need to
come back
>> around to - but also any optimizations people want to help
rework/improve
>> would be great too & I can provide some techniques/tools to help
people
>> approach those)
>>
>> - Dave
>>
>>
>>>
>>> Thanks,
>>>
>>> --
>>> Mehdi
>>>
>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> === PROBLEM === (See this bug
>>> https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>
>>> IR contains code for loading a float from float * and storing it to
a
>>> float * address. After canonicalization of load in InstCombine [1],
new
>>> bitcasts are added to the IR (see bottom of the email for code
samples).
>>> This prevents select speculation in SROA to work. Also after SROA
we have
>>> bitcasts from int32 to float. (Whereas originally after
instCombine,
>>> bitcasts are only done on pointer types).
>>>
>>> === PROPOSED SOLUTION==>>>
>>> [1] implies that we need load canonicalization when we load a value
only
>>> to store it again. The reason is to avoid generating slightly
different
>>> code (due to different ways of adding bitcasts), in different
situations.
>>> In all examples presented in [1] there is a non-zero number of
bitcasts. I
>>> think when we load a value of type T from a T* address and store it
as a
>>> type T value to one or more T* address (and there is no other use
or
>>> store), we can redefine canonical form to mean there should not be
any
>>> bitcasts. So we still have a canonical form, but its definition is
slightly
>>> different.
>>>
>>> === REASONS FOR / AGAINST==>>>
>>> Hal Finkel warns that while this may be useful for power pc, this
may
>>> hurt more than one other platform and become a very large project.
Despite
>>> this he is fine with bringing up the issue to the mailing list to
get
>>> feedback, mostly because this seems inline with our future
direction of
>>> having a unique type for all pointers.  (Hal please correct me if I
>>> misunderstood your comment)
>>>
>>> This is a much simpler fix compared to alternatives. (ignoring
potential
>>> regressions)
>>>
>>> === ALTERNATIVE SOLUTION ==>>>
>>> Fix select speculation in SROA to see through bitcasts. Handle
remaining
>>> bitcasts during code gen. Other alternative solutions are welcome.
>>>
>>> Should I implement the proposed solution or is it too risky? I
>>> understand that we may need to undo it if it breaks too many
things.
>>> Comments are welcome.
>>>
>>>
>>> [1]
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>> r226781  git commit id: b778cbc0c8
>>>
>>>
>>>
>>> Code Samples (only relevant part is copied):
>>>
>>> --------------------  Before Canonicalization (contains call to
>>> std::max): --------------------
>>> entry:
>>>   %max_value = alloca float, align 4
>>>   %1 = load float, float* %input, align 4, !tbaa !1
>>>   store float %1, float* %max_value, align 4, !tbaa !1
>>>
>>> for.body:
>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>> dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>>>   %3 = load float, float* %call, align 4, !tbaa !1
>>>   store float %3, float* %max_value, align 4, !tbaa !1
>>>
>>> --------------------  After Canonicalization (contains call to
>>> std::max):--------------------
>>>
>>> entry:
>>>   %max_value = alloca float, align 4
>>>   %1 = bitcast float* %input to i32*
>>>   %2 = load i32, i32* %1, align 4, !tbaa !1
>>>   %3 = bitcast float* %max_value to i32*
>>>   store i32 %2, i32* %3, align 4, !tbaa !1
>>>
>>> for.body:
>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>> nonnull dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>>>   %5 = bitcast float* %call to i32*
>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>   %7 = bitcast float* %max_value to i32*
>>>   store i32 %6, i32* %7, align 4, !tbaa !1
>>>
>>> -------------------- After SROA (the call to std::max is inlined
>>> now):--------------------
>>> entry:
>>>   %max_value.sroa.0 = alloca i32
>>>   %0 = bitcast float* %input to i32*
>>>   %1 = load i32, i32* %0, align 4, !tbaa !1
>>>   store i32 %1, i32* %max_value.sroa.0
>>>
>>> for.body:
>>>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
>>> %max_value.sroa.0
>>>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to
float
>>>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
>>> %max_value.sroa.0 to float*
>>>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>   %5 = bitcast float* %__b.__a.i to i32*
>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>   store i32 %6, i32* %max_value.sroa.0
>>>
>>> -------------------- After SROA when Canonicalization is turned
>>> off--------------------
>>> entry:
>>>   %0 = load float, float* %input, align 4, !tbaa !1
>>>
>>> for.cond:                                         ; preds =
%for.body,
>>> %entry
>>>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated,
>>> %for.body ]
>>>
>>> for.body:
>>>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>>   %cmp.i = fcmp olt float %max_value.0, %1
>>>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align
4,
>>> !tbaa !1
>>>   %.sroa.speculated = select i1 %cmp.i, float
>>> %.sroa.speculate.load.true, float %max_value.0
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160316/adb51e99/attachment.html>

Ehsan Amiri via llvm-dev

2016-Mar-22 18:32 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

Back to the discussion on the RFC, I still see some advantage in following
the proposed solution. I see two paths forward:

1- Change canonical form, possibly lower memcpy to non-integer load and
store in InstCombine. Then teach the backends to convert that to integer
load and store if that is more profitable. Notice that we are talking about
loads that have no use other than store. So it is a fairly simple change in
the backends.

2- Do not change the canonical form. Then for this bug, we need to teach
select speculation to see through bitcasts. We will probably need to teach
other optimizations to see though bitcasts in the future as problems are
uncovered. That is until typeless pointer work is complete. Once the
typeless pointer work is complete, we have some extra code in each
optimization for seeing through bitcasts which is possibly no longer needed.

Based on this I think (1) is the right thing to do. But there might be
other reasons for the current canonical form that I am not aware of. Please
let me know what you think.

Thanks
Ehsan

On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at
gmail.com>
> wrote:
>
>> David,
>>
>> Could you give us an update on the status of typeless pointer work? How
>> much work is left and when you think it might be ready?
>>
>
> It's a bit of an onion peel, really - since it will eventually involve
> generalizing/fixing every optimization that's currently leaning on
typed
> pointers to keep the performance while removing the crutch they're
> currently leaning on. (in cases where bitcasts are literally just getting
> in the way, those won't require cleaning up & should just become
"free
> performance wins" once we remove them, though)
>
> At the moment we can roundtrip every LLVM IR test case through bitcode and
> textual IR (reading/writing both formats) while using only a narrow
> whitelist of places that request the type of a pointer (things like the
> verifier, the parser/printer where it actually needs the typed pointer to
> verify it matches the explicit type, etc).
>
> The next thing on the list is probably figuring out the byval/inalloca
> representation (add an explicit pointee type? just make the number of bytes
> explicit with no type information?).
>
> Then start migrating optimizations over - doing the same sort of testing I
> did for the IR/bitcode roundtripping - assert that the pointee type is not
> accessed, whitelist places that need it until the bitcasts go away, fix
> anything else... it'll still be a fair bit of work & I don't
really know
> how much. It should parallelize pretty well (doing any of this work is
> really helpful, each optimization is indepednent, etc) if anyone wants
> to/is able to help.
>
> - Dave
>
>
>>
>> Thanks
>> Ehsan
>>
>>
>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie <dblaikie at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> How do it interact with the "typeless pointers" work?
>>>>
>>>
>>> Right - the goal of the typeless pointer work is to fix all these
bugs
>>> related to "didn't look through bitcasts" in
optimizations. Sometimes
>>> that's going to mean more work (because the code is leaning on
the absence
>>> of bitcasts & the presence of convenient (but not guaranteed)
type
>>> information to inform optimization decisions) but if we remove
typed
>>> pointers while keeping optimization quality in the cases we have
today,
>>> then we should've also fixed the cases that were broken because
the type
>>> information didn't end up aligning to produce the optimal
output.
>>>
>>> & I know I've been off the typeless pointer stuff for a few
months
>>> working on llvm-dwp - but happy for any help (the next immediate
piece is
>>> probably figuring out teh right representation for byval and
inalloca -
>>> there were some code reviews sent out for that that I'll need
to come back
>>> around to - but also any optimizations people want to help
rework/improve
>>> would be great too & I can provide some techniques/tools to
help people
>>> approach those)
>>>
>>> - Dave
>>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Mehdi
>>>>
>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> === PROBLEM === (See this bug
>>>> https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>
>>>> IR contains code for loading a float from float * and storing
it to a
>>>> float * address. After canonicalization of load in InstCombine
[1], new
>>>> bitcasts are added to the IR (see bottom of the email for code
samples).
>>>> This prevents select speculation in SROA to work. Also after
SROA we have
>>>> bitcasts from int32 to float. (Whereas originally after
instCombine,
>>>> bitcasts are only done on pointer types).
>>>>
>>>> === PROPOSED SOLUTION==>>>>
>>>> [1] implies that we need load canonicalization when we load a
value
>>>> only to store it again. The reason is to avoid generating
slightly
>>>> different code (due to different ways of adding bitcasts), in
different
>>>> situations. In all examples presented in [1] there is a
non-zero number of
>>>> bitcasts. I think when we load a value of type T from a T*
address and
>>>> store it as a type T value to one or more T* address (and there
is no other
>>>> use or store), we can redefine canonical form to mean there
should not be
>>>> any bitcasts. So we still have a canonical form, but its
definition is
>>>> slightly different.
>>>>
>>>> === REASONS FOR / AGAINST==>>>>
>>>> Hal Finkel warns that while this may be useful for power pc,
this may
>>>> hurt more than one other platform and become a very large
project. Despite
>>>> this he is fine with bringing up the issue to the mailing list
to get
>>>> feedback, mostly because this seems inline with our future
direction of
>>>> having a unique type for all pointers.  (Hal please correct me
if I
>>>> misunderstood your comment)
>>>>
>>>> This is a much simpler fix compared to alternatives. (ignoring
>>>> potential regressions)
>>>>
>>>> === ALTERNATIVE SOLUTION ==>>>>
>>>> Fix select speculation in SROA to see through bitcasts. Handle
>>>> remaining bitcasts during code gen. Other alternative solutions
are welcome.
>>>>
>>>> Should I implement the proposed solution or is it too risky? I
>>>> understand that we may need to undo it if it breaks too many
things.
>>>> Comments are welcome.
>>>>
>>>>
>>>> [1]
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>> r226781  git commit id: b778cbc0c8
>>>>
>>>>
>>>>
>>>> Code Samples (only relevant part is copied):
>>>>
>>>> --------------------  Before Canonicalization (contains call to
>>>> std::max): --------------------
>>>> entry:
>>>>   %max_value = alloca float, align 4
>>>>   %1 = load float, float* %input, align 4, !tbaa !1
>>>>   store float %1, float* %max_value, align 4, !tbaa !1
>>>>
>>>> for.body:
>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>>> dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>>>>   %3 = load float, float* %call, align 4, !tbaa !1
>>>>   store float %3, float* %max_value, align 4, !tbaa !1
>>>>
>>>> --------------------  After Canonicalization (contains call to
>>>> std::max):--------------------
>>>>
>>>> entry:
>>>>   %max_value = alloca float, align 4
>>>>   %1 = bitcast float* %input to i32*
>>>>   %2 = load i32, i32* %1, align 4, !tbaa !1
>>>>   %3 = bitcast float* %max_value to i32*
>>>>   store i32 %2, i32* %3, align 4, !tbaa !1
>>>>
>>>> for.body:
>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>>> nonnull dereferenceable(4) %max_value, float*
dereferenceable(4) %arrayidx1)
>>>>   %5 = bitcast float* %call to i32*
>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>   %7 = bitcast float* %max_value to i32*
>>>>   store i32 %6, i32* %7, align 4, !tbaa !1
>>>>
>>>> -------------------- After SROA (the call to std::max is
inlined
>>>> now):--------------------
>>>> entry:
>>>>   %max_value.sroa.0 = alloca i32
>>>>   %0 = bitcast float* %input to i32*
>>>>   %1 = load i32, i32* %0, align 4, !tbaa !1
>>>>   store i32 %1, i32* %max_value.sroa.0
>>>>
>>>> for.body:
>>>>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
>>>> %max_value.sroa.0
>>>>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to
float
>>>>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
>>>> %max_value.sroa.0 to float*
>>>>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
>>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>>   %5 = bitcast float* %__b.__a.i to i32*
>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>   store i32 %6, i32* %max_value.sroa.0
>>>>
>>>> -------------------- After SROA when Canonicalization is turned
>>>> off--------------------
>>>> entry:
>>>>   %0 = load float, float* %input, align 4, !tbaa !1
>>>>
>>>> for.cond:                                         ; preds =
%for.body,
>>>> %entry
>>>>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated,
>>>> %for.body ]
>>>>
>>>> for.body:
>>>>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>>>   %cmp.i = fcmp olt float %max_value.0, %1
>>>>   %.sroa.speculate.load.true = load float, float* %arrayidx1,
align 4,
>>>> !tbaa !1
>>>>   %.sroa.speculated = select i1 %cmp.i, float
>>>> %.sroa.speculate.load.true, float %max_value.0
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/19961422/attachment.html>

Philip Reames via llvm-dev

2016-Mar-22 19:03 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

I'm generally in support of (1), but you should definitely get active 
buy-in from Chandler before moving forward.  He's the most knowledgeable 
on the tradeoffs.

Also, are you willing to commit to the fairly large amount of work (1) 
implies?  I strongly suspect that doing (1) right will be *far* more 
work in the short term than (2).  It may still be the right long term 
answer, but are you ready to see the entire thing through?  Getting half 
way through and stopping would be much worse than (2).

Philip

On 03/22/2016 11:32 AM, Ehsan Amiri via llvm-dev wrote:> Back to the discussion on the RFC, I still see some advantage in 
> following the proposed solution. I see two paths forward:
>
> 1- Change canonical form, possibly lower memcpy to non-integer load 
> and store in InstCombine. Then teach the backends to convert that to 
> integer load and store if that is more profitable. Notice that we are 
> talking about loads that have no use other than store. So it is a 
> fairly simple change in the backends.
>
> 2- Do not change the canonical form. Then for this bug, we need to 
> teach select speculation to see through bitcasts. We will probably 
> need to teach other optimizations to see though bitcasts in the future 
> as problems are uncovered. That is until typeless pointer work is 
> complete. Once the typeless pointer work is complete, we have some 
> extra code in each optimization for seeing through bitcasts which is 
> possibly no longer needed.
>
> Based on this I think (1) is the right thing to do. But there might be 
> other reasons for the current canonical form that I am not aware of. 
> Please let me know what you think.
>
> Thanks
> Ehsan
>
> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie <dblaikie at gmail.com 
> <mailto:dblaikie at gmail.com>> wrote:
>
>
>
>     On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri
>     <ehsanamiri at gmail.com <mailto:ehsanamiri at gmail.com>>
wrote:
>
>         David,
>
>         Could you give us an update on the status of typeless pointer
>         work? How much work is left and when you think it might be ready?
>
>
>     It's a bit of an onion peel, really - since it will eventually
>     involve generalizing/fixing every optimization that's currently
>     leaning on typed pointers to keep the performance while removing
>     the crutch they're currently leaning on. (in cases where bitcasts
>     are literally just getting in the way, those won't require
>     cleaning up & should just become "free performance wins"
once we
>     remove them, though)
>
>     At the moment we can roundtrip every LLVM IR test case through
>     bitcode and textual IR (reading/writing both formats) while using
>     only a narrow whitelist of places that request the type of a
>     pointer (things like the verifier, the parser/printer where it
>     actually needs the typed pointer to verify it matches the explicit
>     type, etc).
>
>     The next thing on the list is probably figuring out the
>     byval/inalloca representation (add an explicit pointee type? just
>     make the number of bytes explicit with no type information?).
>
>     Then start migrating optimizations over - doing the same sort of
>     testing I did for the IR/bitcode roundtripping - assert that the
>     pointee type is not accessed, whitelist places that need it until
>     the bitcasts go away, fix anything else... it'll still be a fair
>     bit of work & I don't really know how much. It should
parallelize
>     pretty well (doing any of this work is really helpful, each
>     optimization is indepednent, etc) if anyone wants to/is able to help.
>
>     - Dave
>
>
>         Thanks
>         Ehsan
>
>
>         On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie
>         <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
wrote:
>
>
>
>             On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev
>             <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
>             wrote:
>
>                 Hi,
>
>                 How do it interact with the "typeless pointers"
work?
>
>
>             Right - the goal of the typeless pointer work is to fix
>             all these bugs related to "didn't look through
bitcasts"
>             in optimizations. Sometimes that's going to mean more work
>             (because the code is leaning on the absence of bitcasts &
>             the presence of convenient (but not guaranteed) type
>             information to inform optimization decisions) but if we
>             remove typed pointers while keeping optimization quality
>             in the cases we have today, then we should've also fixed
>             the cases that were broken because the type information
>             didn't end up aligning to produce the optimal output.
>
>             & I know I've been off the typeless pointer stuff for a
>             few months working on llvm-dwp - but happy for any help
>             (the next immediate piece is probably figuring out teh
>             right representation for byval and inalloca - there were
>             some code reviews sent out for that that I'll need to come
>             back around to - but also any optimizations people want to
>             help rework/improve would be great too & I can provide
>             some techniques/tools to help people approach those)
>
>             - Dave
>
>
>                 Thanks,
>
>                 -- 
>                 Mehdi
>
>>                 On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev
>>                 <llvm-dev at lists.llvm.org
>>                 <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>                 === PROBLEM === (See this bug
>>                 https://llvm.org/bugs/show_bug.cgi?id=26445)
>>
>>                 IR contains code for loading a float from float * and
>>                 storing it to a float * address. After
>>                 canonicalization of load in InstCombine [1], new
>>                 bitcasts are added to the IR (see bottom of the email
>>                 for code samples). This prevents select speculation
>>                 in SROA to work. Also after SROA we have bitcasts
>>                 from int32 to float. (Whereas originally after
>>                 instCombine, bitcasts are only done on pointer types).
>>
>>                 === PROPOSED SOLUTION==>>
>>                 [1] implies that we need load canonicalization when
>>                 we load a value only to store it again. The reason is
>>                 to avoid generating slightly different code (due to
>>                 different ways of adding bitcasts), in different
>>                 situations. In all examples presented in [1] there is
>>                 a non-zero number of bitcasts. I think when we load a
>>                 value of type T from a T* address and store it as a
>>                 type T value to one or more T* address (and there is
>>                 no other use or store), we can redefine canonical
>>                 form to mean there should not be any bitcasts. So we
>>                 still have a canonical form, but its definition is
>>                 slightly different.
>>
>>                 === REASONS FOR / AGAINST==>>
>>                 Hal Finkel warns that while this may be useful for
>>                 power pc, this may hurt more than one other platform
>>                 and become a very large project. Despite this he is
>>                 fine with bringing up the issue to the mailing list
>>                 to get feedback, mostly because this seems inline
>>                 with our future direction of having a unique type for
>>                 all pointers. (Hal please correct me if I
>>                 misunderstood your comment)
>>
>>                 This is a much simpler fix compared to alternatives.
>>                 (ignoring potential regressions)
>>
>>                 === ALTERNATIVE SOLUTION ==>>
>>                 Fix select speculation in SROA to see through
>>                 bitcasts. Handle remaining bitcasts during code gen.
>>                 Other alternative solutions are welcome.
>>
>>                 Should I implement the proposed solution or is it too
>>                 risky? I understand that we may need to undo it if it
>>                 breaks too many things. Comments are welcome.
>>
>>
>>                 [1]
>>                
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>                 r226781  git commit id: b778cbc0c8
>>
>>
>>
>>                 Code Samples (only relevant part is copied):
>>
>>                 -------------------- Before Canonicalization
>>                 (contains call to std::max): --------------------
>>                 entry:
>>                   %max_value = alloca float, align 4
>>                   %1 = load float, float* %input, align 4, !tbaa !1
>>                   store float %1, float* %max_value, align 4, !tbaa !1
>>
>>                 for.body:
>>                   %call = call dereferenceable(4) float*
>>                 @_ZSt3maxIfERKT_S2_S2_(float* dereferenceable(4)
>>                 %max_value, float* dereferenceable(4) %arrayidx1)
>>                   %3 = load float, float* %call, align 4, !tbaa !1
>>                   store float %3, float* %max_value, align 4, !tbaa !1
>>
>>                 -------------------- After Canonicalization (contains
>>                 call to std::max):--------------------
>>
>>                 entry:
>>                   %max_value = alloca float, align 4
>>                   %1 = bitcast float* %input to i32*
>>                   %2 = load i32, i32* %1, align 4, !tbaa !1
>>                   %3 = bitcast float* %max_value to i32*
>>                   store i32 %2, i32* %3, align 4, !tbaa !1
>>
>>                 for.body:
>>                   %call = call dereferenceable(4) float*
>>                 @_ZSt3maxIfERKT_S2_S2_(float* nonnull
>>                 dereferenceable(4) %max_value, float*
>>                 dereferenceable(4) %arrayidx1)
>>                   %5 = bitcast float* %call to i32*
>>                   %6 = load i32, i32* %5, align 4, !tbaa !1
>>                   %7 = bitcast float* %max_value to i32*
>>                   store i32 %6, i32* %7, align 4, !tbaa !1
>>
>>                 -------------------- After SROA (the call to std::max
>>                 is inlined now):--------------------
>>                 entry:
>>                 %max_value.sroa.0 = alloca i32
>>                   %0 = bitcast float* %input to i32*
>>                   %1 = load i32, i32* %0, align 4, !tbaa !1
>>                   store i32 %1, i32* %max_value.sroa.0
>>
>>                 for.body:
>>                 %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32,
>>                 i32* %max_value.sroa.0
>>                   %3 = bitcast i32
>>                 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>>                 %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast
>>                 i32* %max_value.sroa.0 to float*
>>                   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1,
>>                 float* %max_value.sroa.0.0.max_value.sroa_cast8
>>                   %5 = bitcast float* %__b.__a.i to i32*
>>                   %6 = load i32, i32* %5, align 4, !tbaa !1
>>                   store i32 %6, i32* %max_value.sroa.0
>>
>>                 -------------------- After SROA when Canonicalization
>>                 is turned off--------------------
>>                 entry:
>>                   %0 = load float, float* %input, align 4, !tbaa !1
>>
>>                 for.cond: ; preds = %for.body, %entry
>>                   %max_value.0 = phi float [ %0, %entry ], [
>>                 %.sroa.speculated, %for.body ]
>>
>>                 for.body:
>>                   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>                   %cmp.i = fcmp olt float %max_value.0, %1
>>                 %.sroa.speculate.load.true = load float, float*
>>                 %arrayidx1, align 4, !tbaa !1
>>                 %.sroa.speculated = select i1 %cmp.i, float
>>                 %.sroa.speculate.load.true, float %max_value.0
>>                 _______________________________________________
>>                 LLVM Developers mailing list
>>                 llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>                 http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>                 _______________________________________________
>                 LLVM Developers mailing list
>                 llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>                 http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/5b7f9a3d/attachment.html>

Mehdi Amini via llvm-dev

2016-Mar-22 19:16 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

I don't know enough about the tradeoff for 1, but 2 seems like a bandaid for
something that is not a correctness issue neither a regression. I'm not sure
it justifies "bandaid patches" while there is a clear path forward,
i.e. typeless pointers, unless there is an acknowledgement that typeless
pointers won't be there before a couple of years.

-- 
Mehdi
> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at gmail.com>
wrote:
> 
> Back to the discussion on the RFC, I still see some advantage in following
the proposed solution. I see two paths forward:
> 
> 1- Change canonical form, possibly lower memcpy to non-integer load and
store in InstCombine. Then teach the backends to convert that to integer load
and store if that is more profitable. Notice that we are talking about loads
that have no use other than store. So it is a fairly simple change in the
backends.
> 
> 2- Do not change the canonical form. Then for this bug, we need to teach
select speculation to see through bitcasts. We will probably need to teach other
optimizations to see though bitcasts in the future as problems are uncovered.
That is until typeless pointer work is complete. Once the typeless pointer work
is complete, we have some extra code in each optimization for seeing through
bitcasts which is possibly no longer needed.
> 
> Based on this I think (1) is the right thing to do. But there might be
other reasons for the current canonical form that I am not aware of. Please let
me know what you think.
> 
> Thanks
> Ehsan
> 
> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie <dblaikie at gmail.com
<mailto:dblaikie at gmail.com>> wrote:
> 
> 
> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <ehsanamiri at gmail.com
<mailto:ehsanamiri at gmail.com>> wrote:
> David,
> 
> Could you give us an update on the status of typeless pointer work? How
much work is left and when you think it might be ready?
> 
> It's a bit of an onion peel, really - since it will eventually involve
generalizing/fixing every optimization that's currently leaning on typed
pointers to keep the performance while removing the crutch they're currently
leaning on. (in cases where bitcasts are literally just getting in the way,
those won't require cleaning up & should just become "free
performance wins" once we remove them, though)
> 
> At the moment we can roundtrip every LLVM IR test case through bitcode and
textual IR (reading/writing both formats) while using only a narrow whitelist of
places that request the type of a pointer (things like the verifier, the
parser/printer where it actually needs the typed pointer to verify it matches
the explicit type, etc).
> 
> The next thing on the list is probably figuring out the byval/inalloca
representation (add an explicit pointee type? just make the number of bytes
explicit with no type information?).
> 
> Then start migrating optimizations over - doing the same sort of testing I
did for the IR/bitcode roundtripping - assert that the pointee type is not
accessed, whitelist places that need it until the bitcasts go away, fix anything
else... it'll still be a fair bit of work & I don't really know how
much. It should parallelize pretty well (doing any of this work is really
helpful, each optimization is indepednent, etc) if anyone wants to/is able to
help.
> 
> - Dave
>  
> 
> Thanks
> Ehsan
> 
> 
> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie <dblaikie at gmail.com
<mailto:dblaikie at gmail.com>> wrote:
> 
> 
> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> Hi,
> 
> How do it interact with the "typeless pointers" work?
> 
> Right - the goal of the typeless pointer work is to fix all these bugs
related to "didn't look through bitcasts" in optimizations.
Sometimes that's going to mean more work (because the code is leaning on the
absence of bitcasts & the presence of convenient (but not guaranteed) type
information to inform optimization decisions) but if we remove typed pointers
while keeping optimization quality in the cases we have today, then we
should've also fixed the cases that were broken because the type information
didn't end up aligning to produce the optimal output.
> 
> & I know I've been off the typeless pointer stuff for a few months
working on llvm-dwp - but happy for any help (the next immediate piece is
probably figuring out teh right representation for byval and inalloca - there
were some code reviews sent out for that that I'll need to come back around
to - but also any optimizations people want to help rework/improve would be
great too & I can provide some techniques/tools to help people approach
those)
> 
> - Dave
>  
> 
> Thanks,
> 
> -- 
> Mehdi
> 
>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> === PROBLEM === (See this bug
https://llvm.org/bugs/show_bug.cgi?id=26445
<https://llvm.org/bugs/show_bug.cgi?id=26445>)
>> 
>> IR contains code for loading a float from float * and storing it to a
float * address. After canonicalization of load in InstCombine [1], new bitcasts
are added to the IR (see bottom of the email for code samples). This prevents
select speculation in SROA to work. Also after SROA we have bitcasts from int32
to float. (Whereas originally after instCombine, bitcasts are only done on
pointer types).
>> 
>> === PROPOSED SOLUTION==>> 
>> [1] implies that we need load canonicalization when we load a value
only to store it again. The reason is to avoid generating slightly different
code (due to different ways of adding bitcasts), in different situations. In all
examples presented in [1] there is a non-zero number of bitcasts. I think when
we load a value of type T from a T* address and store it as a type T value to
one or more T* address (and there is no other use or store), we can redefine
canonical form to mean there should not be any bitcasts. So we still have a
canonical form, but its definition is slightly different.
>> 
>> === REASONS FOR / AGAINST==>> 
>> Hal Finkel warns that while this may be useful for power pc, this may
hurt more than one other platform and become a very large project. Despite this
he is fine with bringing up the issue to the mailing list to get feedback,
mostly because this seems inline with our future direction of having a unique
type for all pointers.  (Hal please correct me if I misunderstood your comment)
>> 
>> This is a much simpler fix compared to alternatives. (ignoring
potential regressions)
>> 
>> === ALTERNATIVE SOLUTION ==>> 
>> Fix select speculation in SROA to see through bitcasts. Handle
remaining bitcasts during code gen. Other alternative solutions are welcome.
>> 
>> Should I implement the proposed solution or is it too risky? I
understand that we may need to undo it if it breaks too many things. Comments
are welcome.
>> 
>> 
>> [1] http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html> 
r226781  git commit id: b778cbc0c8
>> 
>> 
>> 
>> Code Samples (only relevant part is copied):
>> 
>> --------------------  Before Canonicalization (contains call to
std::max): --------------------
>> entry:
>>   %max_value = alloca float, align 4
>>   %1 = load float, float* %input, align 4, !tbaa !1
>>   store float %1, float* %max_value, align 4, !tbaa !1
>> 
>> for.body:
>>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>>   %3 = load float, float* %call, align 4, !tbaa !1
>>   store float %3, float* %max_value, align 4, !tbaa !1
>> 
>> --------------------  After Canonicalization (contains call to
std::max):--------------------
>> 
>> entry:
>>   %max_value = alloca float, align 4
>>   %1 = bitcast float* %input to i32*
>>   %2 = load i32, i32* %1, align 4, !tbaa !1
>>   %3 = bitcast float* %max_value to i32*
>>   store i32 %2, i32* %3, align 4, !tbaa !1
>> 
>> for.body:
>>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
nonnull dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>>   %5 = bitcast float* %call to i32*
>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>   %7 = bitcast float* %max_value to i32*
>>   store i32 %6, i32* %7, align 4, !tbaa !1
>> 
>> -------------------- After SROA (the call to std::max is inlined
now):--------------------
>> entry:
>>   %max_value.sroa.0 = alloca i32
>>   %0 = bitcast float* %input to i32*
>>   %1 = load i32, i32* %0, align 4, !tbaa !1
>>   store i32 %1, i32* %max_value.sroa.0
>> 
>> for.body: 
>>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
%max_value.sroa.0
>>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
%max_value.sroa.0 to float*
>>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
%max_value.sroa.0.0.max_value.sroa_cast8
>>   %5 = bitcast float* %__b.__a.i to i32*
>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>   store i32 %6, i32* %max_value.sroa.0
>> 
>> -------------------- After SROA when Canonicalization is turned
off--------------------
>> entry:
>>   %0 = load float, float* %input, align 4, !tbaa !1
>> 
>> for.cond:                                         ; preds = %for.body,
%entry
>>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated,
%for.body ]
>> 
>> for.body:
>>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>   %cmp.i = fcmp olt float %max_value.0, %1
>>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
!tbaa !1
>>   %.sroa.speculated = select i1 %cmp.i, float
%.sroa.speculate.load.true, float %max_value.0
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/be7effa1/attachment-0001.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Mar 2016 - RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

Apparently Analagous Threads