thr3ads.net - llvm dev - [llvm-dev] RFC: A change in InstCombine canonical form [Mar 2016]

If this information is useful, please help other people find it:
Share via:

Mehdi Amini via llvm-dev

2016-Mar-22 19:39 UTC

[llvm-dev] RFC: A change in InstCombine canonical form

I don't really mind, but the intermediate stage will not be very nice: that
a lot of code / tests that needs to be written with bitcast, and all of that
while they are deemed to disappear. The added value isn't clear to me
considering the added work. I'm not sure it wouldn't add more work for
all the cleanup required by the "typeless pointer", but I'm not
sure what's involved here and if David thinks the intermediate steps of
handling bit casts everywhere is not making it harder I'm fine with it.

-- 
Mehdi
> On Mar 22, 2016, at 12:36 PM, Philip Reames <listmail at
philipreames.com> wrote:
> 
> I'd phrase this differently: being pointer-bitcast agnostic is a step
towards support typeless pointers.  :)  We can either become bitcast agnostic
all in one big change or incrementally.  Personally, I'd prefer the later
since it reduces the risk associated with enabling typeless pointers in the end.
> 
> Philip
> 
> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>> I don't know enough about the tradeoff for 1, but 2 seems like a
bandaid for something that is not a correctness issue neither a regression.
I'm not sure it justifies "bandaid patches" while there is a clear
path forward, i.e. typeless pointers, unless there is an acknowledgement that
typeless pointers won't be there before a couple of years.
>> 
>> -- 
>> Mehdi
>> 
>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at
gmail.com <mailto:ehsanamiri at gmail.com>> wrote:
>>> 
>>> Back to the discussion on the RFC, I still see some advantage in
following the proposed solution. I see two paths forward:
>>> 
>>> 1- Change canonical form, possibly lower memcpy to non-integer load
and store in InstCombine. Then teach the backends to convert that to integer
load and store if that is more profitable. Notice that we are talking about
loads that have no use other than store. So it is a fairly simple change in the
backends.
>>> 
>>> 2- Do not change the canonical form. Then for this bug, we need to
teach select speculation to see through bitcasts. We will probably need to teach
other optimizations to see though bitcasts in the future as problems are
uncovered. That is until typeless pointer work is complete. Once the typeless
pointer work is complete, we have some extra code in each optimization for
seeing through bitcasts which is possibly no longer needed.
>>> 
>>> Based on this I think (1) is the right thing to do. But there might
be other reasons for the current canonical form that I am not aware of. Please
let me know what you think.
>>> 
>>> Thanks
>>> Ehsan
>>> 
>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie <
<mailto:dblaikie at gmail.com>dblaikie at gmail.com <mailto:dblaikie at
gmail.com>> wrote:
>>> 
>>> 
>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <
<mailto:ehsanamiri at gmail.com>ehsanamiri at gmail.com
<mailto:ehsanamiri at gmail.com>> wrote:
>>> David,
>>> 
>>> Could you give us an update on the status of typeless pointer work?
How much work is left and when you think it might be ready?
>>> 
>>> It's a bit of an onion peel, really - since it will eventually
involve generalizing/fixing every optimization that's currently leaning on
typed pointers to keep the performance while removing the crutch they're
currently leaning on. (in cases where bitcasts are literally just getting in the
way, those won't require cleaning up & should just become "free
performance wins" once we remove them, though)
>>> 
>>> At the moment we can roundtrip every LLVM IR test case through
bitcode and textual IR (reading/writing both formats) while using only a narrow
whitelist of places that request the type of a pointer (things like the
verifier, the parser/printer where it actually needs the typed pointer to verify
it matches the explicit type, etc).
>>> 
>>> The next thing on the list is probably figuring out the
byval/inalloca representation (add an explicit pointee type? just make the
number of bytes explicit with no type information?).
>>> 
>>> Then start migrating optimizations over - doing the same sort of
testing I did for the IR/bitcode roundtripping - assert that the pointee type is
not accessed, whitelist places that need it until the bitcasts go away, fix
anything else... it'll still be a fair bit of work & I don't really
know how much. It should parallelize pretty well (doing any of this work is
really helpful, each optimization is indepednent, etc) if anyone wants to/is
able to help.
>>> 
>>> - Dave
>>>  
>>> 
>>> Thanks
>>> Ehsan
>>> 
>>> 
>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie <
<mailto:dblaikie at gmail.com>dblaikie at gmail.com <mailto:dblaikie at
gmail.com>> wrote:
>>> 
>>> 
>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
<mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
>>> Hi,
>>> 
>>> How do it interact with the "typeless pointers" work?
>>> 
>>> Right - the goal of the typeless pointer work is to fix all these
bugs related to "didn't look through bitcasts" in optimizations.
Sometimes that's going to mean more work (because the code is leaning on the
absence of bitcasts & the presence of convenient (but not guaranteed) type
information to inform optimization decisions) but if we remove typed pointers
while keeping optimization quality in the cases we have today, then we
should've also fixed the cases that were broken because the type information
didn't end up aligning to produce the optimal output.
>>> 
>>> & I know I've been off the typeless pointer stuff for a few
months working on llvm-dwp - but happy for any help (the next immediate piece is
probably figuring out teh right representation for byval and inalloca - there
were some code reviews sent out for that that I'll need to come back around
to - but also any optimizations people want to help rework/improve would be
great too & I can provide some techniques/tools to help people approach
those)
>>> 
>>> - Dave
>>>  
>>> 
>>> Thanks,
>>> 
>>> -- 
>>> Mehdi
>>> 
>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
<mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
>>>> 
>>>> === PROBLEM === (See this bug 
<https://llvm.org/bugs/show_bug.cgi?id=26445>https://llvm.org/bugs/show_bug.cgi?id=26445
<https://llvm.org/bugs/show_bug.cgi?id=26445>)
>>>> 
>>>> IR contains code for loading a float from float * and storing
it to a float * address. After canonicalization of load in InstCombine [1], new
bitcasts are added to the IR (see bottom of the email for code samples). This
prevents select speculation in SROA to work. Also after SROA we have bitcasts
from int32 to float. (Whereas originally after instCombine, bitcasts are only
done on pointer types).
>>>> 
>>>> === PROPOSED SOLUTION==>>>> 
>>>> [1] implies that we need load canonicalization when we load a
value only to store it again. The reason is to avoid generating slightly
different code (due to different ways of adding bitcasts), in different
situations. In all examples presented in [1] there is a non-zero number of
bitcasts. I think when we load a value of type T from a T* address and store it
as a type T value to one or more T* address (and there is no other use or
store), we can redefine canonical form to mean there should not be any bitcasts.
So we still have a canonical form, but its definition is slightly different.
>>>> 
>>>> === REASONS FOR / AGAINST==>>>> 
>>>> Hal Finkel warns that while this may be useful for power pc,
this may hurt more than one other platform and become a very large project.
Despite this he is fine with bringing up the issue to the mailing list to get
feedback, mostly because this seems inline with our future direction of having a
unique type for all pointers.  (Hal please correct me if I misunderstood your
comment)
>>>> 
>>>> This is a much simpler fix compared to alternatives. (ignoring
potential regressions)
>>>> 
>>>> === ALTERNATIVE SOLUTION ==>>>> 
>>>> Fix select speculation in SROA to see through bitcasts. Handle
remaining bitcasts during code gen. Other alternative solutions are welcome.
>>>> 
>>>> Should I implement the proposed solution or is it too risky? I
understand that we may need to undo it if it breaks too many things. Comments
are welcome.
>>>> 
>>>> 
>>>> [1] 
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html>http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html> 
r226781  git commit id: b778cbc0c8
>>>> 
>>>> 
>>>> 
>>>> Code Samples (only relevant part is copied):
>>>> 
>>>> --------------------  Before Canonicalization (contains call to
std::max): --------------------
>>>> entry:
>>>>   %max_value = alloca float, align 4
>>>>   %1 = load float, float* %input, align 4, !tbaa !1
>>>>   store float %1, float* %max_value, align 4, !tbaa !1
>>>> 
>>>> for.body:
>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float* dereferenceable(4) %max_value, float*
dereferenceable(4) %arrayidx1)
>>>>   %3 = load float, float* %call, align 4, !tbaa !1
>>>>   store float %3, float* %max_value, align 4, !tbaa !1
>>>> 
>>>> --------------------  After Canonicalization (contains call to
std::max):--------------------
>>>> 
>>>> entry:
>>>>   %max_value = alloca float, align 4
>>>>   %1 = bitcast float* %input to i32*
>>>>   %2 = load i32, i32* %1, align 4, !tbaa !1
>>>>   %3 = bitcast float* %max_value to i32*
>>>>   store i32 %2, i32* %3, align 4, !tbaa !1
>>>> 
>>>> for.body:
>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %max_value, float*
dereferenceable(4) %arrayidx1)
>>>>   %5 = bitcast float* %call to i32*
>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>   %7 = bitcast float* %max_value to i32*
>>>>   store i32 %6, i32* %7, align 4, !tbaa !1
>>>> 
>>>> -------------------- After SROA (the call to std::max is
inlined now):--------------------
>>>> entry:
>>>>   %max_value.sroa.0 = alloca i32
>>>>   %0 = bitcast float* %input to i32*
>>>>   %1 = load i32, i32* %0, align 4, !tbaa !1
>>>>   store i32 %1, i32* %max_value.sroa.0
>>>> 
>>>> for.body: 
>>>>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
%max_value.sroa.0
>>>>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to
float
>>>>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
%max_value.sroa.0 to float*
>>>>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
%max_value.sroa.0.0.max_value.sroa_cast8
>>>>   %5 = bitcast float* %__b.__a.i to i32*
>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>   store i32 %6, i32* %max_value.sroa.0
>>>> 
>>>> -------------------- After SROA when Canonicalization is turned
off--------------------
>>>> entry:
>>>>   %0 = load float, float* %input, align 4, !tbaa !1
>>>> 
>>>> for.cond:                                         ; preds =
%for.body, %entry
>>>>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated,
%for.body ]
>>>> 
>>>> for.body:
>>>>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>>>   %cmp.i = fcmp olt float %max_value.0, %1
>>>>   %.sroa.speculate.load.true = load float, float* %arrayidx1,
align 4, !tbaa !1
>>>>   %.sroa.speculated = select i1 %cmp.i, float
%.sroa.speculate.load.true, float %max_value.0
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>>  <mailto:llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>> 
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>>  <mailto:llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>> 
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/a91ec960/attachment-0001.html>

Hal Finkel via llvm-dev

2016-Mar-22 19:47 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

----- Original Message -----
> From: "Mehdi Amini via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "Philip Reames" <listmail at philipreames.com>,
"David Blaikie"
> <dblaikie at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, uweigand at
de.ibm.com, "Tom
> Stellard" <thomas.stellard at amd.com>
> Sent: Tuesday, March 22, 2016 2:39:36 PM
> Subject: Re: [llvm-dev] RFC: A change in InstCombine canonical form
> I don't really mind, but the intermediate stage will not be very
> nice: that a lot of code / tests that needs to be written with
> bitcast, and all of that while they are deemed to disappear. The
> added value isn't clear to me considering the added work. I'm not
> sure it wouldn't add more work for all the cleanup required by the
> "typeless pointer", but I'm not sure what's involved here
and if
> David thinks the intermediate steps of handling bit casts everywhere
> is not making it harder I'm fine with it.It is not clear to me that this is a particularly-large change. As I understand
it, we're only talking about the canonicalization of small memcpys, so there
are no other uses of the relevant values. Changing the canonical form here seems
like a pure win, except for the fact that doing the load and store using integer
registers is likely more efficient on some architectures than using
floating-point registers. For those architectures, we'd want to fix this in
the backend. This seems relatively easy to do in CGP (or DAGCombine) (and, in
fact, doing it later would enable us to catch more load/store cases than just
those that comes from memcpy expansions). Thoughts?

-Hal 
> --
> Mehdi
> > On Mar 22, 2016, at 12:36 PM, Philip Reames <
> > listmail at philipreames.com > wrote:
> 
> > I'd phrase this differently: being pointer-bitcast agnostic is a
> > step
> > towards support typeless pointers. :) We can either become bitcast
> > agnostic all in one big change or incrementally. Personally, I'd
> > prefer the later since it reduces the risk associated with enabling
> > typeless pointers in the end.
> 
> > Philip
> 
> > On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
> 
> > > I don't know enough about the tradeoff for 1, but 2 seems
like a
> > > bandaid for something that is not a correctness issue neither a
> > > regression. I'm not sure it justifies "bandaid
patches" while
> > > there
> > > is a clear path forward, i.e. typeless pointers, unless there is
> > > an
> > > acknowledgement that typeless pointers won't be there before
a
> > > couple of years.
> > 
> 
> > > --
> > 
> 
> > > Mehdi
> > 
> 
> > > > On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <
> > > > ehsanamiri at gmail.com
> > > > >
> > > > wrote:
> > > 
> > 
> 
> > > > Back to the discussion on the RFC, I still see some
advantage
> > > > in
> > > > following the proposed solution. I see two paths forward:
> > > 
> > 
> 
> > > > 1- Change canonical form, possibly lower memcpy to
non-integer
> > > > load
> > > > and store in InstCombine. Then teach the backends to convert
> > > > that
> > > > to
> > > > integer load and store if that is more profitable. Notice
that
> > > > we
> > > > are talking about loads that have no use other than store.
So
> > > > it
> > > > is
> > > > a fairly simple change in the backends.
> > > 
> > 
> 
> > > > 2- Do not change the canonical form. Then for this bug, we
need
> > > > to
> > > > teach select speculation to see through bitcasts. We will
> > > > probably
> > > > need to teach other optimizations to see though bitcasts in
the
> > > > future as problems are uncovered. That is until typeless
> > > > pointer
> > > > work is complete. Once the typeless pointer work is
complete,
> > > > we
> > > > have some extra code in each optimization for seeing through
> > > > bitcasts which is possibly no longer needed.
> > > 
> > 
> 
> > > > Based on this I think (1) is the right thing to do. But
there
> > > > might
> > > > be other reasons for the current canonical form that I am
not
> > > > aware
> > > > of. Please let me know what you think.
> > > 
> > 
> 
> > > > Thanks
> > > 
> > 
> 
> > > > Ehsan
> > > 
> > 
> 
> > > > On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie <
> > > > dblaikie at gmail.com
> > > > >
> > > > wrote:
> > > 
> > 
> 
> > > > > On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri <
> > > > > ehsanamiri at gmail.com
> > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > David,
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Could you give us an update on the status of
typeless
> > > > > > pointer
> > > > > > work?
> > > > > > How much work is left and when you think it might
be ready?
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > It's a bit of an onion peel, really - since it will
> > > > > eventually
> > > > > involve generalizing/fixing every optimization
that's
> > > > > currently
> > > > > leaning on typed pointers to keep the performance while
> > > > > removing
> > > > > the
> > > > > crutch they're currently leaning on. (in cases
where bitcasts
> > > > > are
> > > > > literally just getting in the way, those won't
require
> > > > > cleaning
> > > > > up
> > > > > &
> > > > > should just become "free performance wins"
once we remove
> > > > > them,
> > > > > though)
> > > > 
> > > 
> > 
> 
> > > > > At the moment we can roundtrip every LLVM IR test case
> > > > > through
> > > > > bitcode and textual IR (reading/writing both formats)
while
> > > > > using
> > > > > only a narrow whitelist of places that request the type
of a
> > > > > pointer
> > > > > (things like the verifier, the parser/printer where it
> > > > > actually
> > > > > needs the typed pointer to verify it matches the
explicit
> > > > > type,
> > > > > etc).
> > > > 
> > > 
> > 
> 
> > > > > The next thing on the list is probably figuring out the
> > > > > byval/inalloca representation (add an explicit pointee
type?
> > > > > just
> > > > > make the number of bytes explicit with no type
information?).
> > > > 
> > > 
> > 
> 
> > > > > Then start migrating optimizations over - doing the
same sort
> > > > > of
> > > > > testing I did for the IR/bitcode roundtripping - assert
that
> > > > > the
> > > > > pointee type is not accessed, whitelist places that
need it
> > > > > until
> > > > > the bitcasts go away, fix anything else... it'll
still be a
> > > > > fair
> > > > > bit
> > > > > of work & I don't really know how much. It
should parallelize
> > > > > pretty
> > > > > well (doing any of this work is really helpful, each
> > > > > optimization
> > > > > is
> > > > > indepednent, etc) if anyone wants to/is able to help.
> > > > 
> > > 
> > 
> 
> > > > > - Dave
> > > > 
> > > 
> > 
> 
> > > > > > Thanks
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > Ehsan
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie
<
> > > > > > dblaikie at gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini
via llvm-dev
> > > > > > > <
> > > > > > > llvm-dev at lists.llvm.org > wrote:
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Hi,
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > How do it interact with the
"typeless pointers" work?
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Right - the goal of the typeless pointer work
is to fix
> > > > > > > all
> > > > > > > these
> > > > > > > bugs related to "didn't look through
bitcasts" in
> > > > > > > optimizations.
> > > > > > > Sometimes that's going to mean more work
(because the
> > > > > > > code
> > > > > > > is
> > > > > > > leaning on the absence of bitcasts & the
presence of
> > > > > > > convenient
> > > > > > > (but
> > > > > > > not guaranteed) type information to inform
optimization
> > > > > > > decisions)
> > > > > > > but if we remove typed pointers while keeping
> > > > > > > optimization
> > > > > > > quality
> > > > > > > in the cases we have today, then we
should've also fixed
> > > > > > > the
> > > > > > > cases
> > > > > > > that were broken because the type information
didn't end
> > > > > > > up
> > > > > > > aligning
> > > > > > > to produce the optimal output.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > & I know I've been off the typeless
pointer stuff for a
> > > > > > > few
> > > > > > > months
> > > > > > > working on llvm-dwp - but happy for any help
(the next
> > > > > > > immediate
> > > > > > > piece is probably figuring out teh right
representation
> > > > > > > for
> > > > > > > byval
> > > > > > > and inalloca - there were some code reviews
sent out for
> > > > > > > that
> > > > > > > that
> > > > > > > I'll need to come back around to - but
also any
> > > > > > > optimizations
> > > > > > > people
> > > > > > > want to help rework/improve would be great
too & I can
> > > > > > > provide
> > > > > > > some
> > > > > > > techniques/tools to help people approach
those)
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > - Dave
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Thanks,
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > --
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > Mehdi
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > On Mar 16, 2016, at 6:41 AM, Ehsan
Amiri via llvm-dev
> > > > > > > > > <
> > > > > > > > > llvm-dev at lists.llvm.org >
wrote:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > === PROBLEM === (See this bug
> > > > > > > > >
https://llvm.org/bugs/show_bug.cgi?id=26445 )
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > IR contains code for loading a
float from float * and
> > > > > > > > > storing
> > > > > > > > > it
> > > > > > > > > to
> > > > > > > > > a
> > > > > > > > > float * address. After
canonicalization of load in
> > > > > > > > > InstCombine
> > > > > > > > > [1],
> > > > > > > > > new bitcasts are added to the IR
(see bottom of the
> > > > > > > > > email
> > > > > > > > > for
> > > > > > > > > code
> > > > > > > > > samples). This prevents select
speculation in SROA to
> > > > > > > > > work.
> > > > > > > > > Also
> > > > > > > > > after SROA we have bitcasts from
int32 to float.
> > > > > > > > > (Whereas
> > > > > > > > > originally
> > > > > > > > > after instCombine, bitcasts are
only done on pointer
> > > > > > > > > types).
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > === PROPOSED SOLUTION==> >
> > > > > >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > [1] implies that we need load
canonicalization when
> > > > > > > > > we
> > > > > > > > > load
> > > > > > > > > a
> > > > > > > > > value
> > > > > > > > > only to store it again. The reason
is to avoid
> > > > > > > > > generating
> > > > > > > > > slightly
> > > > > > > > > different code (due to different
ways of adding
> > > > > > > > > bitcasts),
> > > > > > > > > in
> > > > > > > > > different situations. In all
examples presented in
> > > > > > > > > [1]
> > > > > > > > > there
> > > > > > > > > is
> > > > > > > > > a
> > > > > > > > > non-zero number of bitcasts. I
think when we load a
> > > > > > > > > value
> > > > > > > > > of
> > > > > > > > > type
> > > > > > > > > T
> > > > > > > > > from a T* address and store it as a
type T value to
> > > > > > > > > one
> > > > > > > > > or
> > > > > > > > > more
> > > > > > > > > T*
> > > > > > > > > address (and there is no other use
or store), we can
> > > > > > > > > redefine
> > > > > > > > > canonical form to mean there should
not be any
> > > > > > > > > bitcasts.
> > > > > > > > > So
> > > > > > > > > we
> > > > > > > > > still
> > > > > > > > > have a canonical form, but its
definition is slightly
> > > > > > > > > different.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > === REASONS FOR / AGAINST==>
> > > > > > >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Hal Finkel warns that while this
may be useful for
> > > > > > > > > power
> > > > > > > > > pc,
> > > > > > > > > this
> > > > > > > > > may
> > > > > > > > > hurt more than one other platform
and become a very
> > > > > > > > > large
> > > > > > > > > project.
> > > > > > > > > Despite this he is fine with
bringing up the issue to
> > > > > > > > > the
> > > > > > > > > mailing
> > > > > > > > > list to get feedback, mostly
because this seems
> > > > > > > > > inline
> > > > > > > > > with
> > > > > > > > > our
> > > > > > > > > future direction of having a unique
type for all
> > > > > > > > > pointers.
> > > > > > > > > (Hal
> > > > > > > > > please correct me if I
misunderstood your comment)
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > This is a much simpler fix compared
to alternatives.
> > > > > > > > > (ignoring
> > > > > > > > > potential regressions)
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > === ALTERNATIVE SOLUTION ==>
> > > > > > >
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Fix select speculation in SROA to
see through
> > > > > > > > > bitcasts.
> > > > > > > > > Handle
> > > > > > > > > remaining bitcasts during code gen.
Other alternative
> > > > > > > > > solutions
> > > > > > > > > are
> > > > > > > > > welcome.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Should I implement the proposed
solution or is it too
> > > > > > > > > risky?
> > > > > > > > > I
> > > > > > > > > understand that we may need to undo
it if it breaks
> > > > > > > > > too
> > > > > > > > > many
> > > > > > > > > things.
> > > > > > > > > Comments are welcome.
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > [1]
> > > > > > > > >
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
> > > > > > > > > r226781 git commit id: b778cbc0c8
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > Code Samples (only relevant part is
copied):
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > -------------------- Before
Canonicalization
> > > > > > > > > (contains
> > > > > > > > > call
> > > > > > > > > to
> > > > > > > > > std::max): --------------------
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > entry:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %max_value = alloca float, align 4
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %1 = load float, float* %input,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store float %1, float* %max_value,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > for.body:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %call = call dereferenceable(4)
float*
> > > > > > > > > @_ZSt3maxIfERKT_S2_S2_(float*
> > > > > > > > > dereferenceable(4) %max_value,
float*
> > > > > > > > > dereferenceable(4)
> > > > > > > > > %arrayidx1)
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %3 = load float, float* %call,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store float %3, float* %max_value,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > -------------------- After
Canonicalization (contains
> > > > > > > > > call
> > > > > > > > > to
> > > > > > > > > std::max):--------------------
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > entry:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %max_value = alloca float, align 4
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %1 = bitcast float* %input to i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %2 = load i32, i32* %1, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %3 = bitcast float* %max_value to
i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store i32 %2, i32* %3, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > for.body:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %call = call dereferenceable(4)
float*
> > > > > > > > > @_ZSt3maxIfERKT_S2_S2_(float*
> > > > > > > > > nonnull dereferenceable(4)
%max_value, float*
> > > > > > > > > dereferenceable(4)
> > > > > > > > > %arrayidx1)
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %5 = bitcast float* %call to i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %6 = load i32, i32* %5, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %7 = bitcast float* %max_value to
i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store i32 %6, i32* %7, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > -------------------- After SROA
(the call to std::max
> > > > > > > > > is
> > > > > > > > > inlined
> > > > > > > > > now):--------------------
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > entry:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %max_value.sroa.0 = alloca i32
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %0 = bitcast float* %input to i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %1 = load i32, i32* %0, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store i32 %1, i32*
%max_value.sroa.0
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > for.body:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > >
%max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32,
> > > > > > > > > i32*
> > > > > > > > > %max_value.sroa.0
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %3 = bitcast i32
> > > > > > > > >
%max_value.sroa.0.0.max_value.sroa.0.0.6
> > > > > > > > > to
> > > > > > > > > float
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > >
%max_value.sroa.0.0.max_value.sroa_cast8 = bitcast
> > > > > > > > > i32*
> > > > > > > > > %max_value.sroa.0 to float*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %__b.__a.i = select i1 %cmp.i,
float* %arrayidx1,
> > > > > > > > > float*
> > > > > > > > >
%max_value.sroa.0.0.max_value.sroa_cast8
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %5 = bitcast float* %__b.__a.i to
i32*
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %6 = load i32, i32* %5, align 4,
!tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > store i32 %6, i32*
%max_value.sroa.0
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > -------------------- After SROA
when Canonicalization
> > > > > > > > > is
> > > > > > > > > turned
> > > > > > > > > off--------------------
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > entry:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %0 = load float, float* %input,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > for.cond: ; preds = %for.body,
%entry
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %max_value.0 = phi float [ %0,
%entry ], [
> > > > > > > > > %.sroa.speculated,
> > > > > > > > > %for.body ]
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > for.body:
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %1 = load float, float* %arrayidx1,
align 4, !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %cmp.i = fcmp olt float
%max_value.0, %1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %.sroa.speculate.load.true = load
float, float*
> > > > > > > > > %arrayidx1,
> > > > > > > > > align
> > > > > > > > > 4,
> > > > > > > > > !tbaa !1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > %.sroa.speculated = select i1
%cmp.i, float
> > > > > > > > > %.sroa.speculate.load.true, float
%max_value.0
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > >
_______________________________________________
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > LLVM Developers mailing list
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > > llvm-dev at lists.llvm.org
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > >
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > >
_______________________________________________
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > LLVM Developers mailing list
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > > llvm-dev at lists.llvm.org
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > >
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > llvm-dev at lists.llvm.org
> > 
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/45920d2d/attachment.html>

Mehdi Amini via llvm-dev

2016-Mar-22 19:51 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

> On Mar 22, 2016, at 12:47 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> 
> From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>>
> To: "Philip Reames" <listmail at philipreames.com
<mailto:listmail at philipreames.com>>, "David Blaikie"
<dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev
at lists.llvm.org>>, uweigand at de.ibm.com <mailto:uweigand at
de.ibm.com>, "Tom Stellard" <thomas.stellard at amd.com
<mailto:thomas.stellard at amd.com>>
> Sent: Tuesday, March 22, 2016 2:39:36 PM
> Subject: Re: [llvm-dev] RFC: A change in InstCombine canonical form
> 
> I don't really mind, but the intermediate stage will not be very nice:
that a lot of code / tests that needs to be written with bitcast, and all of
that while they are deemed to disappear. The added value isn't clear to me
considering the added work. I'm not sure it wouldn't add more work for
all the cleanup required by the "typeless pointer", but I'm not
sure what's involved here and if David thinks the intermediate steps of
handling bit casts everywhere is not making it harder I'm fine with it.
> It is not clear to me that this is a particularly-large change. As I
understand it, we're only talking about the canonicalization of small
memcpy,
What you describe is option 1, I have zero opinion on this since I don't
know about the tradeoff involved. Also option 1 is not obsolete by the typeless
pointers.
I was address the other option, which as I understood it, would be to accept the
current canonicalization and the implied bitcast, and then fix all the places
that don't look through it. This would be obsolete by typeless pointers.

-- 
Mehdi



> so there are no other uses of the relevant values. Changing the canonical
form here seems like a pure win, except for the fact that doing the load and
store using integer registers is likely more efficient on some architectures
than using floating-point registers. For those architectures, we'd want to
fix this in the backend. This seems relatively easy to do in CGP (or DAGCombine)
(and, in fact, doing it later would enable us to catch more load/store cases
than just those that comes from memcpy expansions). Thoughts?
> 
>  -Hal
> 
> -- 
> Mehdi
> 
> On Mar 22, 2016, at 12:36 PM, Philip Reames <listmail at
philipreames.com <mailto:listmail at philipreames.com>> wrote:
> 
> I'd phrase this differently: being pointer-bitcast agnostic is a step
towards support typeless pointers.  :)  We can either become bitcast agnostic
all in one big change or incrementally.  Personally, I'd prefer the later
since it reduces the risk associated with enabling typeless pointers in the end.
> 
> Philip
> 
> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
> I don't know enough about the tradeoff for 1, but 2 seems like a
bandaid for something that is not a correctness issue neither a regression.
I'm not sure it justifies "bandaid patches" while there is a clear
path forward, i.e. typeless pointers, unless there is an acknowledgement that
typeless pointers won't be there before a couple of years.
> 
> -- 
> Mehdi
> 
> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at gmail.com
<mailto:ehsanamiri at gmail.com>> wrote:
> 
> Back to the discussion on the RFC, I still see some advantage in following
the proposed solution. I see two paths forward:
> 
> 1- Change canonical form, possibly lower memcpy to non-integer load and
store in InstCombine. Then teach the backends to convert that to integer load
and store if that is more profitable. Notice that we are talking about loads
that have no use other than store. So it is a fairly simple change in the
backends.
> 
> 2- Do not change the canonical form. Then for this bug, we need to teach
select speculation to see through bitcasts. We will probably need to teach other
optimizations to see though bitcasts in the future as problems are uncovered.
That is until typeless pointer work is complete. Once the typeless pointer work
is complete, we have some extra code in each optimization for seeing through
bitcasts which is possibly no longer needed.
> 
> Based on this I think (1) is the right thing to do. But there might be
other reasons for the current canonical form that I am not aware of. Please let
me know what you think.
> 
> Thanks
> Ehsan
> 
> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie < <mailto:dblaikie at
gmail.com>dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
wrote:
> 
> 
> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri < <mailto:ehsanamiri at
gmail.com>ehsanamiri at gmail.com <mailto:ehsanamiri at gmail.com>>
wrote:
> David,
> 
> Could you give us an update on the status of typeless pointer work? How
much work is left and when you think it might be ready?
> 
> It's a bit of an onion peel, really - since it will eventually involve
generalizing/fixing every optimization that's currently leaning on typed
pointers to keep the performance while removing the crutch they're currently
leaning on. (in cases where bitcasts are literally just getting in the way,
those won't require cleaning up & should just become "free
performance wins" once we remove them, though)
> 
> At the moment we can roundtrip every LLVM IR test case through bitcode and
textual IR (reading/writing both formats) while using only a narrow whitelist of
places that request the type of a pointer (things like the verifier, the
parser/printer where it actually needs the typed pointer to verify it matches
the explicit type, etc).
> 
> The next thing on the list is probably figuring out the byval/inalloca
representation (add an explicit pointee type? just make the number of bytes
explicit with no type information?).
> 
> Then start migrating optimizations over - doing the same sort of testing I
did for the IR/bitcode roundtripping - assert that the pointee type is not
accessed, whitelist places that need it until the bitcasts go away, fix anything
else... it'll still be a fair bit of work & I don't really know how
much. It should parallelize pretty well (doing any of this work is really
helpful, each optimization is indepednent, etc) if anyone wants to/is able to
help.
> 
> - Dave
>  
> 
> Thanks
> Ehsan
> 
> 
> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie < <mailto:dblaikie at
gmail.com>dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
wrote:
> 
> 
> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
<mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
> Hi,
> 
> How do it interact with the "typeless pointers" work?
> 
> Right - the goal of the typeless pointer work is to fix all these bugs
related to "didn't look through bitcasts" in optimizations.
Sometimes that's going to mean more work (because the code is leaning on the
absence of bitcasts & the presence of convenient (but not guaranteed) type
information to inform optimization decisions) but if we remove typed pointers
while keeping optimization quality in the cases we have today, then we
should've also fixed the cases that were broken because the type information
didn't end up aligning to produce the optimal output.
> 
> & I know I've been off the typeless pointer stuff for a few months
working on llvm-dwp - but happy for any help (the next immediate piece is
probably figuring out teh right representation for byval and inalloca - there
were some code reviews sent out for that that I'll need to come back around
to - but also any optimizations people want to help rework/improve would be
great too & I can provide some techniques/tools to help people approach
those)
> 
> - Dave
>  
> 
> Thanks,
> 
> -- 
> Mehdi
> 
> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
<mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
> 
> === PROBLEM === (See this bug 
<https://llvm.org/bugs/show_bug.cgi?id=26445>https://llvm.org/bugs/show_bug.cgi?id=26445
<https://llvm.org/bugs/show_bug.cgi?id=26445>)
> 
> IR contains code for loading a float from float * and storing it to a float
* address. After canonicalization of load in InstCombine [1], new bitcasts are
added to the IR (see bottom of the email for code samples). This prevents select
speculation in SROA to work. Also after SROA we have bitcasts from int32 to
float. (Whereas originally after instCombine, bitcasts are only done on pointer
types).
> 
> === PROPOSED SOLUTION==> 
> [1] implies that we need load canonicalization when we load a value only to
store it again. The reason is to avoid generating slightly different code (due
to different ways of adding bitcasts), in different situations. In all examples
presented in [1] there is a non-zero number of bitcasts. I think when we load a
value of type T from a T* address and store it as a type T value to one or more
T* address (and there is no other use or store), we can redefine canonical form
to mean there should not be any bitcasts. So we still have a canonical form, but
its definition is slightly different.
> 
> === REASONS FOR / AGAINST==> 
> Hal Finkel warns that while this may be useful for power pc, this may hurt
more than one other platform and become a very large project. Despite this he is
fine with bringing up the issue to the mailing list to get feedback, mostly
because this seems inline with our future direction of having a unique type for
all pointers.  (Hal please correct me if I misunderstood your comment)
> 
> This is a much simpler fix compared to alternatives. (ignoring potential
regressions)
> 
> === ALTERNATIVE SOLUTION ==> 
> Fix select speculation in SROA to see through bitcasts. Handle remaining
bitcasts during code gen. Other alternative solutions are welcome.
> 
> Should I implement the proposed solution or is it too risky? I understand
that we may need to undo it if it breaks too many things. Comments are welcome.
> 
> 
> [1] 
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html>http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html> 
r226781  git commit id: b778cbc0c8
> 
> 
> 
> Code Samples (only relevant part is copied):
> 
> --------------------  Before Canonicalization (contains call to std::max):
--------------------
> entry:
>   %max_value = alloca float, align 4
>   %1 = load float, float* %input, align 4, !tbaa !1
>   store float %1, float* %max_value, align 4, !tbaa !1
> 
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %3 = load float, float* %call, align 4, !tbaa !1
>   store float %3, float* %max_value, align 4, !tbaa !1
> 
> --------------------  After Canonicalization (contains call to
std::max):--------------------
> 
> entry:
>   %max_value = alloca float, align 4
>   %1 = bitcast float* %input to i32*
>   %2 = load i32, i32* %1, align 4, !tbaa !1
>   %3 = bitcast float* %max_value to i32*
>   store i32 %2, i32* %3, align 4, !tbaa !1
> 
> for.body:
>   %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float*
nonnull dereferenceable(4) %max_value, float* dereferenceable(4) %arrayidx1)
>   %5 = bitcast float* %call to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   %7 = bitcast float* %max_value to i32*
>   store i32 %6, i32* %7, align 4, !tbaa !1
> 
> -------------------- After SROA (the call to std::max is inlined
now):--------------------
> entry:
>   %max_value.sroa.0 = alloca i32
>   %0 = bitcast float* %input to i32*
>   %1 = load i32, i32* %0, align 4, !tbaa !1
>   store i32 %1, i32* %max_value.sroa.0
> 
> for.body: 
>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
%max_value.sroa.0
>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32* %max_value.sroa.0
to float*
>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
%max_value.sroa.0.0.max_value.sroa_cast8
>   %5 = bitcast float* %__b.__a.i to i32*
>   %6 = load i32, i32* %5, align 4, !tbaa !1
>   store i32 %6, i32* %max_value.sroa.0
> 
> -------------------- After SROA when Canonicalization is turned
off--------------------
> entry:
>   %0 = load float, float* %input, align 4, !tbaa !1
> 
> for.cond:                                         ; preds = %for.body,
%entry
>   %max_value.0 = phi float [ %0, %entry ], [ %.sroa.speculated, %for.body ]
> 
> for.body:
>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>   %cmp.i = fcmp olt float %max_value.0, %1
>   %.sroa.speculate.load.true = load float, float* %arrayidx1, align 4,
!tbaa !1
>   %.sroa.speculated = select i1 %cmp.i, float %.sroa.speculate.load.true,
float %max_value.0
> _______________________________________________
> LLVM Developers mailing list
>  <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>
> 
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> _______________________________________________
> LLVM Developers mailing list
>  <mailto:llvm-dev at lists.llvm.org>llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>
> 
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/7fc6459d/attachment-0001.html>

David Blaikie via llvm-dev

2016-Mar-22 20:09 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

Ultimately everything is going to be made to not rely on the types of
pointers - that's nearly equivalent to bitcast-ignorant (the difference
being that the presence of an extra instruction (the bitcast) might trip up
some optimizations - but the presence of the /type/ information implied by
the bitcast should not trip up or be necessary for optimizations (two sides
of the same coin))

If we're talking about making an optimization able to ignore the bitcast
instructions - yes, that work is unnecessary & perhaps questionable given
the typeless pointer work. Not outright off limits, but the same time might
be better invested in moving typeless pointers forward if the contributor
is so inclined/able to shift in that direction.

But if we're talking about the work to make the optimization not use the
type of pointers as a crutch - that work is a necessary precursor to the
typeless pointer work and would be most welcome.

- David

On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> I don't really mind, but the intermediate stage will not be very nice:
> that a lot of code / tests that needs to be written with bitcast, and all
> of that while they are deemed to disappear. The added value isn't clear
to
> me considering the added work. I'm not sure it wouldn't add more
work for
> all the cleanup required by the "typeless pointer", but I'm
not sure what's
> involved here and if David thinks the intermediate steps of handling bit
> casts everywhere is not making it harder I'm fine with it.
>
> --
> Mehdi
>
> On Mar 22, 2016, at 12:36 PM, Philip Reames <listmail at
philipreames.com>
> wrote:
>
> I'd phrase this differently: being pointer-bitcast agnostic is a step
> towards support typeless pointers.  :)  We can either become bitcast
> agnostic all in one big change or incrementally.  Personally, I'd
prefer
> the later since it reduces the risk associated with enabling typeless
> pointers in the end.
>
> Philip
>
> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>
> I don't know enough about the tradeoff for 1, but 2 seems like a
bandaid
> for something that is not a correctness issue neither a regression. I'm
not
> sure it justifies "bandaid patches" while there is a clear path
forward,
> i.e. typeless pointers, unless there is an acknowledgement that typeless
> pointers won't be there before a couple of years.
>
> --
> Mehdi
>
> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at gmail.com>
wrote:
>
> Back to the discussion on the RFC, I still see some advantage in following
> the proposed solution. I see two paths forward:
>
> 1- Change canonical form, possibly lower memcpy to non-integer load and
> store in InstCombine. Then teach the backends to convert that to integer
> load and store if that is more profitable. Notice that we are talking about
> loads that have no use other than store. So it is a fairly simple change in
> the backends.
>
> 2- Do not change the canonical form. Then for this bug, we need to teach
> select speculation to see through bitcasts. We will probably need to teach
> other optimizations to see though bitcasts in the future as problems are
> uncovered. That is until typeless pointer work is complete. Once the
> typeless pointer work is complete, we have some extra code in each
> optimization for seeing through bitcasts which is possibly no longer
needed.
>
> Based on this I think (1) is the right thing to do. But there might be
> other reasons for the current canonical form that I am not aware of. Please
> let me know what you think.
>
> Thanks
> Ehsan
>
> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie < <dblaikie at
gmail.com>
> dblaikie at gmail.com> wrote:
>
>>
>>
>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri < <ehsanamiri at
gmail.com>
>> ehsanamiri at gmail.com> wrote:
>>
>>> David,
>>>
>>> Could you give us an update on the status of typeless pointer work?
How
>>> much work is left and when you think it might be ready?
>>>
>>
>> It's a bit of an onion peel, really - since it will eventually
involve
>> generalizing/fixing every optimization that's currently leaning on
typed
>> pointers to keep the performance while removing the crutch they're
>> currently leaning on. (in cases where bitcasts are literally just
getting
>> in the way, those won't require cleaning up & should just
become "free
>> performance wins" once we remove them, though)
>>
>> At the moment we can roundtrip every LLVM IR test case through bitcode
>> and textual IR (reading/writing both formats) while using only a narrow
>> whitelist of places that request the type of a pointer (things like the
>> verifier, the parser/printer where it actually needs the typed pointer
to
>> verify it matches the explicit type, etc).
>>
>> The next thing on the list is probably figuring out the byval/inalloca
>> representation (add an explicit pointee type? just make the number of
bytes
>> explicit with no type information?).
>>
>> Then start migrating optimizations over - doing the same sort of
testing
>> I did for the IR/bitcode roundtripping - assert that the pointee type
is
>> not accessed, whitelist places that need it until the bitcasts go away,
fix
>> anything else... it'll still be a fair bit of work & I
don't really know
>> how much. It should parallelize pretty well (doing any of this work is
>> really helpful, each optimization is indepednent, etc) if anyone wants
>> to/is able to help.
>>
>> - Dave
>>
>>
>>>
>>> Thanks
>>> Ehsan
>>>
>>>
>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie < <dblaikie at
gmail.com>
>>> dblaikie at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via llvm-dev <
>>>> <llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> How do it interact with the "typeless pointers"
work?
>>>>>
>>>>
>>>> Right - the goal of the typeless pointer work is to fix all
these bugs
>>>> related to "didn't look through bitcasts" in
optimizations. Sometimes
>>>> that's going to mean more work (because the code is leaning
on the absence
>>>> of bitcasts & the presence of convenient (but not
guaranteed) type
>>>> information to inform optimization decisions) but if we remove
typed
>>>> pointers while keeping optimization quality in the cases we
have today,
>>>> then we should've also fixed the cases that were broken
because the type
>>>> information didn't end up aligning to produce the optimal
output.
>>>>
>>>> & I know I've been off the typeless pointer stuff for a
few months
>>>> working on llvm-dwp - but happy for any help (the next
immediate piece is
>>>> probably figuring out teh right representation for byval and
inalloca -
>>>> there were some code reviews sent out for that that I'll
need to come back
>>>> around to - but also any optimizations people want to help
rework/improve
>>>> would be great too & I can provide some techniques/tools to
help people
>>>> approach those)
>>>>
>>>> - Dave
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via llvm-dev <
>>>>> <llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org> wrote:
>>>>>
>>>>> === PROBLEM === (See this bug
>>>>> <https://llvm.org/bugs/show_bug.cgi?id=26445>
>>>>> https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>>
>>>>> IR contains code for loading a float from float * and
storing it to a
>>>>> float * address. After canonicalization of load in
InstCombine [1], new
>>>>> bitcasts are added to the IR (see bottom of the email for
code samples).
>>>>> This prevents select speculation in SROA to work. Also
after SROA we have
>>>>> bitcasts from int32 to float. (Whereas originally after
instCombine,
>>>>> bitcasts are only done on pointer types).
>>>>>
>>>>> === PROPOSED SOLUTION==>>>>>
>>>>> [1] implies that we need load canonicalization when we load
a value
>>>>> only to store it again. The reason is to avoid generating
slightly
>>>>> different code (due to different ways of adding bitcasts),
in different
>>>>> situations. In all examples presented in [1] there is a
non-zero number of
>>>>> bitcasts. I think when we load a value of type T from a T*
address and
>>>>> store it as a type T value to one or more T* address (and
there is no other
>>>>> use or store), we can redefine canonical form to mean there
should not be
>>>>> any bitcasts. So we still have a canonical form, but its
definition is
>>>>> slightly different.
>>>>>
>>>>> === REASONS FOR / AGAINST==>>>>>
>>>>> Hal Finkel warns that while this may be useful for power
pc, this may
>>>>> hurt more than one other platform and become a very large
project. Despite
>>>>> this he is fine with bringing up the issue to the mailing
list to get
>>>>> feedback, mostly because this seems inline with our future
direction of
>>>>> having a unique type for all pointers.  (Hal please correct
me if I
>>>>> misunderstood your comment)
>>>>>
>>>>> This is a much simpler fix compared to alternatives.
(ignoring
>>>>> potential regressions)
>>>>>
>>>>> === ALTERNATIVE SOLUTION ==>>>>>
>>>>> Fix select speculation in SROA to see through bitcasts.
Handle
>>>>> remaining bitcasts during code gen. Other alternative
solutions are welcome.
>>>>>
>>>>> Should I implement the proposed solution or is it too
risky? I
>>>>> understand that we may need to undo it if it breaks too
many things.
>>>>> Comments are welcome.
>>>>>
>>>>>
>>>>> [1]
>>>>>
<http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html>
>>>>>
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>>> r226781  git commit id: b778cbc0c8
>>>>>
>>>>>
>>>>>
>>>>> Code Samples (only relevant part is copied):
>>>>>
>>>>> --------------------  Before Canonicalization (contains
call to
>>>>> std::max): --------------------
>>>>> entry:
>>>>>   %max_value = alloca float, align 4
>>>>>   %1 = load float, float* %input, align 4, !tbaa !1
>>>>>   store float %1, float* %max_value, align 4, !tbaa !1
>>>>>
>>>>> for.body:
>>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>>>> dereferenceable(4) %max_value, float* dereferenceable(4)
%arrayidx1)
>>>>>   %3 = load float, float* %call, align 4, !tbaa !1
>>>>>   store float %3, float* %max_value, align 4, !tbaa !1
>>>>>
>>>>> --------------------  After Canonicalization (contains call
to
>>>>> std::max):--------------------
>>>>>
>>>>> entry:
>>>>>   %max_value = alloca float, align 4
>>>>>   %1 = bitcast float* %input to i32*
>>>>>   %2 = load i32, i32* %1, align 4, !tbaa !1
>>>>>   %3 = bitcast float* %max_value to i32*
>>>>>   store i32 %2, i32* %3, align 4, !tbaa !1
>>>>>
>>>>> for.body:
>>>>>   %call = call dereferenceable(4) float*
@_ZSt3maxIfERKT_S2_S2_(float*
>>>>> nonnull dereferenceable(4) %max_value, float*
dereferenceable(4) %arrayidx1)
>>>>>   %5 = bitcast float* %call to i32*
>>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>   %7 = bitcast float* %max_value to i32*
>>>>>   store i32 %6, i32* %7, align 4, !tbaa !1
>>>>>
>>>>> -------------------- After SROA (the call to std::max is
inlined
>>>>> now):--------------------
>>>>> entry:
>>>>>   %max_value.sroa.0 = alloca i32
>>>>>   %0 = bitcast float* %input to i32*
>>>>>   %1 = load i32, i32* %0, align 4, !tbaa !1
>>>>>   store i32 %1, i32* %max_value.sroa.0
>>>>>
>>>>> for.body:
>>>>>   %max_value.sroa.0.0.max_value.sroa.0.0.6 = load i32, i32*
>>>>> %max_value.sroa.0
>>>>>   %3 = bitcast i32 %max_value.sroa.0.0.max_value.sroa.0.0.6
to float
>>>>>   %max_value.sroa.0.0.max_value.sroa_cast8 = bitcast i32*
>>>>> %max_value.sroa.0 to float*
>>>>>   %__b.__a.i = select i1 %cmp.i, float* %arrayidx1, float*
>>>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>>>   %5 = bitcast float* %__b.__a.i to i32*
>>>>>   %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>   store i32 %6, i32* %max_value.sroa.0
>>>>>
>>>>> -------------------- After SROA when Canonicalization is
turned
>>>>> off--------------------
>>>>> entry:
>>>>>   %0 = load float, float* %input, align 4, !tbaa !1
>>>>>
>>>>> for.cond:                                         ; preds =
%for.body,
>>>>> %entry
>>>>>   %max_value.0 = phi float [ %0, %entry ], [
%.sroa.speculated,
>>>>> %for.body ]
>>>>>
>>>>> for.body:
>>>>>   %1 = load float, float* %arrayidx1, align 4, !tbaa !1
>>>>>   %cmp.i = fcmp olt float %max_value.0, %1
>>>>>   %.sroa.speculate.load.true = load float, float*
%arrayidx1, align 4,
>>>>> !tbaa !1
>>>>>   %.sroa.speculated = select i1 %cmp.i, float
>>>>> %.sroa.speculate.load.true, float %max_value.0
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> <llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org
>>>>>
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> <llvm-dev at lists.llvm.org>llvm-dev at
lists.llvm.org
>>>>>
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/5bec233d/attachment.html>

Philip Reames via llvm-dev

2016-Mar-22 20:31 UTC

head link

[llvm-dev] RFC: A change in InstCombine canonical form

I feel very strongly that blocking work on making optimization 
bitcast-ignorant on the typeless pointer work would be a major mistake.  
Unless we expected the typeless pointer work to be concluded within the 
near term (say 3-6 months maximum), we should not block any development 
which would be accepted in the typeless pointer work wasn't planned.

In my view, this is one of the largest mistakes we've made with the pass 
manager work, it has seriously cost us, and I don't want to see it 
happening again.

Philip

On 03/22/2016 01:09 PM, David Blaikie wrote:> Ultimately everything is going to be made to not rely on the types of 
> pointers - that's nearly equivalent to bitcast-ignorant (the 
> difference being that the presence of an extra instruction (the 
> bitcast) might trip up some optimizations - but the presence of the 
> /type/ information implied by the bitcast should not trip up or be 
> necessary for optimizations (two sides of the same coin))
>
> If we're talking about making an optimization able to ignore the 
> bitcast instructions - yes, that work is unnecessary & perhaps 
> questionable given the typeless pointer work. Not outright off limits, 
> but the same time might be better invested in moving typeless pointers 
> forward if the contributor is so inclined/able to shift in that direction.
>
> But if we're talking about the work to make the optimization not use 
> the type of pointers as a crutch - that work is a necessary precursor 
> to the typeless pointer work and would be most welcome.
>
> - David
>
> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com 
> <mailto:mehdi.amini at apple.com>> wrote:
>
>     I don't really mind, but the intermediate stage will not be very
>     nice: that a lot of code / tests that needs to be written with
>     bitcast, and all of that while they are deemed to disappear. The
>     added value isn't clear to me considering the added work. I'm
not
>     sure it wouldn't add more work for all the cleanup required by the
>     "typeless pointer", but I'm not sure what's involved
here and if
>     David thinks the intermediate steps of handling bit casts
>     everywhere is not making it harder I'm fine with it.
>
>     -- 
>     Mehdi
>
>>     On Mar 22, 2016, at 12:36 PM, Philip Reames
>>     <listmail at philipreames.com <mailto:listmail at
philipreames.com>> wrote:
>>
>>     I'd phrase this differently: being pointer-bitcast agnostic is
a
>>     step towards support typeless pointers.  :)  We can either become
>>     bitcast agnostic all in one big change or incrementally. 
>>     Personally, I'd prefer the later since it reduces the risk
>>     associated with enabling typeless pointers in the end.
>>
>>     Philip
>>
>>     On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>>>     I don't know enough about the tradeoff for 1, but 2 seems
like a
>>>     bandaid for something that is not a correctness issue neither a
>>>     regression. I'm not sure it justifies "bandaid
patches" while
>>>     there is a clear path forward, i.e. typeless pointers, unless
>>>     there is an acknowledgement that typeless pointers won't be
>>>     there before a couple of years.
>>>
>>>     -- 
>>>     Mehdi
>>>
>>>>     On Mar 22, 2016, at 11:32 AM, Ehsan Amiri <ehsanamiri at
gmail.com
>>>>     <mailto:ehsanamiri at gmail.com>> wrote:
>>>>
>>>>     Back to the discussion on the RFC, I still see some
advantage
>>>>     in following the proposed solution. I see two paths
forward:
>>>>
>>>>     1- Change canonical form, possibly lower memcpy to
non-integer
>>>>     load and store in InstCombine. Then teach the backends to
>>>>     convert that to integer load and store if that is more
>>>>     profitable. Notice that we are talking about loads that
have no
>>>>     use other than store. So it is a fairly simple change in
the
>>>>     backends.
>>>>
>>>>     2- Do not change the canonical form. Then for this bug, we
need
>>>>     to teach select speculation to see through bitcasts. We
will
>>>>     probably need to teach other optimizations to see though
>>>>     bitcasts in the future as problems are uncovered. That is
until
>>>>     typeless pointer work is complete. Once the typeless
pointer
>>>>     work is complete, we have some extra code in each
optimization
>>>>     for seeing through bitcasts which is possibly no longer
needed.
>>>>
>>>>     Based on this I think (1) is the right thing to do. But
there
>>>>     might be other reasons for the current canonical form that
I am
>>>>     not aware of. Please let me know what you think.
>>>>
>>>>     Thanks
>>>>     Ehsan
>>>>
>>>>     On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie
>>>>     <dblaikie at gmail.com <mailto:dblaikie at
gmail.com>> wrote:
>>>>
>>>>
>>>>
>>>>         On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri
>>>>         <ehsanamiri at gmail.com <mailto:ehsanamiri at
gmail.com>> wrote:
>>>>
>>>>             David,
>>>>
>>>>             Could you give us an update on the status of
typeless
>>>>             pointer work? How much work is left and when you
think
>>>>             it might be ready?
>>>>
>>>>
>>>>         It's a bit of an onion peel, really - since it will
>>>>         eventually involve generalizing/fixing every
optimization
>>>>         that's currently leaning on typed pointers to keep
the
>>>>         performance while removing the crutch they're
currently
>>>>         leaning on. (in cases where bitcasts are literally just
>>>>         getting in the way, those won't require cleaning up
&
>>>>         should just become "free performance wins"
once we remove
>>>>         them, though)
>>>>
>>>>         At the moment we can roundtrip every LLVM IR test case
>>>>         through bitcode and textual IR (reading/writing both
>>>>         formats) while using only a narrow whitelist of places
that
>>>>         request the type of a pointer (things like the
verifier,
>>>>         the parser/printer where it actually needs the typed
>>>>         pointer to verify it matches the explicit type, etc).
>>>>
>>>>         The next thing on the list is probably figuring out the
>>>>         byval/inalloca representation (add an explicit pointee
>>>>         type? just make the number of bytes explicit with no
type
>>>>         information?).
>>>>
>>>>         Then start migrating optimizations over - doing the
same
>>>>         sort of testing I did for the IR/bitcode roundtripping
-
>>>>         assert that the pointee type is not accessed, whitelist
>>>>         places that need it until the bitcasts go away, fix
>>>>         anything else... it'll still be a fair bit of work
& I
>>>>         don't really know how much. It should parallelize
pretty
>>>>         well (doing any of this work is really helpful, each
>>>>         optimization is indepednent, etc) if anyone wants to/is
>>>>         able to help.
>>>>
>>>>         - Dave
>>>>
>>>>
>>>>             Thanks
>>>>             Ehsan
>>>>
>>>>
>>>>             On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie
>>>>             <dblaikie at gmail.com <mailto:dblaikie at
gmail.com>> wrote:
>>>>
>>>>
>>>>
>>>>                 On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini
via
>>>>                 llvm-dev <llvm-dev at lists.llvm.org
>>>>                 <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>>>
>>>>                     Hi,
>>>>
>>>>                     How do it interact with the "typeless
pointers"
>>>>                     work?
>>>>
>>>>
>>>>                 Right - the goal of the typeless pointer work
is to
>>>>                 fix all these bugs related to "didn't
look through
>>>>                 bitcasts" in optimizations. Sometimes
that's going
>>>>                 to mean more work (because the code is leaning
on
>>>>                 the absence of bitcasts & the presence of
>>>>                 convenient (but not guaranteed) type
information to
>>>>                 inform optimization decisions) but if we remove
>>>>                 typed pointers while keeping optimization
quality
>>>>                 in the cases we have today, then we
should've also
>>>>                 fixed the cases that were broken because the
type
>>>>                 information didn't end up aligning to
produce the
>>>>                 optimal output.
>>>>
>>>>                 & I know I've been off the typeless
pointer stuff
>>>>                 for a few months working on llvm-dwp - but
happy
>>>>                 for any help (the next immediate piece is
probably
>>>>                 figuring out teh right representation for byval
and
>>>>                 inalloca - there were some code reviews sent
out
>>>>                 for that that I'll need to come back around
to -
>>>>                 but also any optimizations people want to help
>>>>                 rework/improve would be great too & I can
provide
>>>>                 some techniques/tools to help people approach
those)
>>>>
>>>>                 - Dave
>>>>
>>>>
>>>>                     Thanks,
>>>>
>>>>                     -- 
>>>>                     Mehdi
>>>>
>>>>>                     On Mar 16, 2016, at 6:41 AM, Ehsan
Amiri via
>>>>>                     llvm-dev <llvm-dev at lists.llvm.org
>>>>>                     <mailto:llvm-dev at
lists.llvm.org>> wrote:
>>>>>
>>>>>                     === PROBLEM === (See this bug
>>>>>                    
https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>>
>>>>>                     IR contains code for loading a float
from
>>>>>                     float * and storing it to a float *
address.
>>>>>                     After canonicalization of load in
InstCombine
>>>>>                     [1], new bitcasts are added to the IR
(see
>>>>>                     bottom of the email for code samples).
This
>>>>>                     prevents select speculation in SROA to
work.
>>>>>                     Also after SROA we have bitcasts from
int32 to
>>>>>                     float. (Whereas originally after
instCombine,
>>>>>                     bitcasts are only done on pointer
types).
>>>>>
>>>>>                     === PROPOSED
SOLUTION==>>>>>
>>>>>                     [1] implies that we need load
canonicalization
>>>>>                     when we load a value only to store it
again.
>>>>>                     The reason is to avoid generating
slightly
>>>>>                     different code (due to different ways
of
>>>>>                     adding bitcasts), in different
situations. In
>>>>>                     all examples presented in [1] there is
a
>>>>>                     non-zero number of bitcasts. I think
when we
>>>>>                     load a value of type T from a T*
address and
>>>>>                     store it as a type T value to one or
more T*
>>>>>                     address (and there is no other use or
store),
>>>>>                     we can redefine canonical form to mean
there
>>>>>                     should not be any bitcasts. So we still
have a
>>>>>                     canonical form, but its definition is
slightly
>>>>>                     different.
>>>>>
>>>>>                     === REASONS FOR /
AGAINST==>>>>>
>>>>>                     Hal Finkel warns that while this may be
useful
>>>>>                     for power pc, this may hurt more than
one
>>>>>                     other platform and become a very large
>>>>>                     project. Despite this he is fine with
bringing
>>>>>                     up the issue to the mailing list to get
>>>>>                     feedback, mostly because this seems
inline
>>>>>                     with our future direction of having a
unique
>>>>>                     type for all pointers. (Hal please
correct me
>>>>>                     if I misunderstood your comment)
>>>>>
>>>>>                     This is a much simpler fix compared to
>>>>>                     alternatives. (ignoring potential
regressions)
>>>>>
>>>>>                     === ALTERNATIVE SOLUTION
==>>>>>
>>>>>                     Fix select speculation in SROA to see
through
>>>>>                     bitcasts. Handle remaining bitcasts
during
>>>>>                     code gen. Other alternative solutions
are welcome.
>>>>>
>>>>>                     Should I implement the proposed
solution or is
>>>>>                     it too risky? I understand that we may
need to
>>>>>                     undo it if it breaks too many things.
Comments
>>>>>                     are welcome.
>>>>>
>>>>>
>>>>>                     [1]
>>>>>                    
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>>>                     r226781  git commit id: b778cbc0c8
>>>>>
>>>>>
>>>>>
>>>>>                     Code Samples (only relevant part is
copied):
>>>>>
>>>>>                     -------------------- Before
Canonicalization
>>>>>                     (contains call to std::max):
--------------------
>>>>>                     entry:
>>>>>                       %max_value = alloca float, align 4
>>>>>                       %1 = load float, float* %input, align
4,
>>>>>                     !tbaa !1
>>>>>                       store float %1, float* %max_value,
align 4,
>>>>>                     !tbaa !1
>>>>>
>>>>>                     for.body:
>>>>>                       %call = call dereferenceable(4)
float*
>>>>>                     @_ZSt3maxIfERKT_S2_S2_(float*
>>>>>                     dereferenceable(4) %max_value, float*
>>>>>                     dereferenceable(4) %arrayidx1)
>>>>>                       %3 = load float, float* %call, align
4, !tbaa !1
>>>>>                       store float %3, float* %max_value,
align 4,
>>>>>                     !tbaa !1
>>>>>
>>>>>                     -------------------- After
Canonicalization
>>>>>                     (contains call to
std::max):--------------------
>>>>>
>>>>>                     entry:
>>>>>                       %max_value = alloca float, align 4
>>>>>                       %1 = bitcast float* %input to i32*
>>>>>                       %2 = load i32, i32* %1, align 4,
!tbaa !1
>>>>>                       %3 = bitcast float* %max_value to
i32*
>>>>>                       store i32 %2, i32* %3, align 4, !tbaa
!1
>>>>>
>>>>>                     for.body:
>>>>>                       %call = call dereferenceable(4)
float*
>>>>>                     @_ZSt3maxIfERKT_S2_S2_(float* nonnull
>>>>>                     dereferenceable(4) %max_value, float*
>>>>>                     dereferenceable(4) %arrayidx1)
>>>>>                       %5 = bitcast float* %call to i32*
>>>>>                       %6 = load i32, i32* %5, align 4,
!tbaa !1
>>>>>                       %7 = bitcast float* %max_value to
i32*
>>>>>                       store i32 %6, i32* %7, align 4, !tbaa
!1
>>>>>
>>>>>                     -------------------- After SROA (the
call to
>>>>>                     std::max is inlined
now):--------------------
>>>>>                     entry:
>>>>>                     %max_value.sroa.0 = alloca i32
>>>>>                       %0 = bitcast float* %input to i32*
>>>>>                       %1 = load i32, i32* %0, align 4,
!tbaa !1
>>>>>                       store i32 %1, i32* %max_value.sroa.0
>>>>>
>>>>>                     for.body:
>>>>>                    
%max_value.sroa.0.0.max_value.sroa.0.0.6 >>>>>                   
load i32, i32* %max_value.sroa.0
>>>>>                       %3 = bitcast i32
>>>>>                    
%max_value.sroa.0.0.max_value.sroa.0.0.6 to float
>>>>>                    
%max_value.sroa.0.0.max_value.sroa_cast8 >>>>>                   
bitcast i32* %max_value.sroa.0 to float*
>>>>>                       %__b.__a.i = select i1 %cmp.i, float*
>>>>>                     %arrayidx1, float*
>>>>>                    
%max_value.sroa.0.0.max_value.sroa_cast8
>>>>>                       %5 = bitcast float* %__b.__a.i to
i32*
>>>>>                       %6 = load i32, i32* %5, align 4,
!tbaa !1
>>>>>                       store i32 %6, i32* %max_value.sroa.0
>>>>>
>>>>>                     -------------------- After SROA when
>>>>>                     Canonicalization is turned
>>>>>                     off--------------------
>>>>>                     entry:
>>>>>                       %0 = load float, float* %input, align
4,
>>>>>                     !tbaa !1
>>>>>
>>>>>                     for.cond: ; preds = %for.body, %entry
>>>>>                       %max_value.0 = phi float [ %0, %entry
], [
>>>>>                     %.sroa.speculated, %for.body ]
>>>>>
>>>>>                     for.body:
>>>>>                       %1 = load float, float* %arrayidx1,
align 4,
>>>>>                     !tbaa !1
>>>>>                       %cmp.i = fcmp olt float %max_value.0,
%1
>>>>>                     %.sroa.speculate.load.true = load
float,
>>>>>                     float* %arrayidx1, align 4, !tbaa !1
>>>>>                     %.sroa.speculated = select i1 %cmp.i,
float
>>>>>                     %.sroa.speculate.load.true, float
%max_value.0
>>>>>                    
_______________________________________________
>>>>>                     LLVM Developers mailing list
>>>>>                     llvm-dev at lists.llvm.org
>>>>>                     <mailto:llvm-dev at
lists.llvm.org>
>>>>>                    
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>>                    
_______________________________________________
>>>>                     LLVM Developers mailing list
>>>>                     llvm-dev at lists.llvm.org
>>>>                     <mailto:llvm-dev at lists.llvm.org>
>>>>                    
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>     _______________________________________________
>>>     LLVM Developers mailing list
>>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>
>>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/a61d8417/attachment-0001.html>

Maybe Matching Threads

Search for more seemingly similar threads

llvm dev - Mar 2016 - RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

[llvm-dev] RFC: A change in InstCombine canonical form

Maybe Matching Threads