thr3ads.net - llvm dev - [llvm-dev] [RFC] Aggreate load/store, proposed plan [Aug 2015]

deadal nix via llvm-dev

2015-Aug-20 00:02 UTC

[llvm-dev] [RFC] Aggreate load/store, proposed plan

It is pretty clear people need this. Let's get this moving.

I'll try to sum up the points that have been made and address them
carefully.

1/ There is no good solution for large aggregates.
That is true. However, I don't think this is a reason not to address
smaller aggregates, as they appear to be needed. Realistically, the
proportion of aggregates that are very large is small, and there is no
expectation that such a thing would map nicely to the hardware anyway (the
hardware won't have enough registers to load it all). I do think it is
reasonable to expect decent handling of relatively small aggregates like
fat pointers while accepting that large ones will be inefficient.

This limitation is not unique to the current discussion, as SROA suffers
from the same limitation.
It is possible to disable the transformation for aggregates that are too
large if this is too big a concern. It should maybe also be done for
SROA.

2/ Slicing the aggregate breaks the semantics of atomic/volatile.
That is true. It means slicing the aggregate should not be done for
atomic/volatile accesses. It doesn't mean it should not be done for
regular ones, as it is reasonable to handle atomic/volatile differently.
After all, they have different semantics.
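
To illustrate why slicing and atomicity do not mix, here is a minimal toy sketch in Python. It is not LLVM code: memory is modeled as a bytearray, the aggregate is assumed to be a 16-byte fat pointer (64-bit pointer plus 64-bit length, little-endian), and the function names are made up for the example. Each `yield` marks a point where another thread could observe memory.

```python
# Toy model only: a 16-byte fat pointer {ptr, length} is stored either as
# one 128-bit write or as two 64-bit slices. All names are hypothetical.

def store_scalar(mem, offset, value):
    """Single 128-bit store: the whole aggregate changes in one step."""
    mem[offset:offset + 16] = value.to_bytes(16, "little")
    yield  # one observable step

def store_sliced(mem, offset, value):
    """Sliced store: two independent 64-bit writes, observable in between."""
    raw = value.to_bytes(16, "little")
    mem[offset:offset + 8] = raw[:8]
    yield  # a concurrent reader running here sees a torn value
    mem[offset + 8:offset + 16] = raw[8:]
    yield

def observed_values(store, old, new):
    """Return the memory contents a reader could see after each step."""
    mem = bytearray(old.to_bytes(16, "little"))
    seen = []
    for _ in store(mem, 0, new):
        seen.append(int.from_bytes(mem, "little"))
    return seen

OLD = (1 << 64) | 0x1000   # length 1, pointer 0x1000
NEW = (2 << 64) | 0x2000   # length 2, pointer 0x2000

scalar_seen = observed_values(store_scalar, OLD, NEW)
sliced_seen = observed_values(store_sliced, OLD, NEW)
```

In this model the scalar store only ever exposes the new value, while the sliced store exposes an intermediate state that pairs the new pointer with the old length. That intermediate state is what an atomic access must never show.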

3/ Not slicing can create scalars that aren't supported by the target.
This is undesirable.
Indeed. But as always, the important question is: compared to what?

The hardware has no notion of aggregate, so an aggregate and a large scalar
both end up requiring legalization. Doing the transformation is still
beneficial:
 - Some aggregates will generate valid scalars. For such aggregates, this
is a 100% win.
 - For aggregates that won't, the situation is still better, as various
optimization passes will be able to handle the load in a sensible manner.
 - The transformation never makes the situation worse than it is to begin
with.

In a previous discussion, Hal Finkel seemed to think that the scalar
solution is preferable to the slicing one.

Is that a fair assessment of the situation? Considering all of this, I
think the right path forward is:
 - Go for the scalar solution in the general case.
 - If that is a problem, the slicing approach can be used for accesses
that are neither atomic nor volatile.
 - If necessary, disable the transformation for very large aggregates (and
consider doing so for SROA as well).

Do we have a plan?
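
The three-step plan above can be read as a small decision procedure. The Python sketch below is only one possible reading of it, not code from LLVM or from D9766: the size cutoff, the set of legal integer widths, and every name in it are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical knobs; neither value comes from the thread or from LLVM.
MAX_AGGREGATE_BITS = 256          # "very large" cutoff
LEGAL_INT_BITS = {8, 16, 32, 64}  # widths some imaginary target supports

@dataclass
class Access:
    """An aggregate load or store, described only by what the plan needs."""
    size_bits: int
    is_atomic: bool = False
    is_volatile: bool = False

def choose_strategy(access, scalar_is_a_problem=False):
    """Pick how to rewrite an aggregate access, following the proposed plan."""
    # Step 3: optionally leave very large aggregates alone.
    if access.size_bits > MAX_AGGREGATE_BITS:
        return "keep-aggregate"
    # Step 2: slicing is a fallback, and never for atomic/volatile accesses.
    if scalar_is_a_problem and not (access.is_atomic or access.is_volatile):
        return "slice"
    # Step 1: the general case is a single integer of the same size.
    return "scalar"

def needs_legalization(access):
    """True when the equivalent integer is not a native width on the target."""
    return access.size_bits not in LEGAL_INT_BITS
```

For example, `choose_strategy(Access(128))` returns `"scalar"`, and `needs_legalization(Access(128))` is true for the imaginary target above, which matches point 3: the i128 still has to be legalized, but so would the aggregate.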

Mehdi Amini via llvm-dev

2015-Aug-20 00:24 UTC


Hi,

To be sure, because the RFC below is not detailed and assumes everyone knows
about all the emails from 10 months ago: is there more to do than what is
proposed in http://reviews.llvm.org/D9766 ?

So basically the proposal is that *InstCombine* turns an aggregate load/store
into a load/store using an integer of equivalent size and inserts the correct
bitcast before/after, right?

Example is:

  %0 = tail call i8* @allocmemory(i64 32)
  %1 = bitcast i8* %0 to %B*
  store %B { %B__vtbl* @B__vtblZ, i32 42 }, %B* %1, align 8

into:

  store i128 or (i128 zext (i64 ptrtoint (%B__vtbl* @B__vtblZ to i64) to i128),
                 i128 774763251095801167872), i128* %1, align 8

Where the aggregate is:

%B__vtbl = type { i8*, i32 (%B*)* }
@B__vtblZ = constant %B__vtbl { i8* null, i32 (%B*)* @B.foo }
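
As a sanity check on the constant in the transformed store, the following Python sketch packs the two fields of %B into one 128-bit integer. It assumes 64-bit pointers and a little-endian layout (8-byte pointer, 4-byte i32, 4 bytes of padding); the helper name `pack_B` and the sample address are made up for illustration.

```python
import struct

POINTER_BITS = 64  # assumption: 64-bit pointers

def pack_B(vtbl_address, field):
    """Pack %B = { %B__vtbl*, i32 } into the equivalent i128 value."""
    # "<" selects little-endian with no implicit alignment, so the
    # 4 bytes of tail padding are spelled out explicitly as "4x".
    raw = struct.pack("<QI4x", vtbl_address, field)
    return int.from_bytes(raw, "little")

# The i32 field sits just above the pointer, so 42 is shifted by 64 bits.
HIGH_PART = 42 << POINTER_BITS

packed = pack_B(0x1000, 42)  # 0x1000 is a stand-in for @B__vtblZ's address
```

Under these assumptions `HIGH_PART` is exactly 774763251095801167872, the constant that is or'ed with the zero-extended pointer in the i128 store.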


Thanks,

— 
Mehdi


Hal Finkel via llvm-dev

2015-Aug-20 05:11 UTC


----- Original Message -----
> From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "deadal nix" <deadalnix at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, August 19, 2015 7:24:28 PM
> Subject: Re: [llvm-dev] [RFC] Aggreate load/store, proposed plan
> 
> Hi,
> 
> To be sure, because the RFC below is not detailed and assume everyone
> knows about all the emails from 10 months ago,

I agree. The RFC needs to summarize the problems and the potential solutions.

> is there more to do
> than what is proposed in http://reviews.llvm.org/D9766 ?
> 
> So basically the proposal is that *InstCombine*

I think that fixing this early in the optimizer makes sense (InstCombine, etc.).
This seems little different from any other canonicalization problem. These
direct aggregate IR values are valid IR, but not our preferred canonical form,
so we should transform the IR, when possible, into our preferred canonical form.

 -Hal

> turns aggregate
> load/store into a load/store using an integer of equivalent size and
> insert the correct bitcast before/after, right?
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
