I understand these objections. They only become a problem at the limit (i.e. the example of the 64 KB store or the 1 MB+ aggregate). Those cases probably need their own fix, or probably just shouldn't be supported.

That being said, there is a vast space between what is done now and aggregates so big that they cause the real, hard problems you mention; narrowing that gap seems like a win to me. Additionally, there is a whole bag of tricks that can be deployed to mitigate the problem (for instance, load-to-store forwarding can be turned into a memcpy). The thing is, one has to start somewhere.

More generally, it doesn't seem right to me to reject something because it doesn't cover all the bases. The fact that it covers more bases than what exists now should be enough (unless it creates some kind of bad precedent).

The argument that targets rely on InstCombine to mitigate IR that requires legalization seems dubious to me. First, both aggregates and large scalars require legalization, so, even if not ideal, the proposed change does not make things any worse than they already are; as far as legalization is concerned, the two are pretty much the same. It should also be noted that InstCombine is not guaranteed to run before the backend, so relying on it there seems like a bad idea to me.

As for the big-integer approach, I really don't care about it. I can change the patch to create multiple loads/stores respecting the data layout; I have the code for that and could adapt it for this PR without too much trouble. If that is the only thing blocking this PR, then we can proceed, but I'd like some notion that we are making progress. Would you be willing to accept a solution based on creating a series of loads/stores respecting the datalayout?

2015-08-16 20:01 GMT-07:00 David Majnemer <david.majnemer at gmail.com>:

> On Sun, Aug 16, 2015 at 7:05 PM, deadal nix via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> As most of you may know, LLVM is completely unable to do anything reasonable with aggregate loads/stores, besides just legalizing them into something in the backend.
>>
>> This is not a good state of affairs. Aggregates are part of LLVM IR, and as such, LLVM should do something about them.
>>
>> It is a bit of a chicken-and-egg issue: front ends just implement their own tricks to avoid aggregates, or plain don't care about the resulting executable as long as it works. As such, pretty much everybody that cares about this has already implemented something in the front end.
>>
>> Which is honestly rather stupid. Everybody is doing the same work again and again because LLVM is not doing it.
>>
>> That being said, I now know why LLVM is not doing it: any attempt at making things move on that front results in someone finding the solution not good enough and stalling the process.
>>
>> The thing is, pretty much anything is better than nothing. Comparing any concrete solution to a hypothetical, nonexistent perfect solution is not constructive, and at this stage it is close to being disrespectful. I have http://reviews.llvm.org/D9766 (from May) with no actionable item on it. It was done as per the feedback from the previous discussion of the subject. There is no proposal to improve the code, no proposal to do it another way, nothing. FROM MAY!
>>
>> I'd like to get things moving here. If you guys don't give a s*** about it because clang already has a workaround, then fine. The good thing is that it won't affect clang, for the very same reason: clang is not using aggregate loads/stores. But there are numerous front ends out there that do not have the manpower backing clang, and they all have to jump through hoops to handle aggregates so that LLVM does not mess them up. So please be considerate of the smaller guys in town.
>
> Hello,
>
> I see things a little differently and I'll do my best to explain my position.
>
> First of all, LLVM's backend doesn't handle large aggregates very well, and there appears to be no desire from the greater community to fix this. Last year I tried to fix PR21513, which involved a fairly large store of aggregate type totaling around 64 KB. It turns out that the backend's representation requires having a node with more than 64,000 operands. I submitted a patch to fix this, http://reviews.llvm.org/D6164, but others in the community reasoned that the cure was worse than the disease, as it results in all SDNodes becoming a little larger.
>
> Second of all, turning large aggregate memory operations into large scalar memory operations, via the integer types, doesn't work for memory operations beyond 1 MB because the largest integer type is (2**23)-1 bits. I think it would be quite costly to make the scalarization scale significantly beyond this. Beyond that, InstCombine is not supposed to generate types which aren't considered legal by the datalayout. Targets rely on InstCombine respecting this to mitigate the creation of IR which doesn't map well to the hardware.
>
> What I tried, but perhaps failed, to intimate was that today's status quo is considered to be a reasonable engineering compromise. LLVM doesn't provide a *completely* abstract and normalized interface to computation, but we try our best in the face of the constraints we are faced with. This is why clang's technique is, to me, reasonable.
>
> I hope this explains where I am coming from.
>
> Thanks,
> David Majnemer
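For concreteness, here is a minimal IR sketch of the two forms being contrasted above: a first-class-aggregate load/store pair, and the memcpy-based form a frontend such as clang typically emits for the same copy. The struct, the function names, and the 16-byte size (which assumes a typical 64-bit datalayout) are made up for illustration, and the syntax is the explicit-type, typed-pointer form in use around the LLVM 3.7 time frame:

    %struct.S = type { i32, i64 }

    ; The aggregate (first-class aggregate) form under discussion: the whole
    ; struct is loaded and stored as a single value of aggregate type.
    define void @copy_aggregate(%struct.S* %dst, %struct.S* %src) {
      %v = load %struct.S, %struct.S* %src, align 8
      store %struct.S %v, %struct.S* %dst, align 8
      ret void
    }

    ; What a frontend such as clang typically emits for the same copy instead:
    ; a memcpy of the known size, so no aggregate-typed load/store ever appears.
    define void @copy_frontend_style(%struct.S* %dst, %struct.S* %src) {
      %d = bitcast %struct.S* %dst to i8*
      %s = bitcast %struct.S* %src to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 16, i32 8, i1 false)
      ret void
    }

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)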
On Sun, Aug 16, 2015 at 9:15 PM, deadal nix <deadalnix at gmail.com> wrote:

> I understand these objections. They only become a problem at the limit (i.e. the example of the 64 KB store or the 1 MB+ aggregate). Those cases probably need their own fix, or probably just shouldn't be supported.
>
> That being said, there is a vast space between what is done now and aggregates so big that they cause the real, hard problems you mention; narrowing that gap seems like a win to me. Additionally, there is a whole bag of tricks that can be deployed to mitigate the problem (for instance, load-to-store forwarding can be turned into a memcpy). The thing is, one has to start somewhere.
>
> More generally, it doesn't seem right to me to reject something because it doesn't cover all the bases. The fact that it covers more bases than what exists now should be enough (unless it creates some kind of bad precedent).

I would argue that a fix in the wrong direction is worse than the status quo.

> The argument that targets rely on InstCombine to mitigate IR that requires legalization seems dubious to me. First, both aggregates and large scalars require legalization, so, even if not ideal, the proposed change does not make things any worse than they already are; as far as legalization is concerned, the two are pretty much the same. It should also be noted that InstCombine is not guaranteed to run before the backend, so relying on it there seems like a bad idea to me.

InstCombine is not guaranteed to run before IR hits the backend, but the result of legalizing the machinations of InstCombine's output during SelectionDAG is worse than generating illegal IR in the first place.

> As for the big-integer approach, I really don't care about it. I can change the patch to create multiple loads/stores respecting the data layout; I have the code for that and could adapt it for this PR without too much trouble. If that is the only thing blocking this PR, then we can proceed, but I'd like some notion that we are making progress. Would you be willing to accept a solution based on creating a series of loads/stores respecting the datalayout?

Splitting the memory operation into smaller operations is not semantics-preserving from an IR-theoretic perspective. For example, splitting a volatile memory operation into several volatile memory operations is not OK. The same goes for atomics. Some targets provide atomic memory operations at the granularity of a cache line, and splitting at legal-integer granularity would be observably different.

With the above in mind, I don't see it as unreasonable for frontends to generate IR that LLVM is comfortable with. We seem fine telling frontend authors that they should strive to avoid large aggregate memory operations in our performance tips guide <http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>. Implementation experience with Clang hasn't shown this to be particularly odious to follow, and none of the LLVM-side solutions seem satisfactory.
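To make the volatility point concrete, here is a small sketch (hypothetical %pair type and function names, same 3.7-era syntax as above) of why a datalayout-guided split cannot simply be applied everywhere:

    %pair = type { i32, i32 }

    ; One volatile access covering the whole aggregate.
    define void @store_pair_volatile(%pair* %p, %pair %v) {
      store volatile %pair %v, %pair* %p, align 4
      ret void
    }

    ; The split form: two independent volatile i32 stores. For memory-mapped
    ; I/O this is an observably different sequence of accesses, so a splitting
    ; transformation has to leave volatile (and atomic) operations alone.
    define void @store_pair_volatile_split(%pair* %p, %pair %v) {
      %f0 = extractvalue %pair %v, 0
      %f1 = extractvalue %pair %v, 1
      %p0 = getelementptr inbounds %pair, %pair* %p, i32 0, i32 0
      %p1 = getelementptr inbounds %pair, %pair* %p, i32 0, i32 1
      store volatile i32 %f0, i32* %p0, align 4
      store volatile i32 %f1, i32* %p1, align 4
      ret void
    }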
2015-08-16 22:10 GMT-07:00 David Majnemer <david.majnemer at gmail.com>:

> I would argue that a fix in the wrong direction is worse than the status quo.

How is the proposed change worse than the status quo?

> InstCombine is not guaranteed to run before IR hits the backend, but the result of legalizing the machinations of InstCombine's output during SelectionDAG is worse than generating illegal IR in the first place.

That does not follow. InstCombine is not creating new things that require legalization; it changes one thing that requires legalization into another that a larger part of LLVM can understand.

> Splitting the memory operation into smaller operations is not semantics-preserving from an IR-theoretic perspective. For example, splitting a volatile memory operation into several volatile memory operations is not OK. The same goes for atomics. Some targets provide atomic memory operations at the granularity of a cache line, and splitting at legal-integer granularity would be observably different.

That is off topic: the proposed patch explicitly gates on this case.

> With the above in mind, I don't see it as unreasonable for frontends to generate IR that LLVM is comfortable with. We seem fine telling frontend authors that they should strive to avoid large aggregate memory operations in our performance tips guide <http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>. Implementation experience with Clang hasn't shown this to be particularly odious to follow, and none of the LLVM-side solutions seem satisfactory.

Most front ends do not have clang's resources. Additionally, this tip is not quite accurate: I'm not interested in large aggregate loads/stores at this stage, I'm interested in ANY aggregate load/store. LLVM is just unable to handle any of them in a way that makes sense. It could certainly do better for small aggregates without too much trouble.
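For what it's worth, the small-aggregate case being referred to is roughly the following (made-up %small type and function name, same syntax as the earlier sketches). One would hope the store/load pair folds away so the function simply returns field 0 of %v, but with aggregate-typed memory operations the scalar optimizers largely pass this through to the backend, which is exactly the complaint in this thread:

    %small = type { i32, i32 }

    define i32 @reload_first_field(%small* %p, %small %v) {
      ; Store the aggregate, reload it, and extract a single field.
      store %small %v, %small* %p, align 4
      %reload = load %small, %small* %p, align 4
      %x = extractvalue %small %reload, 0
      ret i32 %x
    }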
On 08/16/2015 10:10 PM, David Majnemer via llvm-dev wrote:

> With the above in mind, I don't see it as unreasonable for frontends to generate IR that LLVM is comfortable with. We seem fine telling frontend authors that they should strive to avoid large aggregate memory operations in our performance tips guide <http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>. Implementation experience with Clang hasn't shown this to be particularly odious to follow and none of the LLVM-side solutions seem satisfactory.

David, speaking as the guy who wrote the documentation you're quoting, you're twisting the intent of that document. The document was explicitly intended to document current status, warts and all. Please do not use it to justify not fixing those warts. :)

In general, I feel that a solution which worked for FCAs under some fixed size (64 KB, 1 MB, fine!) would be better than one that worked for none. We could just document the limitation and call it a day. (I'll note that I am not endorsing or discouraging any *particular* solution to said problem.)

Philip