thr3ads.net - llvm dev - [llvm-dev] Aggregate load/stores [Aug 2015]

If this information is useful, please help other people find it:
Share via:

deadal nix via llvm-dev

2015-Aug-17 05:27 UTC

[llvm-dev] Aggregate load/stores

2015-08-16 22:10 GMT-07:00 David Majnemer <david.majnemer at
gmail.com>:>
>
> I would argue that a fix in the wrong direction is worse than the status
> quo.
>
How is proposed change worse than status quo ?

>
>
>>
>> The argument that target are relying on InstCombine to mitigate IR
>> requiring legalization seems dubious to me. First, because both
aggregate
>> and large scalar require legalization, so, if not ideal, the proposed
>> change does not makes things any worse than they already are. In fact,
as
>> far as legalization is concerned, theses are pretty much the same. It
>> should also be noted that InstCombine is not guaranteed to run before
the
>> target, so it seems like a bad idea to me to rely on it in the backend.
>>
>
> InstCombine is not guaranteed to run before IR hits the backend but the
> result of legalizing the machinations of InstCombine's output during
> SelectionDAG is worse than generating illegal IR in the first place.
>
That does not follow. InstCombine is not creating new things that require
legalisation, it changes one thing that require legalization into another
that a larger part of LLVM can understand.

>
>
>>
>> As for the big integral thing, I really don't care. I can change it
to
>> create multiple loads/stores respecting data layout, I have the code
for
>> that and could adapt it for this PR without too much trouble. If this
is
>> the only thing that is blocking this PR, then we can proceed. But
I'd like
>> some notion that we are making progress. Would you be willing to accept
a
>> solution based on creating a serie of load/store respecting the
datalayout ?
>>
>
> Splitting the memory operation into smaller operations is not semantics
> preserving from an IR-theoretic perspective.  For example, splitting a
> volatile memory operation into several volatile memory operations is not
> OK.  Same goes with atomics.  Some targets provide atomic memory operations
> at the granularity of a cache line and splitting at legal integer
> granularity would be observably different.
>
>That is off topic. Proposed patch explicitly gate for this.

>
> With the above in mind, I don't see it as unreasonable for frontends to
> generate IR that LLVM is comfortable with.  We seem fine telling frontend
> authors that they should strive to avoid large aggregate memory operations
> in our performance tips guide <
>
http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>.
> Implementation experience with Clang hasn't shown this to be
particularly
> odious to follow and none of the LLVM-side solutions seem satisfactory.
>
>Most front end do not have clang resources. Additionally, this tip is not
quite accurate. I'm not interested in large aggregate load/store at this
stage. I'm interested in ANY aggregate load/store. LLVM is just unable to
handle any of it in a way that make sense. It could certainly do better for
small aggregate, without too much trouble.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150816/715b9089/attachment.html>

David Majnemer via llvm-dev

2015-Aug-17 06:21 UTC

head link

[llvm-dev] Aggregate load/stores

On Sun, Aug 16, 2015 at 10:27 PM, deadal nix <deadalnix at gmail.com>
wrote:
>
>
> 2015-08-16 22:10 GMT-07:00 David Majnemer <david.majnemer at
gmail.com>:
>>
>>
>> I would argue that a fix in the wrong direction is worse than the
status
>> quo.
>>
>
> How is proposed change worse than status quo ?
>
Because a solution which doesn't generalize is not a very powerful
solution.  What happens when somebody says that they want to use atomics +
large aggregate loads and stores? Give them yet another, different answer?
That would mean our earlier, less general answer, approach was either a
bandaid (bad) or the new answer requires a parallel code path in their
frontend (worse).

>
>
>>
>>
>>>
>>> The argument that target are relying on InstCombine to mitigate IR
>>> requiring legalization seems dubious to me. First, because both
aggregate
>>> and large scalar require legalization, so, if not ideal, the
proposed
>>> change does not makes things any worse than they already are. In
fact, as
>>> far as legalization is concerned, theses are pretty much the same.
It
>>> should also be noted that InstCombine is not guaranteed to run
before the
>>> target, so it seems like a bad idea to me to rely on it in the
backend.
>>>
>>
>> InstCombine is not guaranteed to run before IR hits the backend but the
>> result of legalizing the machinations of InstCombine's output
during
>> SelectionDAG is worse than generating illegal IR in the first place.
>>
>
> That does not follow. InstCombine is not creating new things that require
> legalisation, it changes one thing that require legalization into another
> that a larger part of LLVM can understand.
>
I'm afraid I don't understand what you are getting at here.  InstCombine
carefully avoids ptrtoint to weird types, truncs to weird types, etc. when
creating new IR.

>
>
>>
>>
>>>
>>> As for the big integral thing, I really don't care. I can
change it to
>>> create multiple loads/stores respecting data layout, I have the
code for
>>> that and could adapt it for this PR without too much trouble. If
this is
>>> the only thing that is blocking this PR, then we can proceed. But
I'd like
>>> some notion that we are making progress. Would you be willing to
accept a
>>> solution based on creating a serie of load/store respecting the
datalayout ?
>>>
>>
>> Splitting the memory operation into smaller operations is not semantics
>> preserving from an IR-theoretic perspective.  For example, splitting a
>> volatile memory operation into several volatile memory operations is
not
>> OK.  Same goes with atomics.  Some targets provide atomic memory
operations
>> at the granularity of a cache line and splitting at legal integer
>> granularity would be observably different.
>>
>>
> That is off topic. Proposed patch explicitly gate for this.
>
Then I guess we agree to disagree about what is "on topic".  I think
that
our advice to frontend authors regarding larger-than-legal loads/stores
should be uniform and not dependent on whether or not the operation was or
was not volatile.

>
>
>>
>> With the above in mind, I don't see it as unreasonable for
frontends to
>> generate IR that LLVM is comfortable with.  We seem fine telling
frontend
>> authors that they should strive to avoid large aggregate memory
operations
>> in our performance tips guide <
>>
http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>.
>> Implementation experience with Clang hasn't shown this to be
particularly
>> odious to follow and none of the LLVM-side solutions seem satisfactory.
>>
>>
> Most front end do not have clang resources. Additionally, this tip is not
> quite accurate. I'm not interested in large aggregate load/store at
this
> stage. I'm interested in ANY aggregate load/store. LLVM is just unable
to
> handle any of it in a way that make sense. It could certainly do better for
> small aggregate, without too much trouble.
>
>I'm confused what you mean about "clang resources" here, you
haven't made
it clear what the burden it is to your frontend. I'm not saying that there
isn't such a burden, I just haven't seen it been articulated and I have
heard nothing similar from other folks using LLVM.  What prevents you from
performing field-at-a-time loads and stores or calls to the memcpy
intrinsic?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150816/9468d4d3/attachment.html>

deadal nix via llvm-dev

2015-Aug-17 07:13 UTC

head link

[llvm-dev] Aggregate load/stores

2015-08-16 23:21 GMT-07:00 David Majnemer <david.majnemer at gmail.com>:
>
>
> Because a solution which doesn't generalize is not a very powerful
> solution.  What happens when somebody says that they want to use atomics +
> large aggregate loads and stores? Give them yet another, different answer?
> That would mean our earlier, less general answer, approach was either a
> bandaid (bad) or the new answer requires a parallel code path in their
> frontend (worse).
>
>
It is expected from atomics/volatile to work differently. That is the whole
point of them.

A lot of optimization in InstCombine plain ignore atomic/volatile
load/store. That is expected.

>
>>
>>>
>>>
>>>>
>>>> The argument that target are relying on InstCombine to mitigate
IR
>>>> requiring legalization seems dubious to me. First, because both
aggregate
>>>> and large scalar require legalization, so, if not ideal, the
proposed
>>>> change does not makes things any worse than they already are.
In fact, as
>>>> far as legalization is concerned, theses are pretty much the
same. It
>>>> should also be noted that InstCombine is not guaranteed to run
before the
>>>> target, so it seems like a bad idea to me to rely on it in the
backend.
>>>>
>>>
>>> InstCombine is not guaranteed to run before IR hits the backend but
the
>>> result of legalizing the machinations of InstCombine's output
during
>>> SelectionDAG is worse than generating illegal IR in the first
place.
>>>
>>
>> That does not follow. InstCombine is not creating new things that
require
>> legalisation, it changes one thing that require legalization into
another
>> that a larger part of LLVM can understand.
>>
>
> I'm afraid I don't understand what you are getting at here. 
InstCombine
> carefully avoids ptrtoint to weird types, truncs to weird types, etc. when
> creating new IR.
>
What I'm getting at is that is doesn't make the situation any worse. In
fact, it makes things actually better as you actually get something that do
not need legalization is some cases, when you always get something that
need legalization otherwize.

Think about it. aggregate require legalization. Large scalar require
legalization. Without the patch, you always need legalization. With the
patch you sometime need legalization. And get optimization back into the
game.

If your point is that less legalization is better, then the change is a
win. If you are willing to go the the multiple load/store solution for non
volatile/atomic, then even more so.

>
>
>>
>>
>>>
>>>
>>>>
>>>> As for the big integral thing, I really don't care. I can
change it to
>>>> create multiple loads/stores respecting data layout, I have the
code for
>>>> that and could adapt it for this PR without too much trouble.
If this is
>>>> the only thing that is blocking this PR, then we can proceed.
But I'd like
>>>> some notion that we are making progress. Would you be willing
to accept a
>>>> solution based on creating a serie of load/store respecting the
datalayout ?
>>>>
>>>
>>> Splitting the memory operation into smaller operations is not
semantics
>>> preserving from an IR-theoretic perspective.  For example,
splitting a
>>> volatile memory operation into several volatile memory operations
is not
>>> OK.  Same goes with atomics.  Some targets provide atomic memory
operations
>>> at the granularity of a cache line and splitting at legal integer
>>> granularity would be observably different.
>>>
>>>
>> That is off topic. Proposed patch explicitly gate for this.
>>
>
> Then I guess we agree to disagree about what is "on topic".  I
think that
> our advice to frontend authors regarding larger-than-legal loads/stores
> should be uniform and not dependent on whether or not the operation was or
> was not volatile.
>
>
Come on. It is expected from atomic/volatile to be handled differently.
That is the whole point of atomic/volatile. Bringing atomic/volatile as an
argument that something should not be done in the general case sounds like
backward rationalization.

>
>>
>>>
>>> With the above in mind, I don't see it as unreasonable for
frontends to
>>> generate IR that LLVM is comfortable with.  We seem fine telling
frontend
>>> authors that they should strive to avoid large aggregate memory
operations
>>> in our performance tips guide <
>>>
http://llvm.org/docs/Frontend/PerformanceTips.html#avoid-loads-and-stores-of-large-aggregate-type>.
>>> Implementation experience with Clang hasn't shown this to be
particularly
>>> odious to follow and none of the LLVM-side solutions seem
satisfactory.
>>>
>>>
>> Most front end do not have clang resources. Additionally, this tip is
not
>> quite accurate. I'm not interested in large aggregate load/store at
this
>> stage. I'm interested in ANY aggregate load/store. LLVM is just
unable to
>> handle any of it in a way that make sense. It could certainly do better
for
>> small aggregate, without too much trouble.
>>
>>
> I'm confused what you mean about "clang resources" here, you
haven't made
> it clear what the burden it is to your frontend. I'm not saying that
there
> isn't such a burden, I just haven't seen it been articulated and I
have
> heard nothing similar from other folks using LLVM.  What prevents you from
> performing field-at-a-time loads and stores or calls to the memcpy
> intrinsic?
>
clang has many developer behind it, some of them paid to work on it. That s
simply not the case for many others.

But to answer your questions :
 - Per field load/store generate more loads/stores than necessary in many
cases. These can't be aggregated back because of padding.
 - memcpy only work memory to memory. It is certainly usable in some cases,
but certainly do not cover all uses.

I'm willing to do the memcpy optimization in InstCombine (in fact, things
would not degenerate into so much bikescheding, that would already be done).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150817/4a727fcc/attachment.html>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Aug 2015 - Aggregate load/stores

[llvm-dev] Aggregate load/stores

[llvm-dev] Aggregate load/stores

[llvm-dev] Aggregate load/stores

Reasonably Related Threads