thr3ads.net - llvm dev - [LLVMdev] Loads moving across barriers [Dec 2013]

Matt Arsenault

2013-Dec-21 02:46 UTC

[LLVMdev] Loads moving across barriers

On Dec 4, 2013, at 8:25 PM, Andrew Trick <atrick at apple.com> wrote:
> 
> On Dec 4, 2013, at 5:19 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
> 
>> On 12/04/2013 04:29 PM, Andrew Trick wrote:
>>> On Dec 4, 2013, at 3:33 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
>>> 
>>>> On 11/11/2013 03:13 PM, Andrew Trick wrote:
>>>>> On Nov 9, 2013, at 1:39 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>>>>> 
>>>>>> On Nov 9, 2013, at 3:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>>> 
>>>>>>> Perhaps you're instead trying to say that with certain address
>>>>>>> spaces "noalias" (and by inference, "restrict" at the language
>>>>>>> level) has a different semantic model than other address spaces?
>>>>>>> While it's less worrisome than the first interpretation, I still
>>>>>>> don't really like it.
>>>>>>> 
>>>>>> This sounds right. With the constant address space, anything you do
>>>>>> is OK since it’s constant. Private address space is supposed to be
>>>>>> totally inaccessible from other workitems, so parallel modifications
>>>>>> aren’t a concern. The others require explicit synchronization which
>>>>>> noalias would need to be aware of.
>>>>> FWIW, it seems generally useful to me to have a nomemfence function
>>>>> attribute and intrinsic property. We should avoid memory optimization
>>>>> (and possibly other optimization) across these regardless of alias
>>>>> analysis.
>>>>> 
>>>> I think I'll try implementing this. Ideally it would be parameterized
>>>> over the address space, so it makes more sense for it to be a memfence
>>>> attribute rather than a nomemfence. You would then have an arbitrary
>>>> number of memfence(N) attributes, one for each required address space.
>>> So for correctness, would we need to tag all functions with
>>> memfence(0..M) until we can prove otherwise? That seems heinous.
>> I was thinking the absence of it would mean no memfence in any address
>> space, which is the current behavior. This adds the option of fencing.
>>> Better to have an optional attribute that can be added to expose
>>> optimization. Is it important in practice to optimize the case of
>>> memfence(I) + nomemfence(J)?
>> I think it would be important for the GPU case. You never need a
>> memfence for private address space / addrspace 0, but you frequently
>> want them for local or global. The local or global writes can't be
>> reordered, but it could be very beneficial to move the private accesses
>> across fences, which might help reduce register usage.
>> 
>>>  If so, is there a problem with nomemfence(N)?
>> nomemfence is the current assumption made on an arbitrary call, and it's
>> the common case. Specifying the absence of a fence seems backwards from
>> how this is used and more cumbersome to deal with. To match the current
>> behavior, it would require littering nomemfence for every possible
>> address space everywhere. In OpenCL you specify your fences, so it would
>> be more straightforward to map that. If I have a memfence intrinsic, I
>> just need to mark it with the fence attribute, and then propagate it to
>> its callers. There would generally only be a few of them in any program
>> compared to fenceless calls. To implement this with nomemfence, I would
>> have to mark every function with at least 4 nomemfences, and remove them
>> when encountering the memfence intrinsic.
> 
> 
> Sure, but the program still needs to be correct if you skip attribute
> propagation.
> -Andy

Is this a requirement for an attribute? This would be a problem for the
already existing noduplicate. If a function has a call to a noduplicate
function, the calling function could still be duplicated if the attribute
isn’t propagated, which isn’t allowed.

- Matt
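
The memfence(N) proposal quoted above can be illustrated with a small model. This is only a sketch of the intended semantics, not LLVM code: the function name, the set-based representation of the attribute, and the address-space numbers for global and local are assumptions made for illustration (the thread itself only fixes private as addrspace 0). It considers the fence constraint alone; ordinary alias analysis would still apply on top of it.

```python
# Hypothetical address-space numbers, for illustration only.
PRIVATE, GLOBAL, LOCAL = 0, 1, 3


def may_move_across(call_fences, access_addrspace):
    """Return True if the fence constraint allows moving an access across a call.

    call_fences models the proposed memfence(N) attributes as the set of
    address spaces the callee fences. An empty set models the current
    default for an arbitrary call: no fence in any address space.
    """
    return access_addrspace not in call_fences


# A barrier-like intrinsic that fences local and global memory only.
barrier_fences = frozenset({GLOBAL, LOCAL})

print(may_move_across(barrier_fences, PRIVATE))  # True: private access may move
print(may_move_across(barrier_fences, LOCAL))    # False: local access is pinned
print(may_move_across(frozenset(), GLOBAL))      # True: unmarked call, no fence
```

Under this model a private-memory load can be scheduled past the barrier, which is the register-pressure benefit described for the GPU case, while local and global accesses stay ordered.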

Andrew Trick

2013-Dec-27 01:54 UTC

On Dec 20, 2013, at 6:46 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> Is this a requirement for an attribute? This would be a problem for the
> already existing noduplicate. If a function has a call to a noduplicate
> function, the calling function could still be duplicated if the attribute
> isn’t propagated, which isn’t allowed.

Others can weigh in here. This is just my understanding. Attribute propagation
has to be optional because we can’t assume inter-procedural optimization runs
for correct codegen. What if the memfence resides in a different module?

In the case of noduplicate, the only reason to propagate AFAICT would be to
suppress inlining. It seems reasonable enough to expect attribute propagation to
happen before inlining. So I don't think noduplicate is an issue in
practice.

I think "memfence" could be an issue if we use the attribute to
summarize LLVM atomic load/store and fence instructions (in addition to OpenCL
barriers).

If the semantics you are proposing won't apply to general memory ordering
constraints, then at least the name should be changed to specifically refer to
OpenCL barriers.

-Andy
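
The concern about optional propagation can be made concrete with a toy model. This is a sketch under stated assumptions, not LLVM's attribute machinery: the function names (`helper`, `barrier_intrinsic`) and the dictionary representation of attributes are hypothetical. It compares how a function that really does fence is classified under each naming scheme when no inter-procedural propagation has run, for example because the fence lives in another module.

```python
def fences_with_memfence_scheme(func, attrs):
    # "memfence" scheme: a function is assumed to fence only if it is marked.
    return "memfence" in attrs.get(func, set())


def fences_with_nomemfence_scheme(func, attrs):
    # "nomemfence" scheme: a function is assumed to fence unless it is marked.
    return "nomemfence" not in attrs.get(func, set())


# helper() calls barrier_intrinsic(), so helper() really does fence.
# Propagation has NOT run: only the intrinsic itself carries an annotation.
attrs_memfence = {"barrier_intrinsic": {"memfence"}}
attrs_nomemfence = {}  # nothing has been proven fence-free yet

# Unsound: helper is treated as fence-free although it contains a fence.
print(fences_with_memfence_scheme("helper", attrs_memfence))      # False

# Conservative: helper is treated as fencing until proven otherwise.
print(fences_with_nomemfence_scheme("helper", attrs_nomemfence))  # True
```

In the first scheme correctness depends on propagation having happened; in the second, skipping propagation only costs optimization opportunities.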


Chandler Carruth

2013-Dec-27 02:25 UTC

On Thu, Dec 26, 2013 at 8:54 PM, Andrew Trick <atrick at apple.com> wrote:
> Others can weigh in here. This is just my understanding. Attribute
> propagation has to be optional because we can’t assume inter-procedural
> optimization runs for correct codegen. What if the memfence resides in a
> different module?
>
> In the case of noduplicate, the only reason to propagate AFAICT would be
> to suppress inlining. It seems reasonable enough to expect attribute
> propagation to happen before inlining. So I don't think noduplicate is an
> issue in practice.
>
I think you've misunderstood the specification of noduplicate... This isn't
how it works.

Let's assume we have functions A, B, C, and D. Function A is marked as
'noduplicate' and thus all calls to it are marked 'noduplicate'. Functions
B and C each call function A in exactly one place. Function D calls
function B twice, and C in exactly one place. Functions B and C are
internal. Functions B, C, and D are defined, while function A is only
declared.

Only function A and calls to function A are marked as 'noduplicate'. I
don't see any reason why this attribute would be propagated.

Function B cannot be inlined into function D because doing so would
duplicate one of the calls to A. The inliner checks this *while doing the
inlining*, it does not rely on any function attribute on B for correctness
here.

Function C *can* be inlined into function D because there is only one call
site and it is an internal function. Thus, the call to A is not duplicated,
it is merely sunk into D.

So there is no propagation of attributes to achieve correctness even with
noduplicate. The inliner directly checks[1] the callee's call instructions
to ensure that inlining is valid.
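
The A/B/C/D example can be written out as a small model of that check. This is a simplified sketch, not the actual LLVM inliner: the data structures and helper names are hypothetical, and the rule encoded is only the one described above (a callee containing a noduplicate call may be inlined only when its original body disappears afterwards).

```python
# Call sites from the example: caller -> list of callees, one entry per call.
CALL_SITES = {
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "B", "C"],
}
NODUPLICATE = {"A"}    # A is only declared and is marked noduplicate
INTERNAL = {"B", "C"}  # B and C have internal linkage


def contains_noduplicate_call(func):
    # Inspect the callee's own call instructions; no attribute on the
    # callee itself is consulted.
    return any(callee in NODUPLICATE for callee in CALL_SITES.get(func, []))


def num_call_sites(callee):
    return sum(sites.count(callee) for sites in CALL_SITES.values())


def can_inline(callee):
    """Decide at inlining time whether inlining would duplicate a call."""
    if not contains_noduplicate_call(callee):
        return True
    # Safe only if the original body is deleted afterwards: the callee is
    # internal and this is its sole call site, so the call to A merely moves.
    return callee in INTERNAL and num_call_sites(callee) == 1


print(can_inline("B"))  # False: D calls B twice, so A's call would be copied
print(can_inline("C"))  # True: single call site, internal, call to A is sunk
```

Nothing here depends on B or C carrying an attribute; the decision is derived from the callee's body each time inlining is considered.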

I agree with Andy that we should *not* add a requirement to propagate such
attributes.

> I think "memfence" could be an issue if we use the attribute to summarize
> LLVM atomic load/store and fence instructions (in addition to OpenCL
> barriers).
>
I have no idea what semantics you would attach to it in this case. I've not
seen any clear explanation of such semantics yet in this thread.

The only clear semantics I've seen expressed so far seem much more
appropriate for attaching to a noduplicate call to an intrinsic... But I
think I'll need to read this thread again to re-absorb much of the
information after the holidays. =]

[1]: Note, the current implementation of noduplicate is buggy, but
hopefully in a latent way -- it doesn't handle invokes.
