thr3ads.net - llvm dev - [LLVMdev] Loads moving across barriers [Dec 2013]

Matt Arsenault

2013-Dec-04 23:33 UTC

[LLVMdev] Loads moving across barriers

On 11/11/2013 03:13 PM, Andrew Trick wrote:
> On Nov 9, 2013, at 1:39 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
>> On Nov 9, 2013, at 3:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>
>>> Perhaps you're instead trying to say that with certain address spaces
>>> "noalias" (and by inference, "restrict" at the language level) has a
>>> different semantic model than other address spaces? While it's less
>>> worrisome than the first interpretation, I still don't really like it.
>>>
>> This sounds right. With the constant address space, anything you do is
>> OK since it’s constant. Private address space is supposed to be totally
>> inaccessible from other workitems, so parallel modifications aren’t a
>> concern. The others require explicit synchronization which noalias would
>> need to be aware of.
>
> FWIW, it seems generally useful to me to have a nomemfence function
> attribute and intrinsic property. We should avoid memory optimization (and
> possibly other optimization) across these regardless of alias analysis.
>

I think I'll try implementing this. Ideally it would be parameterized
over the address space, so it makes more sense for it to be a memfence
attribute rather than a nomemfence. You would then have an arbitrary
number of memfence(N) attributes, one for each required address space.
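
To make the per-address-space idea concrete, here is a minimal sketch of how a set of memfence(N) attributes on a call might gate reordering. It is only an illustration under stated assumptions, not code from this thread or from LLVM: the function name, the constants, and the address-space numbers other than private are hypothetical, and a call with no memfence attribute is assumed to fence nothing.

```python
# Hypothetical model of the proposal; none of these names are LLVM APIs.
# Address-space numbers are assumptions made only for this illustration.
PRIVATE, GLOBAL, LOCAL = 0, 1, 3


def may_move_across(access_addrspace, call_memfences):
    """Return True if a memory access in `access_addrspace` could be
    reordered across a call carrying the given set of memfence(N)
    attributes. Only address spaces named in the set are fenced."""
    return access_addrspace not in call_memfences


# A call tagged memfence(LOCAL) and memfence(GLOBAL), e.g. a barrier.
barrier = frozenset({LOCAL, GLOBAL})
# A call with no memfence attributes at all.
plain_call = frozenset()

print(may_move_across(PRIVATE, barrier))    # private access is not fenced
print(may_move_across(LOCAL, barrier))      # local access is fenced
print(may_move_across(GLOBAL, plain_call))  # nothing is fenced by this call
```

A real implementation would still have to consult alias analysis and the other legality checks; this only models the extra constraint the attribute would add.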

Andrew Trick

2013-Dec-05 00:29 UTC

[LLVMdev] Loads moving across barriers

On Dec 4, 2013, at 3:33 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
> On 11/11/2013 03:13 PM, Andrew Trick wrote:
>> On Nov 9, 2013, at 1:39 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>>
>>> On Nov 9, 2013, at 3:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>
>>>> Perhaps you're instead trying to say that with certain address spaces
>>>> "noalias" (and by inference, "restrict" at the language level) has a
>>>> different semantic model than other address spaces? While it's less
>>>> worrisome than the first interpretation, I still don't really like it.
>>>>
>>> This sounds right. With the constant address space, anything you do
>>> is OK since it’s constant. Private address space is supposed to be
>>> totally inaccessible from other workitems, so parallel modifications
>>> aren’t a concern. The others require explicit synchronization which
>>> noalias would need to be aware of.
>>
>> FWIW, it seems generally useful to me to have a nomemfence function
>> attribute and intrinsic property. We should avoid memory optimization (and
>> possibly other optimization) across these regardless of alias analysis.
>>
> I think I'll try implementing this. Ideally it would be parameterized
> over the address space, so it makes more sense for it to be a memfence
> attribute rather than a nomemfence. You would then have an arbitrary
> number of memfence(N) attributes, one for each required address space.

So for correctness, would we need to tag all functions with memfence(0..M)
until we can prove otherwise? That seems heinous. Better to have an optional
attribute that can be added to expose optimization. Is it important in
practice to optimize the case of memfence(I) + nomemfence(J)? If so, is there
a problem with nomemfence(N)?

-Andy

Matt Arsenault

2013-Dec-05 01:19 UTC

[LLVMdev] Loads moving across barriers

On 12/04/2013 04:29 PM, Andrew Trick wrote:
> On Dec 4, 2013, at 3:33 PM, Matt Arsenault <Matthew.Arsenault at amd.com> wrote:
>
>> On 11/11/2013 03:13 PM, Andrew Trick wrote:
>>> On Nov 9, 2013, at 1:39 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>>>
>>>> On Nov 9, 2013, at 3:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>
>>>>> Perhaps you're instead trying to say that with certain address
>>>>> spaces "noalias" (and by inference, "restrict" at the language
>>>>> level) has a different semantic model than other address spaces?
>>>>> While it's less worrisome than the first interpretation, I still
>>>>> don't really like it.
>>>>>
>>>> This sounds right. With the constant address space, anything you do
>>>> is OK since it’s constant. Private address space is supposed to be
>>>> totally inaccessible from other workitems, so parallel modifications
>>>> aren’t a concern. The others require explicit synchronization which
>>>> noalias would need to be aware of.
>>>
>>> FWIW, it seems generally useful to me to have a nomemfence function
>>> attribute and intrinsic property. We should avoid memory optimization (and
>>> possibly other optimization) across these regardless of alias analysis.
>>>
>> I think I'll try implementing this. Ideally it would be parameterized
>> over the address space, so it makes more sense for it to be a memfence
>> attribute rather than a nomemfence. You would then have an arbitrary
>> number of memfence(N) attributes, one for each required address space.
>
> So for correctness, would we need to tag all functions with memfence(0..M)
> until we can prove otherwise? That seems heinous.

I was thinking the absence of it would mean no memfence in any address
space, which is the current behavior. This adds the option of fencing.

> Better to have an optional attribute that can be added to expose
> optimization. Is it important in practice to optimize the case of
> memfence(I) + nomemfence(J)?

I think it would be important for the GPU case. You never need a
memfence for private address space / addrspace 0, but you frequently
want them for local or global. The local or global writes can't be
reordered, but it could be very beneficial to move the private accesses
across fences, which might help reduce register usage.

> If so, is there a problem with nomemfence(N)?

nomemfence is the current assumption made on an arbitrary call, and it's
the common case. Specifying the absence of a fence seems backwards from
how this is used and more cumbersome to deal with. To match the current
behavior, it would require littering nomemfence for every possible address
space everywhere. In OpenCL you specify your fences, so it would be more
straightforward to map that. If I have a memfence intrinsic, I just need
to mark it with the fence attribute, and then propagate it to its
callers. There would generally only be a few of them in any program
compared to fenceless calls. To implement this with nomemfence, I would
have to mark every function with at least 4 nomemfences, and remove them
when encountering the memfence intrinsic.
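
The "mark the intrinsic, then propagate to its callers" step described above can be sketched as a small fixed-point pass over a call graph. This is only an illustration under stated assumptions, not an LLVM pass: the function names, the dictionary-based call graph, and the address-space number 3 are all hypothetical.

```python
from collections import defaultdict


def propagate_memfences(calls, intrinsic_fences):
    """Propagate fenced address spaces from callees to callers.

    calls: dict mapping each caller name to an iterable of callee names.
    intrinsic_fences: dict mapping a fence intrinsic's name to the set of
        address spaces it fences (the seed attributes).
    Returns a dict mapping each function that needs at least one memfence
    attribute to the frozenset of address spaces it fences.
    """
    fences = defaultdict(set)
    for fn, spaces in intrinsic_fences.items():
        fences[fn] |= set(spaces)

    # Iterate to a fixed point so that chains of calls and recursion
    # are both handled.
    changed = True
    while changed:
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                missing = fences[callee] - fences[caller]
                if missing:
                    fences[caller] |= missing
                    changed = True

    # Functions with an empty set stay unmarked, matching the
    # "absence means no fence" default.
    return {fn: frozenset(s) for fn, s in fences.items() if s}


# Hypothetical program: one fence intrinsic, one fenceless helper.
calls = {
    "kernel": ["helper", "math"],
    "helper": ["barrier_local"],
    "math": [],
}
seeds = {"barrier_local": {3}}  # 3 is an assumed "local" address space
result = propagate_memfences(calls, seeds)
print(sorted(result))  # ['barrier_local', 'helper', 'kernel']
```

Only the functions that can reach the fence intrinsic pick up an attribute, and "math" stays unmarked. Under the nomemfence formulation, every function would instead start out carrying one attribute per address space.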
