thr3ads.net - llvm dev - [LLVMdev] Add a new llvm intrinsic? [Nov 2013]

Jeffrey Yasskin

2013-Nov-11 05:16 UTC

[LLVMdev] Add a new llvm intrinsic?

Sorry for the delay in getting back to you. I don't know if anything
came out of this, since Xiaoyi never wrote back. What does some of the
affected code look like? My opinion is still that 'restrict' should
mean that no other thread should use a pointer aliasing the restrict
pointer. That said, suppose many threads are started after the lifetime
of the restrict pointer starts, they depend on the value of the
restrict pointer, and they're joined. If a use of the restrict pointer
is then moved ahead of the join so that it races with those threads,
that's definitely an LLVM bug.

On Fri, Nov 8, 2013 at 1:50 PM, Owen Anderson <resistor at mac.com> wrote:
> Hi Jeff,
>
> Do you know if anything came of this?  I understand we may need to seek
> clarification to get a formal answer, particularly with respect to C, but
> it seems pretty clear to me that this is a significant QoI issue, both for
> C and CL.  LLVM is effectively hoisting a load above a thread-join.  This
> may or may not be technically allowed in C, but it seems generally
> undesirable, and it’s extremely undesirable in CL where these kinds of
> thread joins are a fundamental part of the programming model.
>
> —Owen
>
>
> On Aug 6, 2013, at 5:36 PM, Jeffrey Yasskin <jyasskin at googlers.com> wrote:
>
> Chandler pointed out another interpretation of C11/6.7.3.1, in which
> 'restrict' only addresses aliasing within a single thread. If that's
> the right interpretation, then it's a bug in LLVM that it moves
> noalias pointers across memory-ordering operations at all, and you
> still don't need a new fence, just a bug fix.
>
> 6.7.3.1 says "During each execution of B, ...". "During" could either
> mean just within the same thread or within any segment of a thread
> that doesn't happen-before or happen-after B.
>
> It's a defect in C that this is ambiguous. Anyone want to volunteer to
> send it to the committee? (I'll be happy to proofread, etc., just not
> be in charge of finding the right email target)
>
> On Tue, Aug 6, 2013 at 5:01 PM, Jeffrey Yasskin <jyasskin at googlers.com> wrote:
>
> This sounds a lot like the question at
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064462.html. It
> sounds like you have a pointer marked 'restrict', but it's actually
> aliased in another thread. That would be undefined behavior even with
> a stronger fence.
>
> On Tue, Aug 6, 2013 at 4:56 PM, Guo, Xiaoyi <Xiaoyi.Guo at amd.com> wrote:
>
> Hi,
>
> In OpenCL, the "barrier()" function, as well as various target-specific
> memory fence intrinsics, should prevent loads/stores of the relevant
> address space from being moved across them.
> Kernel pointers with "restrict" attributes are implemented by marking the
> pointer "noalias" in the LLVM IR. However, in LLVM, "noalias" pointers are
> not affected by LLVM memory fence instructions.
>
> To make sure all loads/stores, including those accessing "restrict"
> pointers, are not moved across the barrier/fence intrinsics, we have
> considered using customized alias analysis passes. However, we would like
> to move away from using customized passes and would like to use standard
> LLVM mechanisms as much as possible.
>
> What do people think about adding an LLVM intrinsic, something like
> llvm.opencl.mem_fence(i32) (or named something that doesn't have opencl in
> the name, llvm.addrspace_fence?), which acts as a fence for a single given
> address space (assuming again that there's no problem with implementing
> these things as a series of different functions to get the full effect),
> and which prevents even noalias pointers from being moved across it?
>
> Alternatively (possibly nicer) would be something that looks like the
> memset intrinsic, which can work for any address space.
> llvm.addrspace_fence.p1.p2(void)
> llvm.addrspace_fence.p1(void) ...
>
> Thanks,
> Xiaoyi
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Owen Anderson

2013-Nov-11 19:09 UTC

[LLVMdev] Add a new llvm intrinsic?

Hi Jeff,

It’s not really meaningful to talk about threads being created in the context of
an OpenCL kernel.  The other threads are always present.

void kernel(int * restrict array, int * restrict array2) {
	int value = array[0] + get_thread_id() + 1;
	barrier();
	array[get_thread_id()] = value;
	barrier();
	array2[get_thread_id()] = array[0];
}

In this example code, the kernel is well synchronized; there are no data races
on any elements of either array.  However, the results will differ if we CSE the
later read of array[0] with the earlier one.  Executed as written, the final
value of array2[0] will be array[0]+1.  If we perform the CSE, the result will
be just array[0].

—Owen
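
The effect described in the message above can be reproduced without a GPU. What follows is a minimal sketch in plain C, under the assumption that running the work-items one after another within each barrier-delimited phase is an acceptable stand-in for real parallel execution (it is for this kernel, since no two work-items touch the same element within a phase). The names NUM_THREADS, simulate_kernel and final_array2_0 are invented for this illustration, and the loop index stands in for get_thread_id(). The reuse_first_load flag models what CSE of the two array[0] loads would do.

```c
#include <assert.h>

enum { NUM_THREADS = 4 };  /* hypothetical work-group size */

/* Sequentially simulates the three barrier-separated phases of the kernel.
 * If reuse_first_load is nonzero, the read of array[0] after the second
 * barrier is replaced by the value loaded before the first barrier, which
 * is the result of CSE-ing the two loads. */
static void simulate_kernel(int *array, int *array2, int reuse_first_load)
{
    int first_load[NUM_THREADS];
    int value[NUM_THREADS];

    assert(array != array2);  /* the two restrict pointers must not overlap */

    /* Phase 1: every work-item reads array[0] before the first barrier. */
    for (int tid = 0; tid < NUM_THREADS; ++tid) {
        first_load[tid] = array[0];
        value[tid] = first_load[tid] + tid + 1;
    }

    /* Phase 2 (between the barriers): each work-item writes its own slot. */
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        array[tid] = value[tid];

    /* Phase 3: read array[0] again, or reuse the stale value under CSE. */
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        array2[tid] = reuse_first_load ? first_load[tid] : array[0];
}

/* Runs the simulation with array[0] == initial and returns array2[0]. */
static int final_array2_0(int initial, int reuse_first_load)
{
    int array[NUM_THREADS] = { initial };
    int array2[NUM_THREADS] = { 0 };
    simulate_kernel(array, array2, reuse_first_load);
    return array2[0];
}
```

With an initial array[0] of 10, final_array2_0(10, 0) yields 11 (the initial value plus 1, written by work-item 0 between the barriers), while final_array2_0(10, 1) yields 10, the stale value.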


Jeffrey Yasskin

2013-Nov-15 03:38 UTC

[LLVMdev] Add a new llvm intrinsic?

On Mon, Nov 11, 2013 at 2:09 PM, Owen Anderson <resistor at mac.com> wrote:
> Hi Jeff,
>
> It’s not really meaningful to talk about threads being created in the
> context of an OpenCL kernel.  The other threads are always present.

Semantically, I'd view this as a collection of threads being spawned
at the entry to the kernel function, and joined at the end, with
release/acquire edges at each barrier. But yes, the threads aren't
literally created or destroyed.

> void kernel(int * restrict array, int * restrict array2) {
>     int value = array[0] + get_thread_id() + 1;
>     barrier();
>     array[get_thread_id()] = value;
>     barrier();
>     array2[get_thread_id()] = array[0];
> }
>
> In this example code, the kernel is well synchronized; there are no data
> races on any elements of either array.  However, the results will differ if
> we CSE the later read of array[0] with the earlier one.  Executed as
> written, the final value of array2[0] will be array[0]+1.  If we perform
> the CSE, the result will be just array[0].

I think this follows all the 'restrict' rules.

6.7.3.1 Formal definition of restrict

1 Let D be a declaration of an ordinary identifier that provides a
means of designating an object P as a restrict-qualified pointer to
type T.

2 If D appears inside a block and does not have storage class extern,
let B denote the block. If D appears in the list of parameter
declarations of a function definition, let B denote the associated
block. Otherwise, let B denote the block of main (or the block of
whatever function is called at program startup in a freestanding
environment).

3 In what follows, a pointer expression E is said to be based on
object P if (at some sequence point in the execution of B prior to the
evaluation of E) modifying P to point to a copy of the array object
into which it formerly pointed would change the value of E. Note
that "based" is defined only for expressions with pointer types.

4 During each execution of B, let L be any lvalue that has &L based on
P. If L is used to access the value of the object X that it
designates, and X is also modified (by any means), then the following
requirements apply: T shall not be const-qualified. Every other lvalue
used to access the value of X shall also have its address based on P.
Every access that modifies X shall be considered also to modify P, for
the purposes of this subclause. If P is assigned the value of a
pointer expression E that is based on another restricted pointer
object P2, associated with block B2, then either the execution of B2
shall begin before the execution of B, or the execution of B2 shall
end prior to the assignment. If these requirements are not met, then
the behavior is undefined.

5 Here an execution of B means that portion of the execution of the
program that would correspond to the lifetime of an object with scalar
type and automatic storage duration associated with B.


D is a parameter declaration in 'kernel'. P is 'array' (or 'array2',
but I'll just look at 'array' for now). E is an expression like
"&array[get_thread_id()]", which is the address of an object X. The
initial call to the kernel (from a single thread) sets the value of
'array' (P2). The other threads involved in running the kernel have
their own variable 'array' (P), each of which is assigned from P2. B2
(the initial call) begins before the execution of B (the other thread's
execution of 'kernel'). All lvalues used to access X have addresses
that depend on 'array'. (This was "many threads ... depend on the
value of the restrict pointer".)

So it's an LLVM bug to assume that array[0] can't alias
array[get_thread_id()] even running in another OpenCL thread. I don't
suppose we have the same bug if the value of a restrict pointer is
stored to a global variable, and then a function is called that uses
the global? Or if you write the value of a restrict pointer to a
concurrent queue to send it to a non-OpenCL thread?
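
The single-threaded variant of that question (restrict pointer stored to a global, then a call that uses the global) can be written down as a small C test. This is only a sketch of the shape of such a test, not a claim about what any LLVM version does with it. It assumes a reading of paragraph 3 above in which the global, once assigned from the restrict pointer, is itself "based on" it, so the callee's store must be visible to the caller's reload. The names shared, bump_shared and store_publish_reload are invented for this illustration.

```c
#include <assert.h>

/* Global that receives a copy of the restrict-qualified pointer. */
static int *shared;

/* Modifies the pointed-to object through the global copy only. */
static void bump_shared(void)
{
    *shared += 1;
}

/* Stores through p, publishes p in a global, calls a function that writes
 * through the global, then reloads through p. Because 'shared' was assigned
 * from p, the access in bump_shared() uses an lvalue whose address is based
 * on p (on the reading assumed here), so the reload must not be replaced by
 * the value stored before the call. */
static int store_publish_reload(int *restrict p)
{
    assert(p != 0);
    *p = 1;
    shared = p;
    bump_shared();
    return *p;
}
```

A caller passing the address of an int should get 2 back. A compiler that forwarded the earlier store of 1 to the final load would be showing the same class of bug as the CSE in the kernel example.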
