Displaying 20 results from an estimated 33 matches for "workitem".

2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
...208 208 208 208 208 208 208 208 208 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 > > It looks like the addressing in as1.ll is incorrectly concluded to be uniform: > > %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 > %7 = tail call i32 @llvm.amdgcn.workgroup.id.x() #0 > %mul.i.i.i.i.i = mul nsw i32 %7, %3 > %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 > %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 > %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group...
2012 Dec 01
6
[LLVMdev] [RFC] "noclone" function attribute
Hi, OpenCL has a "barrier" function with very specific semantics, and there is currently no analogue to model this in LLVM. This has been touched on by the SPIR folks but I don't believe they put forward a proposal. The barrier function is a special function that ensures that all workitems executing a kernel have executed up to that point before execution on any workitem can continue. The CL spec is specific about how user kernels can use barriers - the sequence of barriers that are hit by all workitems in a workgroup must be identical. An issue occurs when defining what "the...
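The barrier semantics described in this snippet can be illustrated with a small host-side analogy. This is only a sketch: it uses Python threads and `threading.Barrier` to stand in for workitems and the OpenCL `barrier` call, and the names `run_workgroup`, `kernel`, `local_mem`, and `results` are hypothetical, not part of OpenCL or LLVM.

```python
import threading


def run_workgroup(num_workitems):
    """Simulate one workgroup; each thread plays the role of a workitem."""
    barrier = threading.Barrier(num_workitems)
    local_mem = [0] * num_workitems  # stands in for workgroup-local memory
    results = [0] * num_workitems

    def kernel(item_id):
        # Phase 1: every workitem writes only its own slot.
        local_mem[item_id] = item_id * 10
        # No workitem continues until all of them have reached this point.
        barrier.wait()
        # Phase 2: it is now safe to read a slot written by another workitem.
        neighbour = (item_id + 1) % num_workitems
        results[item_id] = local_mem[neighbour]

    threads = [
        threading.Thread(target=kernel, args=(i,)) for i in range(num_workitems)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_workgroup(4))
```

If one thread skipped `barrier.wait()` while the others called it, the remaining threads would block forever, which loosely mirrors why every workitem in a workgroup has to hit the same sequence of barriers.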
2013 Nov 09
3
[LLVMdev] Loads moving across barriers
...el than other address spaces? While it's less worrisome than the first interpretation, I still don't really like it. > This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of.
2012 Dec 02
0
[LLVMdev] [RFC] "noclone" function attribute
...Hi, OpenCL has a "barrier" function with very specific semantics, and there is currently no analogue to model this in LLVM. This has been touched on by the SPIR folks but I don't believe they put forward a proposal. The barrier function is a special function that ensures that all workitems executing a kernel have executed up to that point before execution on any workitem can continue. The CL spec is specific about how user kernels can use barriers - the sequence of barriers that are hit by all workitems in a workgroup must be identical. An issue occurs when defining what "the...
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...l.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* nocapture %array) #0 { %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 %mul26.i = mul i32 %x.i12.i, %x.i.i %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 %add.i = add i32 %x.i4.i, %mul26.i %0 = sext i32 %add.i to i64 %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, i64 %0 store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa !8 ret void } which cannot be handled by llc with the m...
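The index arithmetic in the IR above (workgroup id times local size, plus workitem id) can be written out as plain arithmetic. This is a minimal sketch for the x dimension only; the function name `global_id` and its parameters are hypothetical and simply mirror the three values the IR reads through intrinsics.

```python
def global_id(workgroup_id, local_size, workitem_id):
    """Flat index of a workitem along one dimension.

    Mirrors the IR in the snippet: mul(local_size, workgroup_id) followed
    by add(workitem_id). The argument names are illustrative only.
    """
    if local_size <= 0:
        raise ValueError("local_size must be positive")
    if not 0 <= workitem_id < local_size:
        raise ValueError("workitem_id must lie in [0, local_size)")
    if workgroup_id < 0:
        raise ValueError("workgroup_id must be non-negative")
    return workgroup_id * local_size + workitem_id


if __name__ == "__main__":
    # Two workgroups of four workitems each cover indices 0 through 7.
    ids = [global_id(g, 4, i) for g in range(2) for i in range(4)]
    print(ids)
```

Because the workitem id differs per lane, the resulting index differs per lane as well, so any address derived from it is not uniform across the workgroup.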
2013 Nov 11
0
[LLVMdev] Loads moving across barriers
...address spaces? While it's less worrisome than the first interpretation, I still don't really like it. >> > > This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other optimizat...
2020 Apr 13
3
Are AMDGPU intrinsics available in LLVM IR ?
Hi! I'm trying to figure out how to access the workgroup id from within the LLVM IR language when lowering with the AMDGPU backend. Looking at the 'llvm/include/llvm/IR/IntrinsicsAMDGPU.td' file there are intrinsics defined to access the workitem index (thread index), but this file lives in 'llvm/include': //===----------------------------------------------------------------------===// // ABI Special Intrinsics //===----------------------------------------------------------------------===// defm int_amdgcn_workitem_id : AMDGPURead...
2013 Dec 04
2
[LLVMdev] Loads moving across barriers
...address spaces? While it's less worrisome than the first interpretation, I still don't really like it. >>> >> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. > FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other optim...
2011 Aug 02
1
CompiledCode execution + using scope + local variables in a loop => NullReferenceException
Hi, I have reported a bug on codeplex: http://ironruby.codeplex.com/workitem/6353 Will it be fixed in next release? Thank you, Anton -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://rubyforge.org/pipermail/ironruby-core/attachments/20110802/78a24916/attachment.html>
2013 Dec 05
3
[LLVMdev] Loads moving across barriers
...While it's less worrisome than the first interpretation, I still don't really like it. >>>>> >>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly oth...
2013 Dec 05
0
[LLVMdev] Loads moving across barriers
...paces? While it's less worrisome than the first interpretation, I still don't really like it. >>>> >>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other o...
2016 Jan 28
6
Memory scope proposal
...ny language that uses them, including languages not yet invented. A new memory scope name can be added if the existing ones are insufficient. With the first try, we can define the standard scopes with what a common language that has memory scopes needs, e.g., OpenCL uses system, device, workgroup, workitem. It uses the same approach as LLVM has done for debug information. There are standard debug entities (that a common language (C) needs), and each new language uses those standard entities where there is a match, and subsequently defines only the delta. A bitcode example with the proposal: define...
2013 Dec 05
0
[LLVMdev] Loads moving across barriers
...'s less worrisome than the first interpretation, I still don't really like it. >>>>>> >>>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly...
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
...analysis on the examples it does the right thing and sees the load as divergent. $ opt -S -analyze -divergence -o - as1.ll Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_': DIVERGENT: %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 DIVERGENT: %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 DIVERGENT: %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 DIVERGENT: %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i DIVERGENT: %9 = load i32, i32 addrspace(1)* %8, align 4 DIVERGEN...
2016 Mar 22
1
Memory scope proposal
...ny language that uses them, including languages not yet invented. A new memory scope name can be added if the existing ones are insufficient. With the first try, we can define the standard scopes with what a common language that has memory scopes needs, e.g., OpenCL uses system, device, workgroup, workitem. It uses the same approach as LLVM has done for debug information. There are standard debug entities (that a common language (C) needs), and each new language uses those standard entities where there is a match, and subsequently defines only the delta. A bitcode example with the proposal...
2016 Mar 29
1
Memory scope proposal
...ding languages not > yet invented. A new memory scope name can be added if the existing > ones are insufficient. > > With the first try, we can define the standard scopes with what a > common language that has memory scopes needs, e.g., OpenCL uses > system, device, workgroup, workitem. It uses the same approach as LLVM > has done for debug information. There are standard debug entities > (that a common language (C) needs), and each new language uses those > standard entities where there is a match, and subsequently defines > only the delta. > > A bitcode ex...
2019 Feb 26
3
Dealing with illegal operand mappings in RegBankSelect
...serted instead of always adding plain copy. > > Would that work for you? You can’t legitimately copy from vector to scalar. It conceptually doesn’t work, and going through memory doesn’t help. The use instruction needs to be rewritten to (in the worst case) scalarize the operation for every workitem. A pseudocopy would still be some illegal operation which cannot exist which would need to be guaranteed to be removed, so I don’t think this would be any cleaner than allowing the illegal copies. > >> So far I’ve worked around this by lying and reporting all of the invalid source regis...
2013 Dec 21
2
[LLVMdev] Loads moving across barriers
...ss worrisome than the first interpretation, I still don't really like it. >>>>>>> >>>>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>>>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and poss...