thr3ads.net - search: "workitem"

Displaying 20 results from an estimated 33 matches for "workitem".

[AMDGPU] Strange results with different address spaces

2017 Dec 05

...208 208 208 208 208 208 208 208 208 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 > > It looks like the addressing in as1.ll is incorrectly concluded to be uniform: > > %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 > %7 = tail call i32 @llvm.amdgcn.workgroup.id.x() #0 > %mul.i.i.i.i.i = mul nsw i32 %7, %3 > %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 > %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 > %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce...

[AMDGPU] non-hsa intrinsic with hsa target

2016 Mar 05

Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment, define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entry: switch i32 %dim, label %get_local_id.exit [ i32 0, label %get_group...

[LLVMdev] [RFC] "noclone" function attribute

2012 Dec 01

Hi, OpenCL has a "barrier" function with very specific semantics, and there is currently no analogue to model this in LLVM. This has been touched on by the SPIR folks but I don't believe they put forward a proposal. The barrier function is a special function that ensures that all workitems executing a kernel have executed up to that point before execution on any workitem can continue. The CL spec is specific about how user kernels can use barriers - the sequence of barriers that are hit by all workitems in a workgroup must be identical. An issue occurs when defining what "the...

[LLVMdev] Loads moving across barriers

2013 Nov 09

...el than other address spaces? While it's less worrisome than the first interpretation, I still don't really like it. > This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of.

[LLVMdev] [RFC] "noclone" function attribute

2012 Dec 02

...Hi, OpenCL has a "barrier" function with very specific semantics, and there is currently no analogue to model this in LLVM. This has been touched on by the SPIR folks but I don't believe they put forward a proposal. The barrier function is a special function that ensures that all workitems executing a kernel have executed up to that point before execution on any workitem can continue. The CL spec is specific about how user kernels can use barriers - the sequence of barriers that are hit by all workitems in a workgroup must be identical. An issue occurs when defining what "the...

[AMDGPU] Strange results with different address spaces

2017 Dec 05

Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0

[AMDGPU] non-hsa intrinsic with hsa target

2016 Mar 05

...l.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* nocapture %array) #0 { %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 %mul26.i = mul i32 %x.i12.i, %x.i.i %x.i4.i = tail call i32 @llvm.amdgcn.workitem.id.x() #2, !range !7 %add.i = add i32 %x.i4.i, %mul26.i %0 = sext i32 %add.i to i64 %arrayidx = getelementptr inbounds float, float addrspace(1)* %array, i64 %0 store float 1.000000e+00, float addrspace(1)* %arrayidx, align 4, !tbaa !8 ret void } which cannot be handled by llc with the m...
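The index arithmetic visible in the IR excerpt above (workgroup id times local size, plus workitem id) can be sketched in a few lines of Python. This is only an illustration of the one-dimensional global-index formula, not code from the thread; the function name `global_id` and the launch sizes are hypothetical.

```python
def global_id(group_id: int, local_size: int, local_id: int) -> int:
    """Flat global index along one dimension.

    Mirrors the mul/add pair in the excerpt: group_id * local_size + local_id.
    """
    return group_id * local_size + local_id


# Hypothetical launch: 3 workgroups of 4 workitems each.
ids = [global_id(g, 4, l) for g in range(3) for l in range(4)]
print(ids)
```

Under these assumptions every workitem gets a distinct index, which is why a store addressed this way touches a different array element per workitem.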

[LLVMdev] Loads moving across barriers

2013 Nov 11

...address spaces? While it's less worrisome than the first interpretation, I still don't really like it. >> > > This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other optimizat...

Are AMDGPU intrinsics available in LLVM IR ?

2020 Apr 13

Hi! I'm trying to figure out how to access the workgroup id from within the LLVM IR language when lowering with the AMDGPU backend. Looking at the 'llvm/include/llvm/IR/IntrinsicsAMDGPU.td' file there are intrinsics defined to access the workitem index (thread index), but this file lives in 'llvm/include': //===----------------------------------------------------------------------===// // ABI Special Intrinsics //===----------------------------------------------------------------------===// defm int_amdgcn_workitem_id : AMDGPURead...

[LLVMdev] Loads moving across barriers

2013 Dec 04

...address spaces? While it's less worrisome than the first interpretation, I still don't really like it. >>> >> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. > FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other optim...

CompiledCode execution + using scope + local variables in a loop => NullReferenceException

2011 Aug 02

Hi, I have reported a bug on codeplex: http://ironruby.codeplex.com/workitem/6353 Will it be fixed in next release? Thank you, Anton -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://rubyforge.org/pipermail/ironruby-core/attachments/20110802/78a24916/attachment.html>

[LLVMdev] Loads moving across barriers

2013 Dec 05

...While it's less worrisome than the first interpretation, I still don't really like it. >>>>> >>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly oth...

[LLVMdev] Loads moving across barriers

2013 Dec 05

...paces? While it's less worrisome than the first interpretation, I still don't really like it. >>>> >>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other o...

Memory scope proposal

2016 Jan 28

...ny language that uses them, including languages not yet invented. A new memory scope name can be added if the existing ones are insufficient. With the first try, we can define the standard scopes with what a common language that has memory scopes needs, e.g., OpenCL uses system, device, workgroup, workitem. It uses the same approach as LLVM has done for debug information. There are standard debug entities (that a common language (C) needs), and each new language uses those standard entities where there is a match, and subsequently defines only the delta. *A **bitcode example with the proposal* define...

[LLVMdev] Loads moving across barriers

2013 Dec 05

...'s less worrisome than the first interpretation, I still don't really like it. >>>>>> >>>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly...

[AMDGPU] Strange results with different address spaces

2017 Dec 06

...analysis on the examples it does the right thing and sees the load as divergent. $ opt -S -analyze -divergence -o - as1.ll Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_': DIVERGENT: %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 DIVERGENT: %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 DIVERGENT: %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 DIVERGENT: %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i DIVERGENT: %9 = load i32, i32 addrspace(1)* %8, align 4 DIVERGEN...

Memory scope proposal

2016 Mar 22

Memory scope proposal

2016 Mar 29

...ding languages not > yet invented. A new memory scope name can be added if the existing > ones are insufficient. > > With the first try, we can define the standard scopes with what a > common language that has memory scopes needs, e.g., OpenCL uses > system, device, workgroup, workitem. It uses the same approach as LLVM > has done for debug information. There are standard debug entities > (that a common language (C) needs), and each new language uses those > standard entities where there is a match, and subsequently defines > only the delta. > > A bitcode ex...

Dealing with illegal operand mappings in RegBankSelect

2019 Feb 26

...serted instead of always adding plain copy. > > Would that work for you? You can’t legitimately copy from vector to scalar. It conceptually doesn’t work, and going through memory doesn’t help. The use instruction needs to be rewritten to (in the worst case) scalarize the operation for every workitem. A pseudocopy would still be some illegal operation which cannot exist which would need to be guaranteed to be removed, so I don’t think this would be any cleaner than allowing the illegal copies. > >> So far I’ve worked around this by lying and reporting all of the invalid source regis...

[LLVMdev] Loads moving across barriers

2013 Dec 21

...ss worrisome than the first interpretation, I still don't really like it. >>>>>>> >>>>>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of. >>>>> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and poss...
