thr3ads.net - search: "ptx

Displaying 20 results from an estimated 54 matches for "ptx_device".

2011 Nov 16

[LLVMdev] PTX builtin functions.

Dear Justin, I am trying to add support for some OpenCL builtin functions to the PTX backend. The attached file represents the first stub of a patch for the fmax builtin function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z } declare float @fmax(float, float) But at the moment llc crashes saying that "calls are not supported"; this does not happen with LLVM builtins like llvm.sqrt.f32. Can you please give me a...
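
A reference model of the semantics a lowering of `fmax` has to preserve may be useful when testing such a patch. C99 `fmax` treats a NaN operand as missing data: if one argument is NaN, the other is returned. The Python sketch below encodes that rule as a hypothetical oracle `fmax_ref`; it is not part of any LLVM or PTX API, and it deliberately does not model the ordering of +0.0 and -0.0.

```python
import math


def fmax_ref(x: float, y: float) -> float:
    """Reference fmax: a NaN operand is ignored in favour of the other one."""
    if math.isnan(x):
        return y  # also covers the both-NaN case, where the result stays NaN
    if math.isnan(y):
        return x
    return x if x >= y else y


print(fmax_ref(1.0, 2.5))           # 2.5
print(fmax_ref(float("nan"), 3.0))  # 3.0
```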

2012 May 16

[LLVMdev] NVPTX: __iAtomicCAS support ?

...ind { entry: %callback.addr = alloca i32*, align 8 store i32* %callback, i32** %callback.addr, align 8 %0 = load i32** %callback.addr, align 8 %1 = bitcast i32* %0 to %struct.kernelgen_callback_t* %lock = getelementptr inbounds %struct.kernelgen_callback_t* %1, i32 0, i32 0 %call = call ptx_device i32 @_Z12__iAtomicCASPiii(i32* %lock, i32 1, i32 0) br label %while.cond while.cond: ; preds = %while.body, %entry %2 = load i32** %callback.addr, align 8 %3 = bitcast i32* %2 to %struct.kernelgen_callback_t* %lock1 = getelementptr inbounds %struct.ker...
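
The callee `_Z12__iAtomicCASPiii` demangles to `__iAtomicCAS(int*, int, int)`, which CUDA's `atomicCAS(address, compare, val)` is, to our understanding, built on: it reads the old value, stores `val` only if the old value equals `compare`, and always returns the old value. The Python sketch below models that contract; `DeviceInt`, `atomic_cas` and `atomic_add_via_cas` are hypothetical names, and a `threading.Lock` stands in for the hardware atomicity.

```python
import threading


class DeviceInt:
    """Hypothetical stand-in for an int living in device memory."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.guard = threading.Lock()  # makes read-compare-write one atomic step


def atomic_cas(cell: DeviceInt, compare: int, val: int) -> int:
    """Model of atomicCAS: swap in val only if the old value equals compare."""
    with cell.guard:
        old = cell.value
        if old == compare:
            cell.value = val
        return old  # the caller detects success by comparing old with compare


def atomic_add_via_cas(cell: DeviceInt, delta: int) -> int:
    """Classic CAS retry loop: re-read and retry until no other thread intervened."""
    while True:
        old = cell.value
        if atomic_cas(cell, old, old + delta) == old:
            return old


# Mirrors the call in the snippet: __iAtomicCAS(lock, 1, 0)
lock = DeviceInt(1)
print(atomic_cas(lock, 1, 0), lock.value)  # 1 0  (swap happened)
print(atomic_cas(lock, 1, 0), lock.value)  # 0 0  (no swap, old value returned)
```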

2011 Nov 21

[LLVMdev] PTX builtin functions.

...define the LLVM intrinsic as @llvm.ptx.max(). Please follow the same convention when naming the __builtin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z } declare float @fmax(float, float)...

2012 May 16

[LLVMdev] NVPTX: __iAtomicCAS support ?

...= alloca i32*, align 8 store i32* %callback, i32** %callback.addr, align 8 %0 = load i32** %callback.addr, align 8 %1 = bitcast i32* %0 to %struct.kernelgen_callback_t* %lock = getelementptr inbounds %struct.kernelgen_callback_t* %1, i32 0, i32 0 %call = call ptx_device i32 @_Z12__iAtomicCASPiii(i32* %lock, i32 1, i32 0) br label %while.cond while.cond: ; preds = %while.body, %entry %2 = load i32** %callback.addr, align 8 %3 = bitcast i32* %2 to %struct.kernelgen_callback_t* %lock1 = gete...

2011 Nov 21

[LLVMdev] PTX builtin functions.

...ax(). Please follow the same convention when naming the __builtin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z } declare float @fmax(float, float)...

2011 Nov 16

[LLVMdev] PTX builtin functions.

...case here. When you define a new intrinsic, use the following template as a name: int_ptx_max. This will define the LLVM intrinsic as @llvm.ptx.max(). Please follow the same convention when naming the __builtin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z } declare float @fmax(float, float) But at the moment llc crashes saying that "calls are not supported"; this does not happen with llvm builtin...
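
The naming template above follows LLVM's TableGen convention for intrinsic records: drop the `int_` prefix, turn the remaining underscores into dots, and prepend `llvm.`. The same rule explains names such as `llvm.nvvm.read.ptx.sreg.tid.x` for the record `int_nvvm_read_ptx_sreg_tid_x`. A minimal Python sketch of the rule, with the hypothetical helper `intrinsic_ir_name`, assuming the record does not override its name explicitly:

```python
def intrinsic_ir_name(record: str) -> str:
    """Map a TableGen intrinsic record name (int_*) to its IR name (llvm.*).

    Assumption: the record relies on the default naming and does not set
    an explicit IR name.
    """
    prefix = "int_"
    if not record.startswith(prefix):
        raise ValueError(f"intrinsic records are expected to start with {prefix!r}: {record}")
    return "llvm." + record[len(prefix):].replace("_", ".")


print(intrinsic_ir_name("int_ptx_max"))  # llvm.ptx.max
```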

2011 Nov 21

[LLVMdev] PTX builtin functions.

...lease follow the same convention when naming the __builtin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z } declare float @f...

2011 May 11

[LLVMdev] [LLVMDev] Add not instruction to PTX backend

...PTXInstrInfo.td as you suggested before. multiclass PTX_LOGIC_2OP<string opcstr,PatFrag opnode> { ... } Now I am trying to write test cases for logic and shift operations, but I am having trouble mapping LLVM IR to PTX for the "not" instruction. The test case I wrote is: define ptx_device i16 @t4_u16(i16 %x) { ; CHECK: not.b16 rh0, rh1, rh2; ; CHECK-NEXT: ret; %z = xor i16 %x, 1 ret i16 %z } Since LLVM IR doesn't support logical not directly, I use "xor i16 %x, 1" to represent logical not in LLVM IR. It turns out the IR is mapped to PTX "xor", not PTX "...
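
In LLVM IR a bitwise `not` is conventionally written as `xor` with an all-ones constant (`xor i16 %x, -1`), whereas `xor i16 %x, 1` flips only bit 0. That difference would plausibly explain why the test selects PTX `xor` rather than `not`. The Python check below masks to 16 bits to mimic `i16`; `xor16` and `not16` are illustrative helpers only.

```python
MASK16 = 0xFFFF  # i16 values modelled as unsigned 16-bit integers


def xor16(x: int, imm: int) -> int:
    """Model `xor i16 %x, imm`."""
    return (x ^ imm) & MASK16


def not16(x: int) -> int:
    """Bitwise complement of a 16-bit value."""
    return ~x & MASK16


x = 0x00F0
print(hex(xor16(x, -1)))  # 0xff0f, identical to not16(x)
print(hex(xor16(x, 1)))   # 0xf1, only bit 0 flipped
```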

2011 Nov 22

[LLVMdev] PTX builtin functions.

...nvention when naming the __builtin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z }...

2011 Nov 22

[LLVMdev] PTX builtin functions.

...ltin_* function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z }...

2012 Jul 11

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

...ch is invalid. The problem does not occur if compiled for sm_10. > cat test.ll ; ModuleID = '__kernelgen_main_module' target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64" target triple = "ptx64-unknown-unknown" %struct.float2 = type { float, float } define ptx_device void @__internal_dsmul(%struct.float2* noalias nocapture sret %agg.result, %struct.float2* nocapture byval %x, %struct.float2* nocapture byval %y) nounwind inlinehint alwaysinline { entry: %y1 = getelementptr inbounds %struct.float2* %x, i64 0, i32 1 %0 = load float* %y1, align 4 %sub = fsub...
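
For a byval `%struct.float2` the natural alignment is that of its `float` members: 4 bytes, since the module's datalayout leaves `f32` at its default 32-bit alignment. Zero is never a usable alignment. The Python sketch below mirrors the struct with `ctypes` and adds a hypothetical sanity check, `is_valid_param_align`, assuming (as we read the PTX ISA) that an `.align` value must be a positive power of two. It is a cross-check, not the NVPTX backend's logic.

```python
import ctypes


class Float2(ctypes.Structure):
    """Mirror of %struct.float2 = type { float, float }."""
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float)]


def is_valid_param_align(align: int) -> bool:
    """True for a positive power of two; 0 (as emitted in the report) fails."""
    return align > 0 and (align & (align - 1)) == 0


natural = ctypes.alignment(Float2)  # 4 on mainstream ABIs (float is 4-byte aligned)
print(natural, ctypes.sizeof(Float2))  # 4 8
print(is_valid_param_align(0))         # False
```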

2011 Nov 23

[LLVMdev] PTX builtin functions.

...nction. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z }...

2011 Nov 23

[LLVMdev] PTX builtin functions.

...function. The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z }...

2013 Mar 11

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

...v32:32:32-v64:64:64-v128:128:128-n16:32:64" target triple = "nvptx64-unknown-unknown" @__kernelgen_version = constant [15 x i8] c"0.2/1654:1675M\00" define ptx_kernel void @__kernelgen_matvec_loop_7(i32* nocapture) #0 { "Loop Function Root": %tid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.tid.x() %ctaid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9 %BlockLB.Add.ThreadPosInBlock.x = add i32 %PositionOfBlockInGrid.x, %tid.x %isThreadLBgtLoopUB.x = icmp sgt i32 %BlockLB.Add.ThreadPosIn...
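
The kernel's prologue computes each thread's global index as `(ctaid.x << 9) + tid.x`, i.e. block index times 512 plus thread index, and then compares it against a bound. Assuming the truncated `icmp sgt` compares against the loop's upper bound, as the name `isThreadLBgtLoopUB` suggests, the guard can be sketched in Python as follows (function names are illustrative):

```python
BLOCK_SHIFT = 9  # from `shl i32 %ctaid.x, 9`: 512 threads per block


def global_thread_index(ctaid_x: int, tid_x: int) -> int:
    """Block position in the grid plus thread position in the block."""
    return (ctaid_x << BLOCK_SHIFT) + tid_x


def thread_has_work(ctaid_x: int, tid_x: int, loop_ub: int) -> bool:
    """Threads whose index exceeds the (inclusive) loop bound do nothing."""
    return not global_thread_index(ctaid_x, tid_x) > loop_ub


print(global_thread_index(2, 3))  # 1027
```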

2011 Nov 23

[LLVMdev] PTX builtin functions.

...The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z...

2011 Nov 19

[LLVMdev] llvm_anyint_ty clarification

...attached max_not_working.patch file): def int_ptx_max : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, LLVMMatchType<0>], [Commutative]>; The problem is that the builtin is not recognised in the following test case: define ptx_device i16 @max_16(i16 %a, i16 %b) { entry: %d = call i16 @llvm.ptx.max(i16 %a, i16 %b) ret i16 %d } declare i16 @llvm.ptx.max(i16, i16) Things change if I explicitly define the i16 intrinsic, like this: def int_ptx_max : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16...
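
One likely reason the `llvm_anyint_ty` version is not recognised: an overloaded intrinsic is referenced by a name carrying a suffix for each overloaded type (for example `llvm.ctpop.i32`). Operands declared with `LLVMMatchType<0>` reuse the first overloaded type and add no suffix of their own, so the i16 instance would be `@llvm.ptx.max.i16` rather than `@llvm.ptx.max`. The hypothetical Python helper below sketches the suffixing for integer overloads only.

```python
def mangle_int_overload(base: str, bits: int) -> str:
    """Append the iN suffix used for an intrinsic overloaded on one integer type."""
    if bits <= 0:
        raise ValueError("integer width must be positive")
    return f"{base}.i{bits}"


print(mangle_int_overload("llvm.ptx.max", 16))  # llvm.ptx.max.i16
```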

2012 Sep 06

[LLVMdev] [NVPTX] powf intrinsic is unimplemented

Dear all, during app compilation we get a crash in the NVPTX backend: LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2' [ID=18] As I understand it, LLVM tries to lower the following call %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8) nounwind readonly to a device intrinsic. The table llvm/IntrinsicsNVVM.td does not contain such an intrinsic; however, it should be a builtin, according to cuda/include/math_functions.h. Is my understanding correct, and do we simply need to add the corresponding...
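
`__powisf2` is the runtime-library routine (provided by compiler-rt and libgcc) that `llvm.powi.f32` falls back to when a target has no native lowering. No such library is linked on the device, which fits the selection failure on the external symbol. A device-side replacement could compute the power by repeated squaring. The Python sketch below follows the loop structure we recall from compiler-rt's `__powisf2`; the name `powi_f32` is hypothetical, and since Python floats are double precision, results are not rounded to `float`.

```python
def powi_f32(base: float, exp: int) -> float:
    """base**exp for an integer exponent, by binary exponentiation."""
    recip = exp < 0
    n = abs(exp)  # Python ints cannot overflow, so abs() is safe here
    result = 1.0
    while True:
        if n & 1:
            result *= base  # fold in the current power of base
        n >>= 1
        if n == 0:
            break
        base *= base        # square for the next exponent bit
    return 1.0 / result if recip else result


print(powi_f32(2.0, 10))  # 1024.0
print(powi_f32(2.0, -2))  # 0.25
```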

2013 Mar 11

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

Dear all, the attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently, LLVM's unroller together with all the standard optimizations produces code that stores the value to memory after every unrolled iteration, which is
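
A source-level picture of the two shapes being compared, in Python for brevity. The first writes the accumulator back to `out[i]` on every iteration; the second keeps a local accumulator (which a compiler can hold in a register) and stores once per row. This only illustrates the target shape, not the attached kernel, and unrolling by 2 is an arbitrary choice. On the LLVM side, hoisting such a store out of the loop is the kind of scalar promotion LICM attempts when it can prove `out[i]` is not aliased inside the loop.

```python
from typing import List, Sequence


def row_sums_store_each(matrix: Sequence[Sequence[float]], out: List[float]) -> None:
    """Shape described in the post: memory is updated on every iteration."""
    for i, row in enumerate(matrix):
        out[i] = 0.0
        for v in row:
            out[i] += v  # load + store of out[i] each time


def row_sums_register_acc(matrix: Sequence[Sequence[float]], out: List[float]) -> None:
    """Desired shape: accumulate in a local, store once per row."""
    for i, row in enumerate(matrix):
        acc = 0.0
        for j in range(0, len(row) - 1, 2):  # main loop unrolled by 2
            acc += row[j]
            acc += row[j + 1]
        if len(row) % 2:
            acc += row[-1]  # remainder iteration
        out[i] = acc  # single store
```

Both versions add the elements in the same order, so they produce bit-identical results; only the memory traffic differs.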

2011 Dec 04

[LLVMdev] PTX builtin functions.

...The test case I am trying is the following: define ptx_device float @f(float %x, float %y) { entry: %z = call float @fmax(float %x, float %y) ret float %z...

2011 Dec 08

[LLVMdev] PTX builtin functions.
