thr3ads.net - search: "f32"

Displaying 20 results from an estimated 1188 matches for "f32".

Did you mean: 32

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

...t %agg.result, %struct.float2* nocapture byval %x, %struct.float2* nocapture byval %y) nounwind inlinehint alwaysinline { entry: %y1 = getelementptr inbounds %struct.float2* %x, i64 0, i32 1 %0 = load float* %y1, align 4 %sub = fsub float -0.000000e+00, %0 %1 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %sub, float 4.097000e+03, float %0) nounwind %2 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %0, float 4.097000e+03, float %1) nounwind %y5 = getelementptr inbounds %struct.float2* %y, i64 0, i32 1 %3 =...

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

...aram .align 4 .b8 __internal_dsmul_param_1[8], .param .align 4 .b8 __internal_dsmul_param_2[8] ) // @__internal_dsmul { .reg .pred %p<396>; .reg .s16 %rc<396>; .reg .s16 %rs<396>; .reg .s32 %r<396>; .reg .s64 %rl<396>; .reg .f32 %f<396>; .reg .f64 %fl<396>; // BB#0: // %entry mov.b64 %rl0, __internal_dsmul_param_1; cvta.local.u64 %rl0, %rl0; ld.f32 %f0, [%rl0+4]; neg.f32 %f1, %f0; mov.b64 %rl1, __internal_dsmul_param_2; mov.f32 %f2, 0f45800800; // inline asm mad.f32...

[LLVMdev] Error w/ Tablegen + Intrinsics

2009 Apr 15

[LLVMdev] Error w/ Tablegen + Intrinsics

It seems that Tablegen is generating intrinsic ID's off by in DAGISel.inc In DAGISel.inc, I have the following pattern: int64_t CN1 = Tmp0->getZExtValue(); // Pattern: (intrinsic_w_chain:f32 103:iPTR, GPRF32:f32:$src0, GPRF32:f32:$src1, GPRF32:f32:$src2) // Emits: (MACRO_FMA_f32:f32 GPRF32:f32:$src0, GPRF32:f32:$src1, GPRF32:f32:$src2) // Pattern complexity = 8 cost = 1 size = 0 if (CN1 == INT64_C(103)) { SDValue N2 = N.getOperand(2); SDValue N3 = N.getOp...

[PATCH 1/2] allow path to envyas binary to be specified

2014 Sep 24

[PATCH 1/2] allow path to envyas binary to be specified

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- src/shader/Makefile | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/src/shader/Makefile b/src/shader/Makefile index 46658e9..2d789be 100644 --- a/src/shader/Makefile +++ b/src/shader/Makefile @@ -24,20 +24,21 @@ NVF0_SHADERS = xfrm2nvf0.vpc \ videonvf0.fpc SHADERS = $(NVC0_SHADERS)

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 09

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/src/gallium/drivers/nouveau...

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

2013 Aug 05

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

On my target store/load of f32 or i32 are equivalents. Previously I had duplicate instructions def in .td to map f32 and i32 to the same opcode. I deleted all that and I instead tried a new approach (to simplify things) : setOperationAction(ISD::STORE, MVT::f32, Promote); AddPromotedToType(ISD::STORE, MVT::f32, MVT::i32);...

[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 10

[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- V2: beat me, whip me, split out F64 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 81 ++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff...

[Bug 78161] New: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break

2014 May 01

[Bug 78161] New: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break

https://bugs.freedesktop.org/show_bug.cgi?id=78161 Priority: medium Bug ID: 78161 Assignee: nouveau at lists.freedesktop.org Summary: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break Severity: normal Classification: Unclassified OS: Linux (All)

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

2013 Aug 05

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

On Mon, Aug 5, 2013 at 2:25 PM, Tom Stellard <tom at stellard.net> wrote: > On Mon, Aug 05, 2013 at 02:09:58PM -0400, Francois Pichet wrote: > > On my target store/load of f32 or i32 are equivalents. > > Previously I had duplicate instructions def in .td to map f32 and i32 to > > the same opcode. > > > > I deleted all that and I instead tried a new approach (to simplify > things) : > > > > setOperationAction(ISD::STORE, MVT::f32,...

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

2013 Aug 05

[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?

On Mon, Aug 05, 2013 at 02:09:58PM -0400, Francois Pichet wrote: > On my target store/load of f32 or i32 are equivalents. > Previously I had duplicate instructions def in .td to map f32 and i32 to > the same opcode. > > I deleted all that and I instead tried a new approach (to simplify things) : > > setOperationAction(ISD::STORE, MVT::f32, Promote); > AddPromotedToTyp...

[LLVMdev] Error w/ Tablegen + Intrinsics

2009 Apr 15

[LLVMdev] Error w/ Tablegen + Intrinsics

...trinsics file? On Apr 14, 2009, at 6:34 PM, Villmow, Micah wrote: > It seems that Tablegen is generating intrinsic ID’s off by in > DAGISel.inc > > In DAGISel.inc, I have the following pattern: > int64_t CN1 = Tmp0->getZExtValue(); > > // Pattern: (intrinsic_w_chain:f32 103:iPTR, GPRF32:f32:$src0, > GPRF32:f32:$src1, GPRF32:f32:$src2) > // Emits: (MACRO_FMA_f32:f32 GPRF32:f32:$src0, GPRF32:f32:$src1, > GPRF32:f32:$src2) > // Pattern complexity = 8 cost = 1 size = 0 > if (CN1 == INT64_C(103)) { > SDValue N2 = N.getOperand(2...

[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016 Oct 02

[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

...; the opts to a fixed point? It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x00000000 (0) 4: mov u32 %r39 0x3f800000 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0...

NVPTX - Reordering load instructions

2018 Jun 21

NVPTX - Reordering load instructions

...; > for (int i = 0; i < BLOCK_SIZE; i++) { > for (int j = 0; j < i; j++) > peri_col[idx][i] -= peri_col[idx][j] * dia[j][i]; > peri_col[idx][i] /= dia[i][i]; > } NVCC emits PTX instructions where all loads from shared memory are packed together: > ... > ld.shared.f32 %f546, [kernel_dia+440]; > ld.shared.f32 %f545, [%r4+-996]; > ld.shared.f32 %f544, [kernel_dia+56]; > ld.shared.f32 %f543, [kernel_dia+88]; > ld.shared.f32 %f542, [kernel_dia+500]; > ld.shared.f32 %f541, [kernel_dia+84]; > ld.shared.f32 %f540, [%r4+-972]; > ld.sha...

[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

2014 Jul 05

[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32 No piglit regressions observed on nv50 and nvc0! Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- V2: fix usage of wrong variable V3: enable F64 support V4: - disable F64 support again - handle saturate flag...

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

2011 Sep 01

[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point

...nt16_t *a, const int16_t *b, unsigned int len) @@ -97,4 +121,81 @@ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, u return ret; } +#elif defined(FLOATING_POINT) + +static inline int32_t saturate_float_to_16bit(float a) { + int32_t ret; + asm ("vmov.f32 d0[0], %[a]\n" + "vcvt.s32.f32 d0, d0, #15\n" + "vqrshrn.s32 d0, q0, #15\n" + "vmov.s16 %[ret], d0[0]\n" + : [ret] "=&r" (ret) + : [a] "r" (a) + : "q0"); + return ret; +} +#undef...

[LLVMdev] Tablegen question

2009 Apr 15

[LLVMdev] Tablegen question

...gt; This is the intrinsic definition: > def int_opencl_math_fdistance_fast : Intrinsic<[llvm_float_ty], > [llvm_anyfloat_ty, LLVMMatchType<0>]>; > > The problem comes when I try to use the intrinsic. It gives me the > following error: > GPRV2F32:f32:$src1 MACRO_DISTANCE_FAST_v2f32: (set GPRF32:f32:$dst, > (intrinsic_w_chain:f32 84:iPTR, GPRV2F32:v2f32:$src0, > GPRV2F32:f32:$src1)) > TableGen.exe: In MACRO_DISTANCE_FAST_v2f32: Type inference > contradiction > found in node! Your "$src1" is f32 instead of v2f32....

[LLVMdev] fp_round libcall

2008 Jul 07

[LLVMdev] fp_round libcall

Hi, I'm trying to emit FP_ROUND f64 -> f32 considering a mips target that only supports single float point operations. The problem is that f32 is considered legal on this target but f64 doesn't and the only way I can codegen this instruction is using setConvertAction(MVT::f64, MVT::f32, Expand), which issues a EmitStackConvert. What if...

How to constraint instructions reordering from patterns?

2018 May 04

How to constraint instructions reordering from patterns?

...i32<64> t5: ch = llvm.clp.set.rspb t3, TargetConstant:i16<393>, Constant:i32<64> t8: ch = llvm.clp.set.rspsu t5, TargetConstant:i16<394>, Constant:i32<8> t13: ch = store<Volatile ST4[@x1](tbaa=<0x3dbe418>)> t8, ConstantFP:f32<1.000000e+00>, GlobalAddress:i16<float* @x1> 0, undef:i16 t16: ch = store<Volatile ST4[@x2](tbaa=<0x3dbe418>)> t13, ConstantFP:f32<2.000000e+00>, GlobalAddress:i16<float* @x2> 0, undef:i16 t19: ch = store<Volatile ST4[@x3](tbaa=<0x3dbe418>)...

[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 11

[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions

On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote: > Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 > > Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> > --- > V2: beat me, whip me, split out F64 > > .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 81 ++++++++++++++++++++++ > 1 fil...

[PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 11

[PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

...t;>> >>> >>> On 11.01.2015 22:54, Ilia Mirkin wrote: >>>> >>>> On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann >>>> <tobias.johannes.klausmann at mni.thm.de> wrote: >>>>> >>>>> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, >>>>> {S16/32})->F32 >>>>> >>>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >>>>> --- >>>>> V2: Split out F64 parts >>>>> V3: rem...

search for: f32