Displaying 20 results from an estimated 1188 matches for "f32".
2012 Jul 11
2
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
...t %agg.result, %struct.float2* nocapture byval %x, %struct.float2*
nocapture byval %y) nounwind inlinehint alwaysinline {
entry:
%y1 = getelementptr inbounds %struct.float2* %x, i64 0, i32 1
%0 = load float* %y1, align 4
%sub = fsub float -0.000000e+00, %0
%1 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float
%sub, float 4.097000e+03, float %0) nounwind
%2 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %0,
float 4.097000e+03, float %1) nounwind
%y5 = getelementptr inbounds %struct.float2* %y, i64 0, i32 1
%3 =...
2012 Nov 09
0
[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params
...aram .align 4 .b8 __internal_dsmul_param_1[8],
.param .align 4 .b8 __internal_dsmul_param_2[8]
) // @__internal_dsmul
{
.reg .pred %p<396>;
.reg .s16 %rc<396>;
.reg .s16 %rs<396>;
.reg .s32 %r<396>;
.reg .s64 %rl<396>;
.reg .f32 %f<396>;
.reg .f64 %fl<396>;
// BB#0: // %entry
mov.b64 %rl0, __internal_dsmul_param_1;
cvta.local.u64 %rl0, %rl0;
ld.f32 %f0, [%rl0+4];
neg.f32 %f1, %f0;
mov.b64 %rl1, __internal_dsmul_param_2;
mov.f32 %f2, 0f45800800;
// inline asm
mad.f32...
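A note on the `0f45800800` immediate in the PTX above: the `0f` prefix introduces the hexadecimal spelling of an IEEE-754 single-precision bit pattern, and this one appears to correspond to the `float 4.097000e+03` operand in the LLVM IR of the first message. A minimal sketch for checking that, in Python; the helper name `decode_ptx_f32` is made up for illustration and is not part of any toolchain:

```python
import struct


def decode_ptx_f32(literal: str) -> float:
    """Decode a PTX-style single-precision literal such as '0f45800800'.

    Hypothetical helper: it only handles the 0f + 8 hex digits form.
    """
    if not literal.lower().startswith("0f") or len(literal) != 10:
        raise ValueError(f"not a 0f-prefixed 32-bit literal: {literal!r}")
    bits = int(literal[2:], 16)
    # Reinterpret the 32-bit pattern as a big-endian IEEE-754 float.
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


print(decode_ptx_f32("0f45800800"))  # 4097.0
```

The bit pattern has exponent field 0x8B (2^12) and a mantissa of 2^-12, giving 4096 + 1 = 4097.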
2009 Apr 15
2
[LLVMdev] Error w/ Tablegen + Intrinsics
It seems that Tablegen is generating intrinsic IDs off by one in
DAGISel.inc
In DAGISel.inc, I have the following pattern:
int64_t CN1 = Tmp0->getZExtValue();
// Pattern: (intrinsic_w_chain:f32 103:iPTR, GPRF32:f32:$src0,
GPRF32:f32:$src1, GPRF32:f32:$src2)
// Emits: (MACRO_FMA_f32:f32 GPRF32:f32:$src0, GPRF32:f32:$src1,
GPRF32:f32:$src2)
// Pattern complexity = 8 cost = 1 size = 0
if (CN1 == INT64_C(103)) {
SDValue N2 = N.getOperand(2);
SDValue N3 = N.getOp...
2014 Sep 24
1
[PATCH 1/2] allow path to envyas binary to be specified
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
src/shader/Makefile | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/src/shader/Makefile b/src/shader/Makefile
index 46658e9..2d789be 100644
--- a/src/shader/Makefile
+++ b/src/shader/Makefile
@@ -24,20 +24,21 @@ NVF0_SHADERS = xfrm2nvf0.vpc \
videonvf0.fpc
SHADERS = $(NVC0_SHADERS)
2015 Jan 09
3
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++
1 file changed, 109 insertions(+)
diff --git a/src/gallium/drivers/nouveau...
2013 Aug 05
2
[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?
On my target, stores/loads of f32 and i32 are equivalent.
Previously I had duplicate instruction defs in .td to map f32 and i32 to
the same opcode.
I deleted all that and instead tried a new approach (to simplify things):
setOperationAction(ISD::STORE, MVT::f32, Promote);
AddPromotedToType(ISD::STORE, MVT::f32, MVT::i32);...
2015 Jan 10
2
[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: beat me, whip me, split out F64
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 81 ++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff...
2014 May 01
13
[Bug 78161] New: [NV96] Artifacts in output of fragment program containing not unrolled loops with conditional break
https://bugs.freedesktop.org/show_bug.cgi?id=78161
Priority: medium
Bug ID: 78161
Assignee: nouveau at lists.freedesktop.org
Summary: [NV96] Artifacts in output of fragment program
containing not unrolled loops with conditional break
Severity: normal
Classification: Unclassified
OS: Linux (All)
2013 Aug 05
0
[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?
On Mon, Aug 5, 2013 at 2:25 PM, Tom Stellard <tom at stellard.net> wrote:
> On Mon, Aug 05, 2013 at 02:09:58PM -0400, Francois Pichet wrote:
> > On my target store/load of f32 or i32 are equivalents.
> > Previously I had duplicate instructions def in .td to map f32 and i32 to
> > the same opcode.
> >
> > I deleted all that and I instead tried a new approach (to simplify
> things) :
> >
> > setOperationAction(ISD::STORE, MVT::f32,...
2013 Aug 05
1
[LLVMdev] Promote MVT::f32 load/store to MVT::i32 cause infinite loop in LegalizeDAG?
On Mon, Aug 05, 2013 at 02:09:58PM -0400, Francois Pichet wrote:
> On my target store/load of f32 or i32 are equivalents.
> Previously I had duplicate instructions def in .td to map f32 and i32 to
> the same opcode.
>
> I deleted all that and I instead tried a new approach (to simplify things) :
>
> setOperationAction(ISD::STORE, MVT::f32, Promote);
> AddPromotedToTyp...
2009 Apr 15
0
[LLVMdev] Error w/ Tablegen + Intrinsics
...trinsics file?
On Apr 14, 2009, at 6:34 PM, Villmow, Micah wrote:
> It seems that Tablegen is generating intrinsic IDs off by one in
> DAGISel.inc
>
> In DAGISel.inc, I have the following pattern:
> int64_t CN1 = Tmp0->getZExtValue();
>
> // Pattern: (intrinsic_w_chain:f32 103:iPTR, GPRF32:f32:$src0,
> GPRF32:f32:$src1, GPRF32:f32:$src2)
> // Emits: (MACRO_FMA_f32:f32 GPRF32:f32:$src0, GPRF32:f32:$src1,
> GPRF32:f32:$src2)
> // Pattern complexity = 8 cost = 1 size = 0
> if (CN1 == INT64_C(103)) {
> SDValue N2 = N.getOperand(2...
2016 Oct 02
2
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
...; the opts to a fixed point?
It is a second mov that causes a problem for later folding in the imm,
here output of a testshader[1]:
0: nop u32 %r56 (0)
1: ld u32 %r31 c0[0x0] (0)
2: ld u32 %r37 c0[0x140] (0)
3: mov u32 %r38 0x00000000 (0)
4: mov u32 %r39 0x3f800000 (0)
5: mad f32 %r40 %r37 %r38 %r39 (0)
6: mad f32 %r44 %r37 %r38 %r38 (0)
7: add f32 %r53 %r31 %r40 (0)
8: add f32 %r54 %r31 %r44 (0)
9: add f32 %r57 %r56 %r44 (0)
Constantfolding...
MAIN:-1 ()
BB:0 (14 instructions) - df = { }
-> BB:1 (tree)
0: nop u32 %r56 (0)
1: ld u32 %r31 c0[0x0] (0...
2018 Jun 21
2
NVPTX - Reordering load instructions
...;
> for (int i = 0; i < BLOCK_SIZE; i++) {
> for (int j = 0; j < i; j++)
> peri_col[idx][i] -= peri_col[idx][j] * dia[j][i];
> peri_col[idx][i] /= dia[i][i];
> }
NVCC emits PTX instructions where all loads from shared memory are
packed together:
> ...
> ld.shared.f32 %f546, [kernel_dia+440];
> ld.shared.f32 %f545, [%r4+-996];
> ld.shared.f32 %f544, [kernel_dia+56];
> ld.shared.f32 %f543, [kernel_dia+88];
> ld.shared.f32 %f542, [kernel_dia+500];
> ld.shared.f32 %f541, [kernel_dia+84];
> ld.shared.f32 %f540, [%r4+-972];
> ld.sha...
2014 Jul 05
1
[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32
No piglit regressions observed on nv50 and nvc0!
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: fix usage of wrong variable
V3: enable F64 support
V4:
- disable F64 support again
- handle saturate flag...
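To make the idea of folding a float-to-integer conversion with a saturate flag concrete, here is a rough sketch in Python. It is not the code from the patch: the function name, the truncate-toward-zero rounding, the NaN-folds-to-0 choice, and the wrap-around behaviour without saturation are all assumptions made for illustration.

```python
import math


def fold_cvt_f32_to_int(value: float, bits: int, signed: bool,
                        saturate: bool = True) -> int:
    """Fold a float -> integer conversion of a known constant (illustrative)."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(value):
        return 0  # assumption: NaN folds to 0
    if math.isinf(value):
        if not saturate:
            raise ValueError("cannot fold an infinite input without saturation")
        return hi if value > 0 else lo
    truncated = math.trunc(value)  # assumption: round toward zero
    if saturate:
        # Clamp to the representable range of the destination type.
        return max(lo, min(hi, truncated))
    # Assumption: without saturation, keep the low `bits` bits (wrap-around).
    wrapped = truncated & ((1 << bits) - 1)
    if signed and wrapped > hi:
        wrapped -= 1 << bits
    return wrapped


print(fold_cvt_f32_to_int(70000.0, 16, signed=False))  # 65535
print(fold_cvt_f32_to_int(-5.7, 16, signed=True))      # -5
```

A real folder would have to match whatever the hardware does for out-of-range and NaN inputs, which is the part most likely to differ from this sketch.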
2011 Sep 01
0
[PATCH 5/5] resample: Add NEON optimized inner_product_single for floating point
...nt16_t *a, const int16_t *b, unsigned int len)
@@ -97,4 +121,81 @@ static inline int32_t inner_product_single(const int16_t *a, const int16_t *b, u
return ret;
}
+#elif defined(FLOATING_POINT)
+
+static inline int32_t saturate_float_to_16bit(float a) {
+ int32_t ret;
+ asm ("vmov.f32 d0[0], %[a]\n"
+ "vcvt.s32.f32 d0, d0, #15\n"
+ "vqrshrn.s32 d0, q0, #15\n"
+ "vmov.s16 %[ret], d0[0]\n"
+ : [ret] "=&r" (ret)
+ : [a] "r" (a)
+ : "q0");
+ return ret;
+}
+#undef...
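For readers without the NEON manual at hand, the three instructions in `saturate_float_to_16bit` can be modelled in scalar form. The sketch below is in Python and reflects one reading of the instruction semantics (truncating, saturating conversion to Q15, then a rounding shift right by 15 with saturating narrow); it has not been checked against hardware, and the name `saturate_float_to_16bit_ref` is made up here.

```python
import math

INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate_float_to_16bit_ref(a: float) -> int:
    """Scalar model of the vcvt / vqrshrn sequence (assumed semantics)."""
    # vcvt.s32.f32 ..., #15: scale by 2^15, truncate toward zero,
    # saturate to 32 bits. NaN is assumed to convert to 0.
    scaled = a * (1 << 15)
    if math.isnan(scaled):
        fixed = 0
    else:
        clamped = max(float(INT32_MIN), min(float(INT32_MAX), scaled))
        fixed = math.trunc(clamped)
    # vqrshrn.s32 ..., #15: add the rounding constant, arithmetic shift
    # right by 15, then saturate to the signed 16-bit range.
    rounded = (fixed + (1 << 14)) >> 15
    return max(INT16_MIN, min(INT16_MAX, rounded))


print(saturate_float_to_16bit_ref(0.5))   # 1
print(saturate_float_to_16bit_ref(1e9))   # 32767
```

Python floats are doubles, so this models the arithmetic rather than the exact single-precision input path; scaling by a power of two is exact either way for in-range values.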
2009 Apr 15
2
[LLVMdev] Tablegen question
...> This is the intrinsic definition:
> def int_opencl_math_fdistance_fast : Intrinsic<[llvm_float_ty],
> [llvm_anyfloat_ty, LLVMMatchType<0>]>;
>
> The problem comes when I try to use the intrinsic. It gives me the
> following error:
> GPRV2F32:f32:$src1 MACRO_DISTANCE_FAST_v2f32: (set GPRF32:f32:$dst,
> (intrinsic_w_chain:f32 84:iPTR, GPRV2F32:v2f32:$src0,
> GPRV2F32:f32:$src1))
> TableGen.exe: In MACRO_DISTANCE_FAST_v2f32: Type inference
> contradiction
> found in node!
Your "$src1" is f32 instead of v2f32....
2008 Jul 07
5
[LLVMdev] fp_round libcall
Hi,
I'm trying to emit FP_ROUND f64 -> f32 for a MIPS target that
only supports single-precision
floating-point operations. The problem is that f32 is considered legal on this
target but f64 isn't, and the only way I can codegen this instruction is using
setConvertAction(MVT::f64, MVT::f32, Expand), which issues an EmitStackConvert.
What if...
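As a reference for what an f64 -> f32 FP_ROUND is expected to compute (independent of how a target lowers it), the rounding can be reproduced on the host. A small sketch in Python, where `fp_round_f64_to_f32` is a name invented for this example and says nothing about how a libcall would be implemented:

```python
import struct


def fp_round_f64_to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value.

    The result is returned as a Python float (a double) holding exactly
    the value the f32 would have. struct.pack raises OverflowError for
    finite inputs too large for single precision.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(fp_round_f64_to_f32(0.5))  # 0.5 (exactly representable in f32)
print(fp_round_f64_to_f32(0.1))  # 0.10000000149011612
```

Values such as 0.5 survive unchanged, while 0.1 picks up the familiar single-precision rounding error.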
2018 May 04
0
How to constraint instructions reordering from patterns?
...i32<64>
t5: ch = llvm.clp.set.rspb t3, TargetConstant:i16<393>, Constant:i32<64>
t8: ch = llvm.clp.set.rspsu t5, TargetConstant:i16<394>, Constant:i32<8>
t13: ch = store<Volatile ST4[@x1](tbaa=<0x3dbe418>)> t8, ConstantFP:f32<1.000000e+00>, GlobalAddress:i16<float* @x1> 0, undef:i16
t16: ch = store<Volatile ST4[@x2](tbaa=<0x3dbe418>)> t13, ConstantFP:f32<2.000000e+00>, GlobalAddress:i16<float* @x2> 0, undef:i16
t19: ch = store<Volatile ST4[@x3](tbaa=<0x3dbe418>)...
2015 Jan 11
0
[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions
On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
> ---
> V2: beat me, whip me, split out F64
>
> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 81 ++++++++++++++++++++++
> 1 fil...
2015 Jan 11
2
[PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
...>>>
>>>
>>> On 11.01.2015 22:54, Ilia Mirkin wrote:
>>>>
>>>> On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann
>>>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>>>
>>>>> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32},
>>>>> {S16/32})->F32
>>>>>
>>>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>>>>> ---
>>>>> V2: Split out F64 parts
>>>>> V3: rem...