thr3ads.net - similar to: "

Displaying 20 results from an estimated 7000 matches similar to: "_Float16 support"

2019 Jan 24

[cfe-dev] _Float16 support

It seems that there are several issues here: 1. Should the front end be concerned with whether or not the IR that it is emitting can be translated into a well-defined IR? 2. How should the selection DAG handle data types whose representation isn't defined by the ABI we're targeting? 3. What should the ABI do with half-precision floats? Working backward... The third question here is

[cfe-dev] _Float16 support

2019 Jan 24

[cfe-dev] _Float16 support

On 24 Jan 2019, at 4:46, Sjoerd Meijer wrote: > Hello, > > I added _Float16 support to Clang and codegen support in the AArch64 > and ARM backends, but have not looked into x86. Ahmed is right: > AArch64 is fine, only a few ACLE intrinsics are missing. ARM has rough > edges: scalar codegen should be mostly fine, vector codegen needs some > more work. > >

[RFC] Half-Precision Support in the Arm Backends

2017 Dec 06

[RFC] Half-Precision Support in the Arm Backends

Thanks a lot for the suggestions! I will look into using vld1/vst1, sounds good. I am custom lowering the bitcasts, that's now the only place where FP_TO_FP16 and FP16_TO_FP nodes are created to avoid inefficient code generation. I will double check if I can't achieve the same without using these nodes (because I really would like to get completely rid of them). Cheers, Sjoerd.

[RFC] Half-Precision Support in the Arm Backends

2018 Jan 18

[RFC] Half-Precision Support in the Arm Backends

I would like to revive this thread, as I am struggling a lot with the FP16 implementation in the ARM backend. My implementation in https://reviews.llvm.org/D38315 is finished (except one case), but a more robust alternative implementation was suggested. One can indeed argue that my current implementation is a bit fragile, because it involves manually patching up the isel dags for a few cases. The

[RFC] Half-Precision Support in the Arm Backends

2018 Jan 18

[RFC] Half-Precision Support in the Arm Backends

Hi Sjoerd, For ISel, I think having a separate register class will give you less headache. I wondering if you could get away with not touching the instructions descriptions at all, instead defining external pattens for the FullFP16 case, like so: def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm), IIC_fpCVTSH, "vcvtb",

[RFC] Half-Precision Support in the Arm Backends

2017 Dec 04

[RFC] Half-Precision Support in the Arm Backends

Hi, I am working on C/C++ language support for the Armv8.2-A half-precision instructions. I've added support for _Float16 as a new source language type to Clang. _Float16 is a C11 extension type for which arithmetic is well defined, as opposed to e.g. __fp16 which is a storage-only type. I then fixed up the AArch64 backend, which was mostly straightforward: this involved making operations

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

Thanks Eli. I forgot to bring up the strict FP questions which I was working on when I found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp emitted by promoting load/store/bitcast need to be strict versions of fp_to_f16/f16_to_fp. And if so where do we get the chain, especially for the bitcast case which isn't a chained node. ~Craig On Tue, Dec 10, 2019 at 3:18 PM

Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics

2015 Sep 08

Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics

Hi, I was looking at the x86 vector intrinsics for converting half precision floating point numbers and I'm a bit confused as to why certain types were chosen. I've gone ahead and used their current definition with success but I'd like to understand why the types used with these intrinsics are done this way. For reference see ``include/llvm/IR/IntrinsicsX86.td``. Here are the

TypePromoteFloat loses intermediate rounding operations

2019 Dec 10

TypePromoteFloat loses intermediate rounding operations

For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg

[LLVMdev] how to do make a FP_ROUND need/operattion

2015 May 12

[LLVMdev] how to do make a FP_ROUND need/operattion

Hi Guys, I and trying to covert a float to a f16. calling DAG.getNode(ISD::FP_ROUND, DL, Op->getValueType(0), FloatNode); will get the error message:"Invalid method to make FP_ROUND node" what is the "right" way to make this work? best Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] __fp16 suport in llvm back-end

2014 Jun 19

[LLVMdev] __fp16 suport in llvm back-end

Hi, all: I am trying to test half float point support in llvm, I found clang can generate bitcode for __fp16, while llc can't generate code for it, the error message is like this LLVM ERROR: Cannot select: 0x26a68e0: i16 = fp32_to_fp16 0x26a67d8 [ORD=2] [ID=4] 0x26a67d8: f32,ch = CopyFromReg 0x2693060, 0x26a66d0 [ORD=2] [ID=3] 0x26a66d0: f32 = Register %vreg1 [ID=1] In function: test

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Jul 31

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Hi Tim, Thanks for the thorough explanation. It makes perfect sense. I was not aware fast-math is supposed to prevent more precision being used than what is in the standard. I came across this issue while looking into the output or different compilers. XL and Microsoft compiler seem to have that turned on by default. But I assume that clang follows what gcc does, and have that turned off.

Changes to TableGen in v4.0?

2017 Jan 23

Changes to TableGen in v4.0?

I am trying to upgrade to the LLVM v4.0 branch, but I am seeing failures in my TableGen descriptions for conversion from FP32 to FP16 (scalar and vector). The patterns I have are along the lines of: [(set (f16 RF16:$dst), (fround (f32 RF32:$src)))] or: [(set (v2f16 VF16:$dst), (fround (v2f32 VF32:$src)))] and these now produce the errors: error: In CONV_f32_f16: Type inference

[LLVMdev] Is it a bug or am I missing something ?

2013 Feb 19

[LLVMdev] Is it a bug or am I missing something ?

Hi all, on following code: ; ModuleID = 'shufxbug.ll' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32" target triple = "i386-pc-linux-gnu" define void @sample_test(<4 x float>* nocapture %source, <8 x float>* nocapture %dest) nounwind noinline { L.entry:

[LLVMdev] Seg faulting on vector ops

2007 Jul 20

[LLVMdev] Seg faulting on vector ops

Hola LLVMers, I'm looking to make use of the vectorization primitives in the Intel chip with the code we generate from LLVM and so I've started experimenting with it. What is the state of the machine code generated for vectors? In my tinkering, I seem to be getting some wonky machine instructions, but I'm most likely just doing something wrong and I'm hoping you can set me in

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 01

RFC: A proposal for vectorizing loops with calls to math functions using SVML

RFC: A proposal for vectorizing loops with calls to math functions using SVML (short vector math library). ========= Overview ========= Very simply, SVML (Intel short vector math library) functions are vector variants of scalar math functions that take vector arguments, apply an operation to each element, and store the result in a vector register. These vector variants can be generated by the

Should llvm optimize 1.0 / x ?

2020 Aug 31

Should llvm optimize 1.0 / x ?

Hi, Here is a small C++ program: vec.cc: #include <cmath> using v4f32 = float __attribute__((__vector_size__(16))); v4f32 fct1(v4f32 x) { return 1.0 / x; } v4f32 fct2(v4f32 x) { return __builtin_ia32_rcpps(x); } Which is compiled to: vec.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <_Z4fct1Dv4_f>: 0: c4 e2 79 18 0d 00 00 vbroadcastss

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 05

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >

RFC: A proposal for vectorizing loops with calls to math functions using SVML

2016 Apr 04

RFC: A proposal for vectorizing loops with calls to math functions using SVML

Hi Sanjay, For sincos calls, I’m currently just going through isTriviallyVectorizable(), which was good enough to get things working so that I could test the translation. I don’t see why this cannot be changed to use addVectorizableFunctionsFromVecLib(). The other functions that I’m working with are already vectorized using the loop pragma. Those include sin, cos, exp, log, and pow. From: Sanjay

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

Hello, Depending on how I extract integer lanes from an x86_64 xmm register, the backend may spill that register in order to load scalars. The effect was observed on two targets: corei7-avx and btver1 (I haven't checked other targets). Here's a test case with spilling/no-spilling code put on conditional compile: #if __SSE4_1__ != 0 #include <smmintrin.h> #else #include

similar to: _Float16 support