similar to: [RFC] Half-Precision Support in the Arm Backends

Displaying 20 results from an estimated 1000 matches similar to: "[RFC] Half-Precision Support in the Arm Backends"

2018 Jan 18
0
[RFC] Half-Precision Support in the Arm Backends
I would like to revive this thread, as I am struggling a lot with the FP16 implementation in the ARM backend. My implementation in https://reviews.llvm.org/D38315 is finished (except one case), but a more robust alternative implementation was suggested. One can indeed argue that my current implementation is a bit fragile, because it involves manually patching up the isel dags for a few cases. The
2018 Jan 18
1
[RFC] Half-Precision Support in the Arm Backends
Hi Sjoerd, For ISel, I think having a separate register class will give you less headache. I am wondering if you could get away with not touching the instruction descriptions at all, instead defining external patterns for the FullFP16 case, like so: def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm), IIC_fpCVTSH, "vcvtb",
2017 Dec 06
2
[RFC] Half-Precision Support in the Arm Backends
Thanks a lot for the suggestions! I will look into using vld1/vst1, sounds good. I am custom lowering the bitcasts, so that's now the only place where FP_TO_FP16 and FP16_TO_FP nodes are created, to avoid inefficient code generation. I will double check if I can't achieve the same without using these nodes (because I really would like to get rid of them completely). Cheers, Sjoerd.
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
Thanks Eli. I forgot to bring up the strict FP questions which I was working on when I found this. If we're in a strict FP function, do the fp_to_f16/f16_to_fp nodes emitted by promoting load/store/bitcast need to be strict versions of fp_to_f16/f16_to_fp? And if so, where do we get the chain, especially for the bitcast case, which isn't a chained node? ~Craig On Tue, Dec 10, 2019 at 3:18 PM
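For context, a rough sketch of what the strict forms look like at the IR level in a strict-FP function; this is an illustration only, not code from the thread, and it assumes the llvm.experimental.constrained.* fpext/fptrunc intrinsics:

  ; sketch: half <-> float conversions with strict exception semantics
  declare float @llvm.experimental.constrained.fpext.f32.f16(half, metadata)
  declare half @llvm.experimental.constrained.fptrunc.f16.f32(float, metadata, metadata)

  define half @strict_roundtrip(half %h) #0 {
    %f = call float @llvm.experimental.constrained.fpext.f32.f16(half %h,
                        metadata !"fpexcept.strict")
    %r = call half @llvm.experimental.constrained.fptrunc.f16.f32(float %f,
                        metadata !"round.dynamic", metadata !"fpexcept.strict")
    ret half %r
  }
  attributes #0 = { strictfp }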
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
For the following C code __fp16 x, y, z, w; void foo() { x = y + z; x = x + w; } clang produces IR that extends each operand to float and then truncates to half before assigning to x. Like this define dso_local void @foo() #0 !dbg !18 { %1 = load half, half* @y, align 2, !dbg !21 %2 = fpext half %1 to float, !dbg !21 %3 = load half, half* @z, align 2, !dbg !22 %4 = fpext half %3 to float, !dbg
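For readers skimming the listing, the promoted IR being described has roughly this shape (a reconstructed sketch of the first statement, not the exact listing from the mail):

  @x = global half 0xH0000
  @y = global half 0xH0000
  @z = global half 0xH0000

  define void @foo() {
    %1 = load half, half* @y, align 2
    %2 = fpext half %1 to float          ; operand promoted to float
    %3 = load half, half* @z, align 2
    %4 = fpext half %3 to float
    %5 = fadd float %2, %4               ; arithmetic done in float
    %6 = fptrunc float %5 to half        ; intermediate rounding back to half
    store half %6, half* @x, align 2
    ret void
  }

The second statement (x = x + w) repeats the same extend/add/truncate sequence; the concern in this thread is that the intermediate fptrunc rounding is lost when half is promoted to float during type legalization.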
2014 Jul 14
5
[LLVMdev] RFC: Do we still need @llvm.convert.to.fp16 and the reverse?
Hi all, What do people think of doing away with the @llvm.convert.to.fp16 and @llvm.convert.from.fp16 intrinsics, in favour of using "half" and fpext/fptrunc? [1] It looks like those intrinsics originally date from before "half" actually existed in LLVM, and of course the backends have grown up assuming that's what Clang will produce, so we'd have to improve their
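As a rough illustration of the two styles being compared (a sketch assuming the non-overloaded intrinsic signatures of that time):

  ; intrinsic style: half values travel as i16
  declare i16 @llvm.convert.to.fp16(float)
  declare float @llvm.convert.from.fp16(i16)

  define i16 @trunc_intrinsic(float %f) {
    %bits = call i16 @llvm.convert.to.fp16(float %f)
    ret i16 %bits
  }

  ; proposed style: use the first-class half type with fptrunc/fpext
  define half @trunc_native(float %f) {
    %h = fptrunc float %f to half
    ret half %h
  }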
2014 Jul 14
2
[LLVMdev] RFC: Do we still need @llvm.convert.to.fp16 and the reverse?
On Jul 14, 2014, at 7:23 AM, Tom Stellard <tom at stellard.net> wrote: > On Mon, Jul 14, 2014 at 01:08:54PM +0100, Tim Northover wrote: >> Hi all, >> >> What do people think of doing away with the @llvm.convert.to.fp16 and >> @llvm.convert.from.fp16 intrinsics, in favour of using "half" and >> fpext/fptrunc? [1] >> > > I am in favor
2012 Nov 02
2
[LLVMdev] Half Float fp16 Native Support
Hi all, I am trying to implement native support for fp16 in llvm-3.1. I have already used the OpenCL patch for clang, so the IR that is generated is correct. I tried to add some code so that the fp16 type is handled correctly, but no luck. We have a target that has native fp16 units and tried to run a simple program: int main () { __fp16 a,b,c,d; a= 1.1; b=2.2; c=3.3;
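For comparison, this is roughly the IR shape a target with genuine fp16 units would want to select directly, with no fpext/fptrunc pair around each operation (a sketch, not taken from the original mail):

  define half @add_native(half %a, half %b) {
    %c = fadd half %a, %b     ; should map to a native fp16 add
    ret half %c
  }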
2019 Jan 22
4
_Float16 support
I'd like to start a discussion about how clang supports _Float16 for target architectures that don't have direct support for 16-bit floating point arithmetic. The current clang language extensions documentation says, "If half-precision instructions are unavailable, values will be promoted to single-precision, similar to the semantics of __fp16 except that the results will be stored
2017 Jan 23
2
Changes to TableGen in v4.0?
I am trying to upgrade to the LLVM v4.0 branch, but I am seeing failures in my TableGen descriptions for conversion from FP32 to FP16 (scalar and vector). The patterns I have are along the lines of: [(set (f16 RF16:$dst), (fround (f32 RF32:$src)))] or: [(set (v2f16 VF16:$dst), (fround (v2f32 VF32:$src)))] and these now produce the errors: error: In CONV_f32_f16: Type inference
2014 Jul 25
3
[LLVMdev] FPU cannot be compatible with -soft-float code on mips by llc
Hi all, -soft-float cannot be used correctly with llc. All float operations should call the soft-float library routines rather than the hardware FPU. My MIPS device cannot support the half-float type, so I hacked LLVM, added soft half-float support, and added a -soft-float option. I added the function definitions for __gnu_f2h_ieee() and __gnu_h2f_ieee(), and it can call the soft half-float routines. However, all the other functions about
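For reference, a sketch of the libcall boundary that the __gnu_* helpers provide; the exact names and when legalization emits them are target-dependent, so treat this as an illustration only:

  ; half values are passed and returned as i16 at the libcall boundary
  declare i16 @__gnu_f2h_ieee(float)     ; float -> half bits
  declare float @__gnu_h2f_ieee(i16)     ; half bits -> float

  define float @roundtrip(float %f) {
    %h = call i16 @__gnu_f2h_ieee(float %f)
    %g = call float @__gnu_h2f_ieee(i16 %h)
    ret float %g
  }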
2019 Jan 24
4
[cfe-dev] _Float16 support
On 24 Jan 2019, at 4:46, Sjoerd Meijer wrote: > Hello, > > I added _Float16 support to Clang and codegen support in the AArch64 > and ARM backends, but have not looked into x86. Ahmed is right: > AArch64 is fine, only a few ACLE intrinsics are missing. ARM has rough > edges: scalar codegen should be mostly fine, vector codegen needs some > more work. > >
2014 Jul 09
2
[LLVMdev] Help!!!!Help!!!! " LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] " problem!!!!!!!!!!!!!!!!!!
Hi all, I am new to llvm. I need help. Thank you everyone! I want to realize the vcvtt.f16.f32 NEON instruction with llvm. This instruction converts a single-precision value to f16 in the top 16 bits. I use the intrinsic function llvm.convert.to.fp16, but llc fails; the problem I meet is the following: LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] 0x9fc0750: f32,ch = load 0x3aafd68,
2019 Jan 24
2
[cfe-dev] _Float16 support
It seems that there are several issues here: 1. Should the front end be concerned with whether or not the IR that it is emitting can be translated into a well-defined IR? 2. How should the selection DAG handle data types whose representation isn't defined by the ABI we're targeting? 3. What should the ABI do with half-precision floats? Working backward... The third question here is
2014 Jul 10
2
[LLVMdev] Help!!!!Help!!!! " LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] " problem!!!!!!!!!!!!!!!!!!
Hi Daniel, Thank you for your reply. Yes, the problem is about the MIPS backend. You gave me this message: "There is limited support for the <8 x f16> type when MSA (MIPS SIMD Architecture) is enabled but even then scalar half-precision is not currently supported." Could you give me an official link or some evidence? Thank you very much. Robin yalong at multicorewareinc.com
2019 Sep 05
2
ARM vectorized fp16 support
Thanks for the reply. I was using LLVM 8.0. Let me try trunk and I will let you know if it works. On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote: > > Hi, > Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions and when this landed, but we had an effort to plug the remaining
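For anyone wanting to reproduce this, a minimal sketch of IR that should select to vfma.f16 when the subtarget has full fp16 support (this uses the generic llvm.fma intrinsic; whether a fused half multiply-add is actually emitted depends on the target features):

  declare half @llvm.fma.f16(half, half, half)

  define half @fma_f16(half %a, half %b, half %acc) {
    %r = call half @llvm.fma.f16(half %a, half %b, half %acc)
    ret half %r
  }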
2014 Jul 09
6
[LLVMdev] Help!!!!Help!!!! " LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] " problem!!!!!!!!!!!!!!!!!!
Thank you Kevin! If I use fptrunc and bitcast to realise NEON vcvtt (I am sure "fptrunc double %tmp to float" is right, but "fptrunc float %tmp to half" is wrong). My target platform is MIPS. The command is as follows: NEON: vcvtt.f16.f32 s2, s0 llvm Code: %Vt_2 = load float* %VFP_s0, align 4 %Vt3_1 = fptrunc float %Vt_2 to half %Vt4_1 = bitcast half
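The fptrunc/bitcast variant being tried looks roughly like this (a sketch; as the thread notes, selecting the half-typed operations still needs target support):

  define i16 @cvt_like(float %f) {
    %h = fptrunc float %f to half       ; f32 -> f16 rounding
    %bits = bitcast half %h to i16      ; reinterpret the f16 bits as an integer
    ret i16 %bits
  }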
2017 Dec 04
2
[RFC] - Deduplication of debug information in linkers (LLD)
At least one proprietary linker put a lot of effort into deduplicating and rewriting debug information. This took up the majority of the link time despite serious engineering time on performance optimisation. For example, some sections were written from scratch by the linker because that proved faster than parsing the input. Teaching LLD to dedup DWARF should be expected to dramatically slow it
2014 Jul 09
4
[LLVMdev] Help!!!!Help!!!! " LLVM ERROR: Cannot select: 0x9fc9680: i32 = fp32_to_fp16 0x9fc0750 [ID=16] " problem!!!!!!!!!!!!!!!!!!
On 07/09/2014 12:41 PM, Matt Arsenault wrote: > On 07/09/2014 03:30 PM, yalong at multicorewareinc.com wrote: >> Thank you Kevin!!! >> If I use fptrunc and bitcast realise NEON vcvtt ( I can sure, >> "fptrunc double %tmp to float" is right, but "fptrunc float %tmp to >> half" is wrong). My target platform is MIPS. The command as following:
2015 Sep 08
2
Strange types on x86 vcvtph2ps and vcvtps2ph intrinsics
Hi, I was looking at the x86 vector intrinsics for converting half precision floating point numbers and I'm a bit confused as to why certain types were chosen. I've gone ahead and used their current definition with success but I'd like to understand why the types used with these intrinsics are done this way. For reference see ``include/llvm/IR/IntrinsicsX86.td``. Here are the
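For readers without the .td file open, the declarations in question look like this at the IR level; note that the half data is modelled as i16 vector lanes rather than a half vector, and the rounding mode on the float-to-half direction is an i32 immediate (quoted from memory, so double-check against IntrinsicsX86.td):

  ; 128-bit F16C forms: half data travels as <8 x i16>
  declare <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>)
  declare <8 x i16>   @llvm.x86.vcvtps2ph.128(<4 x float>, i32)

  ; 256-bit forms
  declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>)
  declare <8 x i16>   @llvm.x86.vcvtps2ph.256(<8 x float>, i32)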