search for: fullfp16

Displaying 8 results from an estimated 8 matches for "fullfp16".

2018 Jan 18
1
[RFC] Half-Precision Support in the Arm Backends
Hi Sjoerd, For ISel, I think having a separate register class will give you less of a headache. I am wondering if you could get away with not touching the instruction descriptions at all, instead defining external patterns for the FullFP16 case, like so: def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm), IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm", []>, Requires<[HasFP16]>, Sched<[WriteFPCVT]>; def : F...
2017 Dec 06
2
[RFC] Half-Precision Support in the Arm Backends
...would like to get completely rid of them). Cheers, Sjoerd. >On 12/4/2017 6:44 AM, Sjoerd Meijer via llvm-dev wrote: >> >> Custom Lowering >> ------------------------- >> >> Making f16 legal and not having native load/store instructions available, >> (no FullFP16 support) means custom lowering loads/stores: >> 1) Since we don't have FP16 load/store instructions available, we create >> integer half-word loads. I unfortunately need the FP16_TO_FP node here, >> because that "models" creating an integer value, which is what...
2018 Jan 18
0
[RFC] Half-Precision Support in the Arm Backends
...ch rules for some Armv8.2-A FP16 instructions, e.g.: fsub, fadd, and also some conversion instructions. 2) Don't regress the "storage-only cases", i.e., when we only have the conversion instructions available. The Approach: ------------- 1) First, we make f16 legal only when we have FullFP16 support (i.e. when Armv8.2-A FP16 is supported), so in ARMISelLowering.cpp we add: if (Subtarget->hasFullFP16()) { addRegisterClass(MVT::f16, &ARM::HPRRegClass); } 2) This is the first implementation decision, I introduce a new register class HPR, which is an exact copy of the sing...
2017 Dec 04
2
[RFC] Half-Precision Support in the Arm Backends
...or _Float16 as a new source language type to Clang. _Float16 is a C11 extension type for which arithmetic is well defined, as opposed to e.g. __fp16, which is a storage-only type. I then fixed up the AArch64 backend, which was mostly straightforward: this involved making operations on f16 legal when FullFP16 is supported, thus avoiding promotions to f32. This enables generation of AArch64 FP16 instructions from C/C++. For AArch64, this work is finished and does not show problems in our testing; Solid Sands provided us with beta versions of their FP16 extension to SuperTest - their C/C++ language conform...
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
...ruct float16x4x4_t { float16x4_t val[4]; } float16x4x4_t; void test_vst4_lane_f16(float16_t * a, float16x4x4_t b) { vst4_lane_f16(a, b, 3); } I tried: $$COMP_ROOT/clang -cc1 -triple thumbv7s-apple-darwin -target-abi apcs-gnu -target-cpu swift -fallow-half-arguments-and-returns -target-feature +fullfp16 -ffreestanding -disable-O0-optnone -emit-llvm -o arm.ll arm.cpp $cat arm.ll | grep llvm.arm call void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8* %4, <4 x half> %13, <4 x half> %14, <4 x half> %15, <4 x half> %16, i32 3, i32 2) declare void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8...
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
...rogram arguments: /home/nancy/rpp_llvm/build-project/bin/clang-8 -cc1 -triple armv8.2a-arm-unknown-eabihf -S -disable-free -main-file-name arm.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -mconstructor-aliases -nostdsysteminc -target-cpu generic -target-feature +fullfp16 -target-feature +strict-align -target-abi aapcs -mfloat-abi hard -fallow-half-arguments-and-returns -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/nancy/rpp_llvm/test/-.gcno -resource-dir /home/nancy/rpp_llvm/build-project/lib/clang/8.0.0 -internal-isystem /home/nancy/rpp_llvm/b...
2020 Jan 23
3
How to find out the default CPU / Features String for a given triple?
...x15,-call-saved-x18,-call-saved-x8,-call-saved-x9,+ccdp,+ccidx,+ccpp,+complxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-reserve-x10,-reserv...
2017 Dec 04
2
[RFC] - Deduplication of debug information in linkers (LLD)
...> On 12/4/2017 6:44 AM, Sjoerd Meijer via llvm-dev wrote: > > > > Custom Lowering > > ------------------------- > > > > Making f16 legal and not having native load/store instructions available, > > (no FullFP16 support) means custom lowering loads/stores: > > 1) Since we don't have FP16 load/store instructions available, we create > > integer half-word loads. I unfortunately need the FP16_TO_FP node here, > > because that "models" creating an integer value, which...