Displaying 8 results from an estimated 8 matches for "fullfp16".
2018 Jan 18
1
[RFC] Half-Precision Support in the Arm Backends
Hi Sjoerd,
For ISel, I think having a separate register class will give you less headache. I wondering if you could get away with not touching the instructions descriptions at all, instead defining external pattens for the FullFP16 case, like so:
def VCVTBHS: ASuI<0b11101, 0b11, 0b0010, 0b01, 0, (outs SPR:$Sd), (ins SPR:$Sm),
IIC_fpCVTSH, "vcvtb", ".f32.f16\t$Sd, $Sm",
[]>,
Requires<[HasFP16]>,
Sched<[WriteFPCVT]>;
def : F...
2017 Dec 06
2
[RFC] Half-Precision Support in the Arm Backends
...would like to get completely rid of them).
Cheers,
Sjoerd.
>On 12/4/2017 6:44 AM, Sjoerd Meijer via llvm-dev wrote:
>>
>> Custom Lowering
>> -------------------------
>>
>> Making f16 legal and not having native load/stores instructions available,
>> (no FullFP16 support) means custom lowering loads/stores:
>> 1) Since we don't have FP16 load/store instructions available, we create
>> integer half-word loads. I unfortunately need the FP16_TO_FP node here,
>> because that "models" creating an integer value, which is what...
2018 Jan 18
0
[RFC] Half-Precision Support in the Arm Backends
...ch rules for some Armv8.2-A FP16 instructions, e.g.: fsub, fadd,
and also some conversion instructions.
2) Don't regress the "storage-only cases", i.e., when we only have the
conversion instructions available.
The Approach:
-------------
1) First, we make f16 legal only when we have FullFP16 support (i.e. when
Armv8.2-A FP16 is supported), so in ARMISelLowering.cpp we add:
if (Subtarget->hasFullFP16()) {
addRegisterClass(MVT::f16, &ARM::HPRRegClass);
}
2) This is the first implementation decision, I introduce a new register class
HPR, which is an exact copy of the sing...
2017 Dec 04
2
[RFC] Half-Precision Support in the Arm Backends
...or _Float16 as a new source language type to
Clang. _Float16 is a C11 extension type for which arithmetic is well defined, as
opposed to e.g. __fp16 which is a storage-only type. I then fixed up the
AArch64 backend, which was mostly straightforward: this involved making
operations on f16 legal when FullFP16 is supported, thus avoiding promotions to
f32. This enables generation of AArch64 FP16 instruction from C/C++. For
AArch64, this work is finished and does not show problems in our testing; Solid
Sands provided us with beta versions of their FP16 extension to SuperTest -
their C/C++ language conform...
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
...ruct float16x4x4_t {
float16x4_t val[4];
} float16x4x4_t;
void test_vst4_lane_f16(float16_t * a, float16x4x4_t b) {
vst4_lane_f16(a, b, 3);
}
I tried:
$$COMP_ROOT/clang -cc1 -triple thumbv7s-apple-darwin -target-abi apcs-gnu
-target-cpu swift -fallow-half-arguments-and-returns -target-feature
+fullfp16 -ffreestanding -disable-O0-optnone -emit-llvm -o arm.ll arm.cpp
$cat arm.ll | grep llvm.arm
call void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8* %4, <4 x half> %13, <4 x
half> %14, <4 x half> %15, <4 x half> %16, i32 3, i32 2)
declare void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8...
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
...rogram arguments:
/home/nancy/rpp_llvm/build-project/bin/clang-8 -cc1 -triple
armv8.2a-arm-unknown-eabihf -S -disable-free -main-file-name arm.cpp
-mrelocation-model static -mthread-model posix -mdisable-fp-elim
-fmath-errno -mconstructor-aliases -nostdsysteminc -target-cpu generic
-target-feature +fullfp16 -target-feature +strict-align -target-abi
aapcs -mfloat-abi hard -fallow-half-arguments-and-returns
-dwarf-column-info -debugger-tuning=gdb -coverage-notes-file
/home/nancy/rpp_llvm/test/-.gcno -resource-dir
/home/nancy/rpp_llvm/build-project/lib/clang/8.0.0 -internal-isystem
/home/nancy/rpp_llvm/b...
2020 Jan 23
3
How to find out the default CPU / Features String for a given triple?
...x15,-call-saved-x18,-call-saved-x8,-call-saved-x9,+ccdp,+ccidx,+ccpp,+complxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-reserve-x10,-reserv...
2017 Dec 04
2
[RFC] - Deduplication of debug information in linkers (LLD)
...t;windows-1252"; Format="flowed"
>
> On 12/4/2017 6:44 AM, Sjoerd Meijer via llvm-dev wrote:
> >
> > Custom Lowering
> > -------------------------
> >
> > Making f16 legal and not having native load/stores instructions
> available,
> > (no FullFP16 support) means custom lowering loads/stores:
> > 1) Since we don't have FP16 load/store instructions available, we create
> > integer half-word loads. I unfortunately need the FP16_TO_FP node
> here,
> > because that "models" creating an integer value, which...