search for: fp16fml

Displaying 3 results from an estimated 3 matches for "fp16fml".

2019 Sep 05
2
ARM vectorized fp16 support
...fails to automatically generate fused-multiply-add instructions for c += a * b. I'm wondering whether I did something wrong, if not, is it a missing feature that will be supported later? (I know there're fp16 FMLA intrinsics though)

Test programs and outputs,

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16: // @test_vfma_lane_f16
    fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD
    mov v0.16b, v2.16b
    ret
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_...
2019 Sep 05
2
ARM vectorized fp16 support
...fused-multiply-add instructions
> for c += a * b. I'm wondering whether I did something wrong, if not,
> is it a missing feature that will be supported later? (I know there're
> fp16 FMLA intrinsics though)
>
> Test programs and outputs,
>
> $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
> test_vfma_lane_f16: // @test_vfma_lane_f16
> fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD
> mov v0.16b, v2.16b
> ret
> $ cat vfp32.c
> #include <arm_neon.h>
> float3...
2020 Jan 23
3
How to find out the default CPU / Features String for a given triple?
...d-x14,-call-saved-x15,-call-saved-x18,-call-saved-x8,-call-saved-x9,+ccdp,+ccidx,+ccpp,+complxnum,+crc,-crypto,-custom-cheap-as-move,-cyclone,-disable-latency-sched-heuristic,+dit,+dotprod,-exynos-cheap-as-move,-exynosm1,-exynosm2,-exynosm3,-exynosm4,-falkor,+fmi,-force-32bit-jump-tables,+fp-armv8,-fp16fml,+fptoint,-fullfp16,-fuse-address,+fuse-aes,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,+jsconv,-kryo,+lor,+lse,-lsl-fast,+mpam,-mte,+neon,-no-neg-immediates,+nv,+pa,+pan,+pan-rwv,+perfmon,-predictable-select-expensive,+predres,-rand,+ras,+rasv8_4,+rcpc,+rcpc-immo,+rdm,-reserve-x1,-...