2019 Sep 05
ARM vectorized fp16 support
...erate fused-multiply-add instructions
for c += a * b. I'm wondering whether I did something wrong; if not,
is it a missing feature that will be supported later? (I know there are
fp16 FMLA intrinsics, though.)
Test programs and outputs:
$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16:                     // @test_vfma_lane_f16
        fmla    v2.4s, v1.4s, v0.4s     // fp32 is GOOD
        mov     v0.16b, v2.16b
        ret
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float...
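For readers who want to try the `c += a * b` pattern discussed above without an ARM toolchain, here is a minimal sketch. It is not the poster's test file (which is truncated above). It assumes the GCC/Clang `vector_size` extension, and the names `v4f_sketch`, `mla_sketch`, and `check_mla_sketch` are hypothetical. Whether a compiler fuses the multiply and add into a single instruction depends on the target, the optimization level, and the floating-point contraction settings, so inspect the `-S` output rather than assume it.

```c
#include <assert.h>

/* Hypothetical 4 x float vector type, built with the GCC/Clang
 * vector_size extension. It only stands in for float32x4_t so the
 * sketch also compiles on non-ARM hosts. */
typedef float v4f_sketch __attribute__((vector_size(16)));

/* Element-wise multiply-accumulate written as plain arithmetic.
 * A compiler may (or may not) turn this into a fused multiply-add. */
v4f_sketch mla_sketch(v4f_sketch a, v4f_sketch b, v4f_sketch c)
{
    c += a * b;
    return c;
}

/* Small self-check with values that are exact in binary floating point,
 * so fused and unfused evaluation give the same result. */
void check_mla_sketch(void)
{
    v4f_sketch a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4f_sketch b = {5.0f, 6.0f, 7.0f, 8.0f};
    v4f_sketch c = {1.0f, 1.0f, 1.0f, 1.0f};
    v4f_sketch r = mla_sketch(a, b, c);

    assert(r[0] == 6.0f);
    assert(r[1] == 13.0f);
    assert(r[2] == 22.0f);
    assert(r[3] == 33.0f);
}
```

Compiling this with `-O3 -S -o-` for the target of interest and reading the assembly for `mla_sketch` is one way to check which instructions were actually selected.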
2019 Sep 05
ARM vectorized fp16 support
...ions
> for c += a * b. I'm wondering whether I did something wrong; if not,
> is it a missing feature that will be supported later? (I know there are
> fp16 FMLA intrinsics, though.)
>
> Test programs and outputs:
>
> $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
> test_vfma_lane_f16:                     // @test_vfma_lane_f16
>         fmla    v2.4s, v1.4s, v0.4s     // fp32 is GOOD
>         mov     v0.16b, v2.16b
>         ret
> $ cat vfp32.c
> #include <arm_neon.h>
> float32x4_t test_vfma_lane_f16(...