2019 Sep 05
ARM vectorized fp16 support
Hi,
I'm trying to compile a half-precision program for ARM, but it seems
LLVM fails to automatically generate fused multiply-add instructions
for c += a * b. Did I do something wrong? If not, is this a missing
feature that will be supported later? (I know there are fp16 FMLA
intrinsics, though.)
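For readers following along, the c += a * b pattern in question can be sketched as the loop below. This is an illustration only, not the poster's actual test program: the names half_t and fma_f16 are made up for the example, and the float fallback exists only so the sketch builds on non-AArch64 targets. Whether the multiply and add are fused also depends on the compiler's floating-point contraction setting (-ffp-contract).

```c
#include <stddef.h>

/* Hypothetical element type for this sketch: _Float16 on AArch64 when the
 * compiler advertises it, plain float elsewhere so the example still builds. */
#if defined(__aarch64__) && defined(__FLT16_MANT_DIG__)
typedef _Float16 half_t;
#else
typedef float half_t;
#endif

/* c[i] += a[i] * b[i]: the multiply-accumulate pattern that a vectorizer
 * could, in principle, lower to a vector fp16 FMLA. The restrict qualifiers
 * tell the compiler the arrays do not overlap. */
void fma_f16(half_t *restrict c, const half_t *restrict a,
             const half_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] += a[i] * b[i];
}
```

Compiling a file like this with the command line shown under the test programs and inspecting the assembly is one way to check whether a fused instruction was emitted.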
Test programs and outputs:
$ clang -O3 -march=armv8.2-a+fp16fml