Hi,

I'm trying to compile a half-precision program for ARM, but it seems LLVM fails to automatically generate a fused multiply-add instruction for c += a * b. I'm wondering whether I did something wrong; if not, is this a missing feature that will be supported later? (I know there are fp16 FMLA intrinsics, though.)

Test programs and outputs:

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16:              // @test_vfma_lane_f16
        fmla    v2.4s, v1.4s, v0.4s    // fp32 is GOOD
        mov     v0.16b, v2.16b
        ret
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {
  c += a * b;
  return c;
}

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp16.c
test_vfma_lane_f16:              // @test_vfma_lane_f16
        fmul    v0.4h, v1.4h, v0.4h
        fadd    v0.4h, v0.4h, v2.4h    // fp16 does NOT use FMLA
        ret
$ cat vfp16.c
#include <arm_neon.h>
float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
  c += a * b;
  return c;
}

--
Yizhi Liu
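For reference, if a given compiler build does not fuse this pattern on its own, one possible workaround is to call the ACLE fp16 FMA intrinsic directly. A minimal sketch (the function name is just illustrative), assuming the target and compiler define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC (e.g. with -march=armv8.2-a+fp16):

#include <arm_neon.h>
/* vfma_f16(c, a, b) computes c + a * b element-wise and should map
   to a single fp16 FMLA on an armv8.2-a+fp16 capable AArch64 target. */
float16x4_t fma_f16_explicit(float16x4_t a, float16x4_t b, float16x4_t c) {
  return vfma_f16(c, a, b);
}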
Hi,

Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions or at when this landed, but we had an effort to plug the remaining fp16 holes not that long ago, so hopefully a newer version will just work for you.

Cheers,
Sjoerd.
________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Yizhi Liu via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 05 September 2019 06:52
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] ARM vectorized fp16 support
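For what it's worth, one way to double-check what a particular Clang build does here is to look at the IR next to the assembly. With -ffast-math I'd expect the fp16 multiply and add to come out either with fast-math flags or as an llvm.fmuladd call, which the AArch64 backend can then select into FMLA; the exact IR varies by version. Using the same vfp16.c as above:

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -emit-llvm -o- vfp16.c   # LLVM IR
$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp16.c              # assembly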
Thanks for the reply. I was using LLVM 8.0. Let me try trunk and I'll let you know if it works.

On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote:
> Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build.

--
Yizhi Liu