Smith, Kevin B via llvm-dev
2021-Jul-12 21:28 UTC
[llvm-dev] [RFC] Should -ffast-math affect intrinsics?
I've got the following little program that illustrates what I think is a problem. This is for X86/Intel64 intrinsics.

If compiled using

$ clang -O2 intrin_prob.c
$ a.out
2.000000, 3.000000

This is the expected result. But if compiled using

$ clang -O2 -ffast-math intrin_prob.c
$ a.out
1.500000, 3.255000

This gets incorrect results, because reassociation happens across the calls to the _mm_add_pd and _mm_sub_pd intrinsics, and the value that should have been added and subtracted gets constant folded to zero.

It seems to me that the fast-math flags really should not affect intrinsics implementations themselves, and that the fast-math flags should allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has noticed before? It surprised me.

I have also checked GCC behavior, which is consistent with clang, or vice versa. The Intel C/C++ compiler does not let the fast-math flags affect intrinsics, at least not for reassociation across the call boundaries, and I haven't checked the Microsoft compiler yet.

An easy "fix" would be to add

#pragma float_control(precise, on)

or

#pragma clang fp reassociate(off)

near the top of immintrin.h to cause all intrinsics to ignore all fast-math flags, or at least ignore reassociation.

$ cat intrin_prob.c
#include <immintrin.h>
#include <stdio.h>

static union {
  double u1[2];
  __m128d u2;
} t1[1] = {1.25, 3.25};

int main(int argc, char **argv) {
  __m128d t2;
  __m128d t3;

  // This is just so the compiler cannot constant fold
  // and know the values of t1.
  t1[0].u1[0] += argc * 0.25;
  t1[0].u1[1] += argc * .005;

  // This value, when added and then subtracted, should cause
  // the values to be rounded to an integer (round-to-nearest in
  // the default rounding mode). If the compiler optimizes the add
  // and subtract out by doing reassociation, then the printed values
  // will have fractional parts. If the compiler does the intrinsics
  // as expected, then the values printed will have no fractional part.
  t2 = _mm_castsi128_pd(_mm_set_epi32((int)((0x4338000000000000uLL) >> 32),
                                      (int)((0x4338000000000000uLL) >> 0),
                                      (int)((0x4338000000000000uLL) >> 32),
                                      (int)((0x4338000000000000uLL) >> 0)));
  t3 = _mm_add_pd(t1[0].u2, t2);
  t3 = _mm_sub_pd(t3, t2);
  t1[0].u2 = t3;
  printf("%f, %f\n", t1[0].u1[0], t1[0].u1[1]);
  return 0;
}
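For anyone following along, a scalar sketch of what each lane of the _mm_add_pd/_mm_sub_pd pair computes may help. The bit pattern 0x4338000000000000 is the double 1.5 * 2^52 = 6755399441055744.0. For |x| < 2^51 the sum x + c lands in [2^52, 2^53), where adjacent doubles are exactly 1.0 apart, so the addition rounds x to an integer and the subtraction then recovers that integer exactly. With reassociation allowed, (x + c) - c may be simplified to plain x, which would explain the 1.500000, 3.255000 output. The names round_via_magic and kRoundMagic are hypothetical, and placing #pragma clang fp reassociate(off) at the start of a single function body is only a sketch of a narrower variant of the header-wide pragma suggested above. It assumes Clang 11 or later (other compilers skip the pragma via the __clang__ guard), doubles evaluated in double precision (FLT_EVAL_METHOD == 0, as on x86-64 with SSE2), and the default round-to-nearest mode.

```c
#include <assert.h>

/* Hypothetical name: 0x4338000000000000 reinterpreted as a double,
   i.e. 1.5 * 2^52. */
static const double kRoundMagic = 6755399441055744.0;

/* Hypothetical helper: rounds x to the nearest integer (ties to even
   under the default rounding mode) with the same add-then-subtract
   trick the intrinsic code applies to each __m128d lane.
   Assumes |x| < 2^51 and double-precision evaluation (no x87 excess
   precision). */
static double round_via_magic(double x) {
#ifdef __clang__
    /* Sketch of a per-function workaround: disallow reassociation in
       this block even under -ffast-math, so (x + c) - c is expected to
       stay as written. The other fast-math flags remain in effect. */
#pragma clang fp reassociate(off)
#endif
    assert(x > -0x1p51 && x < 0x1p51);
    /* In [2^52, 2^53) adjacent doubles are 1.0 apart, so this rounds. */
    double shifted = x + kRoundMagic;
    /* Exact subtraction: both operands are within a factor of 2. */
    return shifted - kRoundMagic;
}
```

For example, round_via_magic(1.5) returns 2.0 and round_via_magic(3.255) returns 3.0, matching the non-fast-math output above, while round_via_magic(2.5) returns 2.0 because ties go to the even neighbor.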
Smith, Kevin B via llvm-dev
2021-Jul-12 21:46 UTC
[llvm-dev] [RFC] Should -ffast-math affect intrinsics?
Sorry, missed a NOT or two. This is what I meant to say:

It seems to me that the fast-math flags really should NOT affect intrinsics implementations themselves, and that the fast-math flags should NOT allow reassociation across the intrinsic calls. So, is this expected behavior, or just something that no-one has noticed before? It surprised me.

I have also checked GCC behavior, which is consistent with clang, or vice versa. The Intel C/C++ compiler does not let the fast-math flags affect intrinsics (at least, it doesn't allow reassociation across the call boundaries), and I haven't checked the Microsoft compiler yet.

Kevin Smith

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev