Blumenthal, Uri - 0553 - MITLL via llvm-dev
2021-Oct-25 21:35 UTC
[llvm-dev] Problem with clang optimizer?
I just tried Clang-13 (with LLVM-13), and the problem is still there. The vectorizer is still broken with respect to the SSE4.1 instruction extensions:

$ echo $CXXFLAGS
-std=gnu++17 -O3 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
$ clang++-mp-13 $CXXFLAGS -o t sha3-reproducer.cxx
$ ./t
Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
Abort trap: 6
$ clang++-mp-13 $CXXFLAGS -mno-sse4.1 -o t sha3-reproducer.cxx
$ ./t
$

--
Regards,
Uri

There are two ways to design a system. One is to make it so simple there are obviously no deficiencies.
The other is to make it so complex there are no obvious deficiencies.
   - C. A. R. Hoare

From: Jameson Nash <vtjnash at gmail.com>
Date: Wednesday, September 29, 2021 at 19:41
To: Craig Topper <craig.topper at gmail.com>
Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Problem with clang optimizer?

This may be fixed now (https://reviews.llvm.org/D106613), but it remains to be confirmed for https://bugs.llvm.org/show_bug.cgi?id=51957

On Sun, Sep 26, 2021 at 1:12 AM Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Looking at the IR here https://godbolt.org/z/zaMW1renW I believe the issue is on this instruction on line 361

%30 = extractelement <2 x <2 x i64>*> %bc438, i32 0

It should be extracting from index 1 instead of index 0.

~Craig

On Sat, Sep 25, 2021 at 5:48 PM Blumenthal, Uri - 0553 - MITLL <uri at ll.mit.edu> wrote:

I found that
· The problem disappears with -mno-sse4.1
· The problem manifests with both Apple Clang from Xcode-13, and LLVM Clang-12 (and not with Xcode-12 or LLVM Clang-11)
· I could experiment only on Apple platform, as that’s the only one I have that runs LLVM Clang-12.

--
Regards,
Uri

From: Craig Topper <craig.topper at gmail.com>
Date: Saturday, September 25, 2021 at 12:07
To: Dimitry Andric <dimitry at andric.com>
Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Problem with clang optimizer?

It reproduced for me with -march=nehalem which does not have AVX.

On Sat, Sep 25, 2021 at 2:51 AM Dimitry Andric via llvm-dev <llvm-dev at lists.llvm.org> wrote:

It is only occurring (as far as I can see now) on x86_64, with -mavx enabled. Or with a target CPU that supports AVX. And it is not Apple clang specific.

-Dimitry

On 24 Sep 2021, at 15:30, Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> I tried to reproduce it on godbolt with clang 12.0.0 and 12.0.1 but things seem fine when I run it there: https://godbolt.org/z/vrq8j6Kj7.
> Can you share your exact clang invocation? Does it only reproduce in some specific environment?

Save the source I posted before into a “sha3-reproducer.cxx” file. Let me know if you want it re-posted here.

$ clang++-mp-12 -v
clang version 12.0.1
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-12/bin
$ clang++-mp-12 -o s -O3 sha3-reproducer.cxx
$ ./s
Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
Abort trap: 6
$ clang++-mp-12 -o s -O2 sha3-reproducer.cxx
$ ./s
Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
Abort trap: 6
$ clang++-mp-12 -o s -O1 sha3-reproducer.cxx
$ ./s
$

Clang-12 is installed via MacPorts, which is why we invoke the executable as clang++-mp-12.

The same problem manifests in exactly the same way in the Xcode-13 version of Clang (presumably based on LLVM Clang-12).

I’ll be happy to provide more specific details if you let me know what you need.
> Also, it generally helps to reduce code bug reports as much as possible; creduce can help with that: https://embed.cs.utah.edu/creduce/using/.

Understood. Unfortunately, the above reproducer is the best we could come up with. An alternative is trying to build the Botan package itself: https://github.com/randombit/botan.git.

On Thu, Sep 23, 2021 at 10:14 PM Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:

I’m not sure if this is the correct list, so please direct me to the right one if this bug report shouldn’t go here.

The problem is: invoking clang (v12) with -O2 or higher optimization flags generates wrong object code for the following C++. Compiling it with -O1 generates a working binary.

================
#include <cstdint>
#include <cassert>

template<size_t ROT, typename T>
inline constexpr T rotl(T input)
{
   static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation constant");
   return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT)));
}

inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC)
{
   const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20];
   const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21];

   // the calculation of C2 fails for -O3 or -O2 with clang 12
   // FWIW: it would produce a value that doesn't fit into a _signed_ 64-bit int
   const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];

   const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23];
   const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24];

   const uint64_t D0 = rotl<1>(C0) ^ C3;
   const uint64_t D1 = rotl<1>(C1) ^ C4;
   const uint64_t D2 = rotl<1>(C2) ^ C0;
   const uint64_t D3 = rotl<1>(C3) ^ C1;
   const uint64_t D4 = rotl<1>(C4) ^ C2;

   const uint64_t B00 = A[ 0] ^ D1;
   const uint64_t B01 = rotl<44>(A[ 6] ^ D2);
   const uint64_t B02 = rotl<43>(A[12] ^ D3);
   const uint64_t B03 = rotl<21>(A[18] ^ D4);
   const uint64_t B04 = rotl<14>(A[24] ^ D0);
   T[ 0] = B00 ^ (~B01 & B02) ^ RC;
   T[ 1] = B01 ^ (~B02 & B03);
   T[ 2] = B02 ^ (~B03 & B04);
   T[ 3] = B03 ^ (~B04 & B00);
   T[ 4] = B04 ^ (~B00 & B01);

   const uint64_t B05 = rotl<28>(A[ 3] ^ D4);
   const uint64_t B06 = rotl<20>(A[ 9] ^ D0);
   const uint64_t B07 = rotl< 3>(A[10] ^ D1);
   const uint64_t B08 = rotl<45>(A[16] ^ D2);
   const uint64_t B09 = rotl<61>(A[22] ^ D3);
   T[ 5] = B05 ^ (~B06 & B07);
   T[ 6] = B06 ^ (~B07 & B08);
   T[ 7] = B07 ^ (~B08 & B09);
   T[ 8] = B08 ^ (~B09 & B05);
   T[ 9] = B09 ^ (~B05 & B06);

   // --- instructions starting from here can be removed
   // and the -O3 discrepancy is still triggered

   const uint64_t B10 = rotl< 1>(A[ 1] ^ D2);
   const uint64_t B11 = rotl< 6>(A[ 7] ^ D3);
   const uint64_t B12 = rotl<25>(A[13] ^ D4);
   const uint64_t B13 = rotl< 8>(A[19] ^ D0);
   const uint64_t B14 = rotl<18>(A[20] ^ D1);
   T[10] = B10 ^ (~B11 & B12);
   T[11] = B11 ^ (~B12 & B13);
   T[12] = B12 ^ (~B13 & B14);
   T[13] = B13 ^ (~B14 & B10);
   T[14] = B14 ^ (~B10 & B11);

   const uint64_t B15 = rotl<27>(A[ 4] ^ D0);
   const uint64_t B16 = rotl<36>(A[ 5] ^ D1);
   const uint64_t B17 = rotl<10>(A[11] ^ D2);
   const uint64_t B18 = rotl<15>(A[17] ^ D3);
   const uint64_t B19 = rotl<56>(A[23] ^ D4);
   T[15] = B15 ^ (~B16 & B17);
   T[16] = B16 ^ (~B17 & B18);
   T[17] = B17 ^ (~B18 & B19);
   T[18] = B18 ^ (~B19 & B15);
   T[19] = B19 ^ (~B15 & B16);

   const uint64_t B20 = rotl<62>(A[ 2] ^ D3);
   const uint64_t B21 = rotl<55>(A[ 8] ^ D4);
   const uint64_t B22 = rotl<39>(A[14] ^ D0);
   const uint64_t B23 = rotl<41>(A[15] ^ D1);
   const uint64_t B24 = rotl< 2>(A[21] ^ D2);
   T[20] = B20 ^ (~B21 & B22);
   T[21] = B21 ^ (~B22 & B23);
   T[22] = B22 ^ (~B23 & B24);
   T[23] = B23 ^ (~B24 & B20);
   T[24] = B24 ^ (~B20 & B21);
}

int main()
{
   uint64_t T[25];

   uint64_t A[25] = {
      15515230172486u, 9751542238472685244u, 220181482233372672u,
      2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u,
      2305843009213693952u, 1056340403235818873u, 16396894922196123648u,
      13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u,
      14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
      0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
      4611686018427387904u, 18041275058083100u };

   SHA3_round(T, A, 0x0000000000008082);

   assert(T[0] == 16394434931424703552u);
   assert(T[1] == 10202638136074191489u);
   assert(T[2] == 6432602484395933614u);
   assert(T[3] == 10616058301262943899u);
   assert(T[4] == 14391824303596635982u);
   assert(T[5] == 5673590995284149638u);
   assert(T[6] == 15681872423764765508u);
   assert(T[7] == 11470206704342013341u);
   assert(T[8] == 8508807405493883168u);
   assert(T[9] == 9461805213344568570u);
   assert(T[10] == 8792313850970105187u);
   assert(T[11] == 13508586629627657374u);
   assert(T[12] == 5157283382205130943u);
   assert(T[13] == 375019647457809685u);
   assert(T[14] == 9294608398083155963u);
   assert(T[15] == 16923121173371064314u);
   assert(T[16] == 4737739424553008030u);
   assert(T[17] == 5823987023293412593u);
   assert(T[18] == 13908063749137376267u);
   assert(T[19] == 13781177305593198238u);
   assert(T[20] == 9673833001659673401u);
   assert(T[21] == 17282395057630454440u);
   assert(T[22] == 12906624984756985556u);
   assert(T[23] == 3081478361927354234u);
   assert(T[24] == 93297594635310132u);

   return 0;
}
================

Your help debugging and fixing this problem is appreciated!
--
Regards,
Uri Blumenthal                              Voice: (781) 981-1638
Secure Resilient Systems and Technologies   Cell:  (339) 223-5363
MIT Lincoln Laboratory
244 Wood Street, Lexington, MA  02420-9108

Web:     https://www.ll.mit.edu/biographies/uri-blumenthal
Root CA: https://www.ll.mit.edu/llrca2.pem

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Jakub Kuderski
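The reproducer's comment points at the column parity C2 as the value that goes wrong. One way to narrow such a failure down is to compute each parity twice, once as written and once through volatile loads, and compare the results. The following is only a sketch under assumptions: the helper names `column_parity`, `column_parity_reference` and `first_mismatching_column` are hypothetical and not part of the reproducer or of Botan, and an isolated helper like this may not trigger the miscompilation that the full SHA3_round does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Column parity written the same way as C0..C4 in the reproducer.
// The optimizer is free to vectorize these loads and XORs.
inline std::uint64_t column_parity(const std::uint64_t A[25], std::size_t col)
{
   return A[col] ^ A[col + 5] ^ A[col + 10] ^ A[col + 15] ^ A[col + 20];
}

// Hypothetical reference version: every word is read through a pointer to
// volatile, so each load is performed individually and cannot be merged
// into a vector load. Assumption: this keeps the computation scalar.
inline std::uint64_t column_parity_reference(const std::uint64_t A[25], std::size_t col)
{
   const volatile std::uint64_t* p = A;
   std::uint64_t acc = 0;
   for(std::size_t row = 0; row != 5; ++row)
      acc ^= p[col + 5 * row];
   return acc;
}

// Returns the index of the first column whose two results disagree,
// or 5 if all five columns agree.
inline std::size_t first_mismatching_column(const std::uint64_t A[25])
{
   for(std::size_t col = 0; col != 5; ++col)
   {
      if(column_parity(A, col) != column_parity_reference(A, col))
         return col;
   }
   return 5;
}
```

Calling `first_mismatching_column(A)` with the input array from `main()` above, in a build that uses the same flags as the failing one, would show whether the parities themselves differ or whether the wrong value only appears later in the round.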
Dimitry Andric via llvm-dev
2021-Oct-25 21:43 UTC
[llvm-dev] Problem with clang optimizer?
Hi Uri,

Unfortunately the fix for this didn't make it into 13.0.0; it will hopefully be part of 13.0.1 (though I can't say when that will come out).

-Dimitry

> On 25 Oct 2021, at 23:35, Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I just tried Clang-13 (with LLVM-13), and the problem is still there. Vectorizer still broken wrt. SSE-4.1 instruction extensions:
>
> $ echo $CXXFLAGS
> -std=gnu++17 -O3 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
> $ clang++-mp-13 $CXXFLAGS -o t sha3-reproducer.cxx
> $ ./t
> Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
> Abort trap: 6
> $ clang++-mp-13 $CXXFLAGS -mno-sse4.1 -o t sha3-reproducer.cxx
> $ ./t
> $
>
> --
> Regards,
> Uri
>
> From: Jameson Nash <vtjnash at gmail.com>
> Date: Wednesday, September 29, 2021 at 19:41
> To: Craig Topper <craig.topper at gmail.com>
> Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Problem with clang optimizer?
>
> This may be fixed now (https://reviews.llvm.org/D106613), but it remains to be confirmed for https://bugs.llvm.org/show_bug.cgi?id=51957
>
> On Sun, Sep 26, 2021 at 1:12 AM Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Looking at the IR here https://godbolt.org/z/zaMW1renW I believe the issue is on this instruction on line 361
>>
>> %30 = extractelement <2 x <2 x i64>*> %bc438, i32 0
>>
>> It should be extracting from index 1 instead of index 0.
>>
>> ~Craig
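Until a fixed release is available, a project that hits this could flag builds made with a toolchain the thread reports as affected. The sketch below encodes only what was said above: LLVM Clang-12 and 13.0.0 and the Apple Clang from Xcode-13 fail, LLVM Clang-11 and Xcode-12 do not, and the failure goes away without SSE4.1. The function names are hypothetical, treating every Apple Clang with major version 13 as "Xcode-13" is an assumption, and treating 13.0.1 as fixed reflects only the hope expressed in this message, not a confirmed fact.

```cpp
#include <cassert>

// Hypothetical predicate built from the reports in this thread.
// apple_clang: true for Apple's Clang (Xcode), false for upstream LLVM Clang.
// has_sse41:   true when SSE4.1 code generation is enabled.
constexpr bool may_be_affected(bool apple_clang, int major, int minor, int patch, bool has_sse41)
{
   return has_sse41 &&
          (apple_clang
              ? major == 13   // Xcode-13 reported failing, Xcode-12 reported fine
              : (major == 12 || (major == 13 && minor == 0 && patch == 0)));
}

// Applies the predicate to the compiler that is building this translation unit.
// Returns false for compilers other than Clang.
inline bool this_toolchain_may_be_affected()
{
#if defined(__clang__)
  #if defined(__apple_build_version__)
   const bool apple = true;
  #else
   const bool apple = false;
  #endif
  #if defined(__SSE4_1__)
   const bool sse41 = true;
  #else
   const bool sse41 = false;
  #endif
   return may_be_affected(apple, __clang_major__, __clang_minor__, __clang_patchlevel__, sse41);
#else
   return false;
#endif
}
```

A build could call `this_toolchain_may_be_affected()` in a self-test and, if it returns true, run the reproducer's assertions before trusting optimized SHA-3 code. The predicate is kept separate from the macro lookup so that it can be checked against known version numbers on any compiler.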
Anton Afanasyev via llvm-dev
2021-Oct-26 08:14 UTC
[llvm-dev] Problem with clang optimizer?
Hi Uri,

Could you please reproduce this at godbolt.org? AFAIK, this issue is masked in clang-13, though the real fix hasn't been backported (see Dimitry's comment: https://bugs.llvm.org/show_bug.cgi?id=51957#c7). I can't reproduce it on clang-13: https://godbolt.org/z/4Mdrd5388

Thanks,
Anton

On Tue, 26 Oct 2021 at 00:35, Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I just tried Clang-13 (with LLVM-13), and the problem is still there.
> Vectorizer still broken wrt. SSE-4.1 instruction extensions:
>
> $ echo $CXXFLAGS
> -std=gnu++17 -O3 -march=native -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
> $ clang++-mp-13 $CXXFLAGS -o t sha3-reproducer.cxx
> $ ./t
> Assertion failed: (T[0] == 16394434931424703552u), function main, file sha3-reproducer.cxx, line 103.
> Abort trap: 6
> $ clang++-mp-13 $CXXFLAGS -mno-sse4.1 -o t sha3-reproducer.cxx
> $ ./t
> $
>
> --
> Regards,
> Uri
>
> From: Jameson Nash <vtjnash at gmail.com>
> Date: Wednesday, September 29, 2021 at 19:41
> To: Craig Topper <craig.topper at gmail.com>
> Cc: Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Problem with clang optimizer?
> > > > This may be fixed now (https://reviews.llvm.org/D106613), but it remains > to be confirmed for https://bugs.llvm.org/show_bug.cgi?id=51957 > > > > On Sun, Sep 26, 2021 at 1:12 AM Craig Topper via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Looking at the IR here https://godbolt.org/z/zaMW1renW I believe the > issue is on this instruction on line 361 > > > > %30 = extractelement <2 x <2 x i64>*> %bc438, i32 0 > > > > It should be extracting from index 1 instead of index 0. > > > ~Craig > > > > > > On Sat, Sep 25, 2021 at 5:48 PM Blumenthal, Uri - 0553 - MITLL < > uri at ll.mit.edu> wrote: > > I found that > > · *The problem disappears with **-mno-sse4.1* > > · The problem manifests with both Apple Clang from Xcode-13, and > LLVM Clang-12 (and not with Xcode-12 or LLVM Clang-11) > > · I could experiment only on Apple platform, as that’s the only > one I have that runs LLVM Clang-12. > > > > -- > > Regards, > > Uri > > > > *There are two ways to design a system. One is to make is so simple there > are obviously no deficiencies.* > > *The other is to make it so complex there are no obvious deficiencies.* > > * > - > C. A. R. Hoare* > > > > > > *From: *Craig Topper <craig.topper at gmail.com> > *Date: *Saturday, September 25, 2021 at 12:07 > *To: *Dimitry Andric <dimitry at andric.com> > *Cc: *Uri Blumenthal <uri at ll.mit.edu>, LLVM-DEV LIST < > llvm-dev at lists.llvm.org> > *Subject: *Re: [llvm-dev] Problem with clang optimizer? > > > > It reproduced for me with -march=nehalem which does not have AVX. > > > > On Sat, Sep 25, 2021 at 2:51 AM Dimitry Andric via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > It is only occurring (as far as I can see now) on x86_64, with -mavx > enabled. Or with a target CPU that supports AVX. And it is not Apple clang > specific. 
> > > > -Dimitry > > > > On 24 Sep 2021, at 15:30, Blumenthal, Uri - 0553 - MITLL via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > I tried to reproduce it on goldbolt with clang 12.0.0 and 12.0.1 but > things seem fine when I run it there: https://godbolt.org/z/vrq8j6Kj7. > Can you share your exact clang invocation? Does it only reproduce in some > specific environment? > > > > Save the source I posted before into “sha3-reproducer.cxx” file. Let me > know if you want it re-posted here. > > > > $ clang++-mp-12 -v > > clang version 12.0.1 > > Target: x86_64-apple-darwin20.6.0 > > Thread model: posix > > InstalledDir: /opt/local/libexec/llvm-12/bin > > $ clang++-mp-12 -o s -O3 sha3-reproducer.cxx > > $ ./s > > Assertion failed: (T[0] == 16394434931424703552u), function main, file > sha3-reproducer.cxx, line 103. > > Abort trap: 6 > > $ clang++-mp-12 -o s -O2 sha3-reproducer.cxx > > $ ./s > > Assertion failed: (T[0] == 16394434931424703552u), function main, file > sha3-reproducer.cxx, line 103. > > Abort trap: 6 > > $ clang++-mp-12 -o s -O1 sha3-reproducer.cxx > > $ ./s > > $ > > > > Clang-12 is installed via Macports, which is why we invoke the executable > as clang++-mp-12. > > > > The same problem manifests in exactly the same way in the Xcode-13 version > of Clang (presumably based on LLVM Clang-12). > > > > I’ll be happy to provide more of specific details, if you let me know what > you need. > > > > Also, it generally helps to reduce code bug reports as much as possible; > creduce can help with that: https://embed.cs.utah.edu/creduce/using/. > > > > Understood. Unfortunately, the above reproducer is the best we could come > up with. An alternative is trying to build the Botan package itself > https://github.com/randombit/botan.git. 
> > > > > > On Thu, Sep 23, 2021 at 10:14 PM Blumenthal, Uri - 0553 - MITLL via > llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I’m not sure if this is the correct list, so please direct me to the right > one if this bug report shouldn’t go here. > > > > The problem is: invoking clang (v12) with -O2 or better optimization flags > generates wrong object code for the following C++. Compiling it with -O1 > generates working binary. > > > > > > ================> > > > #include <cstdint> > > #include <cassert> > > > > template<size_t ROT, typename T> > > inline constexpr T rotl(T input) > > { > > static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation > constant"); > > return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT))); > > } > > > > inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC) > > { > > const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20]; > > const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21]; > > > > // the calculation of C2 fails for -O3 or -O2 with clang 12 > > // FWIW: it would produce a value that doesn't fit into a _signed_ > 64-bit int > > const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22]; > > > > const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23]; > > const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24]; > > > > const uint64_t D0 = rotl<1>(C0) ^ C3; > > const uint64_t D1 = rotl<1>(C1) ^ C4; > > const uint64_t D2 = rotl<1>(C2) ^ C0; > > const uint64_t D3 = rotl<1>(C3) ^ C1; > > const uint64_t D4 = rotl<1>(C4) ^ C2; > > > > const uint64_t B00 = A[ 0] ^ D1; > > const uint64_t B01 = rotl<44>(A[ 6] ^ D2); > > const uint64_t B02 = rotl<43>(A[12] ^ D3); > > const uint64_t B03 = rotl<21>(A[18] ^ D4); > > const uint64_t B04 = rotl<14>(A[24] ^ D0); > > T[ 0] = B00 ^ (~B01 & B02) ^ RC; > > T[ 1] = B01 ^ (~B02 & B03); > > T[ 2] = B02 ^ (~B03 & B04); > > T[ 3] = B03 ^ (~B04 & B00); > > T[ 4] = B04 ^ (~B00 & B01); > > > > const uint64_t B05 = rotl<28>(A[ 3] ^ D4); > > const uint64_t B06 = 
rotl<20>(A[ 9] ^ D0); > > const uint64_t B07 = rotl< 3>(A[10] ^ D1); > > const uint64_t B08 = rotl<45>(A[16] ^ D2); > > const uint64_t B09 = rotl<61>(A[22] ^ D3); > > T[ 5] = B05 ^ (~B06 & B07); > > T[ 6] = B06 ^ (~B07 & B08); > > T[ 7] = B07 ^ (~B08 & B09); > > T[ 8] = B08 ^ (~B09 & B05); > > T[ 9] = B09 ^ (~B05 & B06); > > > > // --- instructions starting from here can be removed > > // and the -O3 dicrepancy is still triggered > > > > const uint64_t B10 = rotl< 1>(A[ 1] ^ D2); > > const uint64_t B11 = rotl< 6>(A[ 7] ^ D3); > > const uint64_t B12 = rotl<25>(A[13] ^ D4); > > const uint64_t B13 = rotl< 8>(A[19] ^ D0); > > const uint64_t B14 = rotl<18>(A[20] ^ D1); > > T[10] = B10 ^ (~B11 & B12); > > T[11] = B11 ^ (~B12 & B13); > > T[12] = B12 ^ (~B13 & B14); > > T[13] = B13 ^ (~B14 & B10); > > T[14] = B14 ^ (~B10 & B11); > > > > const uint64_t B15 = rotl<27>(A[ 4] ^ D0); > > const uint64_t B16 = rotl<36>(A[ 5] ^ D1); > > const uint64_t B17 = rotl<10>(A[11] ^ D2); > > const uint64_t B18 = rotl<15>(A[17] ^ D3); > > const uint64_t B19 = rotl<56>(A[23] ^ D4); > > T[15] = B15 ^ (~B16 & B17); > > T[16] = B16 ^ (~B17 & B18); > > T[17] = B17 ^ (~B18 & B19); > > T[18] = B18 ^ (~B19 & B15); > > T[19] = B19 ^ (~B15 & B16); > > > > const uint64_t B20 = rotl<62>(A[ 2] ^ D3); > > const uint64_t B21 = rotl<55>(A[ 8] ^ D4); > > const uint64_t B22 = rotl<39>(A[14] ^ D0); > > const uint64_t B23 = rotl<41>(A[15] ^ D1); > > const uint64_t B24 = rotl< 2>(A[21] ^ D2); > > T[20] = B20 ^ (~B21 & B22); > > T[21] = B21 ^ (~B22 & B23); > > T[22] = B22 ^ (~B23 & B24); > > T[23] = B23 ^ (~B24 & B20); > > T[24] = B24 ^ (~B20 & B21); > > } > > > > int main() > > { > > uint64_t T[25]; > > > > uint64_t A[25] = { > > 15515230172486u, 9751542238472685244u, 220181482233372672u, > > 2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u, > > 2305843009213693952u, 1056340403235818873u, 16396894922196123648u, > > 13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u, > > 
> >         14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
> >         0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
> >         4611686018427387904u, 18041275058083100u };
> >
> >     SHA3_round(T, A, 0x0000000000008082);
> >
> >     assert(T[0] == 16394434931424703552u);
> >     assert(T[1] == 10202638136074191489u);
> >     assert(T[2] == 6432602484395933614u);
> >     assert(T[3] == 10616058301262943899u);
> >     assert(T[4] == 14391824303596635982u);
> >     assert(T[5] == 5673590995284149638u);
> >     assert(T[6] == 15681872423764765508u);
> >     assert(T[7] == 11470206704342013341u);
> >     assert(T[8] == 8508807405493883168u);
> >     assert(T[9] == 9461805213344568570u);
> >     assert(T[10] == 8792313850970105187u);
> >     assert(T[11] == 13508586629627657374u);
> >     assert(T[12] == 5157283382205130943u);
> >     assert(T[13] == 375019647457809685u);
> >     assert(T[14] == 9294608398083155963u);
> >     assert(T[15] == 16923121173371064314u);
> >     assert(T[16] == 4737739424553008030u);
> >     assert(T[17] == 5823987023293412593u);
> >     assert(T[18] == 13908063749137376267u);
> >     assert(T[19] == 13781177305593198238u);
> >     assert(T[20] == 9673833001659673401u);
> >     assert(T[21] == 17282395057630454440u);
> >     assert(T[22] == 12906624984756985556u);
> >     assert(T[23] == 3081478361927354234u);
> >     assert(T[24] == 93297594635310132u);
> >
> >     return 0;
> > }
> >
> > ================
> >
> > Your help debugging and fixing this problem is appreciated!
> > --
> > Regards,
> > Uri Blumenthal                               Voice: (781) 981-1638
> > Secure Resilient Systems and Technologies    Cell:  (339) 223-5363
> > MIT Lincoln Laboratory
> > 244 Wood Street, Lexington, MA 02420-9108
> >
> > Web:     https://www.ll.mit.edu/biographies/uri-blumenthal
> > Root CA: https://www.ll.mit.edu/llrca2.pem
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
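
The reproducer's own comment singles out the column parity C2 as the value that goes wrong. A minimal sketch of one way to look at that expression in isolation follows. It is not part of the original report: the names kState, column_parity_ref, column_parity_unrolled and count_column_mismatches are hypothetical, and the use of volatile loads to keep the reference path away from the vectorizer is an assumption, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Input state copied from the reproducer's main().
static const uint64_t kState[25] = {
    15515230172486u, 9751542238472685244u, 220181482233372672u,
    2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u,
    2305843009213693952u, 1056340403235818873u, 16396894922196123648u,
    13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u,
    14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
    0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
    4611686018427387904u, 18041275058083100u };

// Hypothetical reference path: XOR of column x, one element at a time.
// Assumption: reading through a volatile pointer discourages the optimizer
// from merging these loads into vector operations.
inline uint64_t column_parity_ref(const uint64_t A[25], std::size_t x)
{
    uint64_t c = 0;
    for (std::size_t y = 0; y < 5; ++y) {
        const volatile uint64_t* p = &A[x + 5 * y];
        c ^= *p;
    }
    return c;
}

// Same expression shape as C0..C4 in SHA3_round.
inline uint64_t column_parity_unrolled(const uint64_t A[25], std::size_t x)
{
    return A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20];
}

// Number of columns (0..4) on which the two computations disagree.
inline int count_column_mismatches(const uint64_t A[25])
{
    int mismatches = 0;
    for (std::size_t x = 0; x < 5; ++x) {
        if (column_parity_ref(A, x) != column_parity_unrolled(A, x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_column_mismatches(kState) from main and building once with -O3 -march=native and once with -mno-sse4.1 added would show whether this expression alone is affected. The miscompile may well depend on the surrounding code in SHA3_round, so a result of zero here does not rule the bug out.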