Blumenthal, Uri - 0553 - MITLL via llvm-dev
2021-Sep-24 02:14 UTC
[llvm-dev] Problem with clang optimizer?
I’m not sure if this is the correct list, so please direct me to the right one if this bug report shouldn’t go here. The problem is: invoking clang (v12) with -O2 or better optimization flags generates wrong object code for the following C++. Compiling it with -O1 generates working binary. ================ #include <cstdint> #include <cassert> template<size_t ROT, typename T> inline constexpr T rotl(T input) { static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation constant"); return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT))); } inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC) { const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20]; const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21]; // the calculation of C2 fails for -O3 or -O2 with clang 12 // FWIW: it would produce a value that doesn't fit into a _signed_ 64-bit int const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22]; const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23]; const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24]; const uint64_t D0 = rotl<1>(C0) ^ C3; const uint64_t D1 = rotl<1>(C1) ^ C4; const uint64_t D2 = rotl<1>(C2) ^ C0; const uint64_t D3 = rotl<1>(C3) ^ C1; const uint64_t D4 = rotl<1>(C4) ^ C2; const uint64_t B00 = A[ 0] ^ D1; const uint64_t B01 = rotl<44>(A[ 6] ^ D2); const uint64_t B02 = rotl<43>(A[12] ^ D3); const uint64_t B03 = rotl<21>(A[18] ^ D4); const uint64_t B04 = rotl<14>(A[24] ^ D0); T[ 0] = B00 ^ (~B01 & B02) ^ RC; T[ 1] = B01 ^ (~B02 & B03); T[ 2] = B02 ^ (~B03 & B04); T[ 3] = B03 ^ (~B04 & B00); T[ 4] = B04 ^ (~B00 & B01); const uint64_t B05 = rotl<28>(A[ 3] ^ D4); const uint64_t B06 = rotl<20>(A[ 9] ^ D0); const uint64_t B07 = rotl< 3>(A[10] ^ D1); const uint64_t B08 = rotl<45>(A[16] ^ D2); const uint64_t B09 = rotl<61>(A[22] ^ D3); T[ 5] = B05 ^ (~B06 & B07); T[ 6] = B06 ^ (~B07 & B08); T[ 7] = B07 ^ (~B08 & B09); T[ 8] = B08 ^ (~B09 & B05); T[ 9] = B09 ^ (~B05 & B06); // --- instructions starting from here can be removed // and the -O3 dicrepancy is still triggered const uint64_t B10 = rotl< 1>(A[ 1] ^ D2); const uint64_t B11 = rotl< 6>(A[ 7] ^ D3); const uint64_t B12 = rotl<25>(A[13] ^ D4); const uint64_t B13 = rotl< 8>(A[19] ^ D0); const uint64_t B14 = rotl<18>(A[20] ^ D1); T[10] = B10 ^ (~B11 & B12); T[11] = B11 ^ (~B12 & B13); T[12] = B12 ^ (~B13 & B14); T[13] = B13 ^ (~B14 & B10); T[14] = B14 ^ (~B10 & B11); const uint64_t B15 = rotl<27>(A[ 4] ^ D0); const uint64_t B16 = rotl<36>(A[ 5] ^ D1); const uint64_t B17 = rotl<10>(A[11] ^ D2); const uint64_t B18 = rotl<15>(A[17] ^ D3); const uint64_t B19 = rotl<56>(A[23] ^ D4); T[15] = B15 ^ (~B16 & B17); T[16] = B16 ^ (~B17 & B18); T[17] = B17 ^ (~B18 & B19); T[18] = B18 ^ (~B19 & B15); T[19] = B19 ^ (~B15 & B16); const uint64_t B20 = rotl<62>(A[ 2] ^ D3); const uint64_t B21 = rotl<55>(A[ 8] ^ D4); const uint64_t B22 = rotl<39>(A[14] ^ D0); const uint64_t B23 = rotl<41>(A[15] ^ D1); const uint64_t B24 = rotl< 2>(A[21] ^ D2); T[20] = B20 ^ (~B21 & B22); T[21] = B21 ^ (~B22 & B23); T[22] = B22 ^ (~B23 & B24); T[23] = B23 ^ (~B24 & B20); T[24] = B24 ^ (~B20 & B21); } int main() { uint64_t T[25]; uint64_t A[25] = { 15515230172486u, 9751542238472685244u, 220181482233372672u, 2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u, 2305843009213693952u, 1056340403235818873u, 16396894922196123648u, 13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u, 14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u, 0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u, 4611686018427387904u, 18041275058083100u }; SHA3_round(T, A, 0x0000000000008082); assert(T[0] == 16394434931424703552u); assert(T[1] == 10202638136074191489u); assert(T[2] == 6432602484395933614u); assert(T[3] == 10616058301262943899u); assert(T[4] == 14391824303596635982u); assert(T[5] == 5673590995284149638u); assert(T[6] == 15681872423764765508u); assert(T[7] == 11470206704342013341u); assert(T[8] == 8508807405493883168u); assert(T[9] == 9461805213344568570u); assert(T[10] == 8792313850970105187u); assert(T[11] == 13508586629627657374u); assert(T[12] == 5157283382205130943u); assert(T[13] == 375019647457809685u); assert(T[14] == 9294608398083155963u); assert(T[15] == 16923121173371064314u); assert(T[16] == 4737739424553008030u); assert(T[17] == 5823987023293412593u); assert(T[18] == 13908063749137376267u); assert(T[19] == 13781177305593198238u); assert(T[20] == 9673833001659673401u); assert(T[21] == 17282395057630454440u); assert(T[22] == 12906624984756985556u); assert(T[23] == 3081478361927354234u); assert(T[24] == 93297594635310132u); return 0; } ================ Your help debugging and fixing this problem is appreciated! -- Regards, Uri Blumenthal Voice: (781) 981-1638 Secure Resilient Systems and Technologies Cell: (339) 223-5363 MIT Lincoln Laboratory 244 Wood Street, Lexington, MA 02420-9108 Web: https://www.ll.mit.edu/biographies/uri-blumenthal Root CA: https://www.ll.mit.edu/llrca2.pem There are two ways to design a system. One is to make is so simple there are obviously no deficiencies. The other is to make it so complex there are no obvious deficiencies. - C. A. R. Hoare -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210924/8f2e8b9c/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5249 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210924/8f2e8b9c/attachment.bin>
Jakub (Kuba) Kuderski via llvm-dev
2021-Sep-24 02:37 UTC
[llvm-dev] Problem with clang optimizer?
Hi Uri, I tried to reproduce it on goldbolt with clang 12.0.0 and 12.0.1 but things seem fine when I run it there: https://godbolt.org/z/vrq8j6Kj7. Can you share your exact clang invocation? Does it only reproduce in some specific environment? Also, it generally helps to reduce code bug reports as much as possible; creduce can help with that: https://embed.cs.utah.edu/creduce/using/. -Jakub On Thu, Sep 23, 2021 at 10:14 PM Blumenthal, Uri - 0553 - MITLL via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I’m not sure if this is the correct list, so please direct me to the right > one if this bug report shouldn’t go here. > > > > The problem is: invoking clang (v12) with -O2 or better optimization flags > generates wrong object code for the following C++. Compiling it with -O1 > generates working binary. > > > > > > ================> > > > #include <cstdint> > > #include <cassert> > > > > template<size_t ROT, typename T> > > inline constexpr T rotl(T input) > > { > > static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation > constant"); > > return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT))); > > } > > > > inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC) > > { > > const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20]; > > const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21]; > > > > // the calculation of C2 fails for -O3 or -O2 with clang 12 > > // FWIW: it would produce a value that doesn't fit into a _signed_ > 64-bit int > > const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22]; > > > > const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23]; > > const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24]; > > > > const uint64_t D0 = rotl<1>(C0) ^ C3; > > const uint64_t D1 = rotl<1>(C1) ^ C4; > > const uint64_t D2 = rotl<1>(C2) ^ C0; > > const uint64_t D3 = rotl<1>(C3) ^ C1; > > const uint64_t D4 = rotl<1>(C4) ^ C2; > > > > const uint64_t B00 = A[ 0] ^ D1; > > const uint64_t B01 = rotl<44>(A[ 6] ^ D2); > > const uint64_t B02 = rotl<43>(A[12] ^ D3); > > const uint64_t B03 = rotl<21>(A[18] ^ D4); > > const uint64_t B04 = rotl<14>(A[24] ^ D0); > > T[ 0] = B00 ^ (~B01 & B02) ^ RC; > > T[ 1] = B01 ^ (~B02 & B03); > > T[ 2] = B02 ^ (~B03 & B04); > > T[ 3] = B03 ^ (~B04 & B00); > > T[ 4] = B04 ^ (~B00 & B01); > > > > const uint64_t B05 = rotl<28>(A[ 3] ^ D4); > > const uint64_t B06 = rotl<20>(A[ 9] ^ D0); > > const uint64_t B07 = rotl< 3>(A[10] ^ D1); > > const uint64_t B08 = rotl<45>(A[16] ^ D2); > > const uint64_t B09 = rotl<61>(A[22] ^ D3); > > T[ 5] = B05 ^ (~B06 & B07); > > T[ 6] = B06 ^ (~B07 & B08); > > T[ 7] = B07 ^ (~B08 & B09); > > T[ 8] = B08 ^ (~B09 & B05); > > T[ 9] = B09 ^ (~B05 & B06); > > > > // --- instructions starting from here can be removed > > // and the -O3 dicrepancy is still triggered > > > > const uint64_t B10 = rotl< 1>(A[ 1] ^ D2); > > const uint64_t B11 = rotl< 6>(A[ 7] ^ D3); > > const uint64_t B12 = rotl<25>(A[13] ^ D4); > > const uint64_t B13 = rotl< 8>(A[19] ^ D0); > > const uint64_t B14 = rotl<18>(A[20] ^ D1); > > T[10] = B10 ^ (~B11 & B12); > > T[11] = B11 ^ (~B12 & B13); > > T[12] = B12 ^ (~B13 & B14); > > T[13] = B13 ^ (~B14 & B10); > > T[14] = B14 ^ (~B10 & B11); > > > > const uint64_t B15 = rotl<27>(A[ 4] ^ D0); > > const uint64_t B16 = rotl<36>(A[ 5] ^ D1); > > const uint64_t B17 = rotl<10>(A[11] ^ D2); > > const uint64_t B18 = rotl<15>(A[17] ^ D3); > > const uint64_t B19 = rotl<56>(A[23] ^ D4); > > T[15] = B15 ^ (~B16 & B17); > > T[16] = B16 ^ (~B17 & B18); > > T[17] = B17 ^ (~B18 & B19); > > T[18] = B18 ^ (~B19 & B15); > > T[19] = B19 ^ (~B15 & B16); > > > > const uint64_t B20 = rotl<62>(A[ 2] ^ D3); > > const uint64_t B21 = rotl<55>(A[ 8] ^ D4); > > const uint64_t B22 = rotl<39>(A[14] ^ D0); > > const uint64_t B23 = rotl<41>(A[15] ^ D1); > > const uint64_t B24 = rotl< 2>(A[21] ^ D2); > > T[20] = B20 ^ (~B21 & B22); > > T[21] = B21 ^ (~B22 & B23); > > T[22] = B22 ^ (~B23 & B24); > > T[23] = B23 ^ (~B24 & B20); > > T[24] = B24 ^ (~B20 & B21); > > } > > > > int main() > > { > > uint64_t T[25]; > > > > uint64_t A[25] = { > > 15515230172486u, 9751542238472685244u, 220181482233372672u, > > 2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u, > > 2305843009213693952u, 1056340403235818873u, 16396894922196123648u, > > 13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u, > > 14313837075027532897u, 32768u, 6880396441885696u, > 14320469711924527201u, > > 0u, 9814829303127743595u, 18014398509481984u, > 14444556046857390455u, > > 4611686018427387904u, 18041275058083100u }; > > > > SHA3_round(T, A, 0x0000000000008082); > > > > assert(T[0] == 16394434931424703552u); > > assert(T[1] == 10202638136074191489u); > > assert(T[2] == 6432602484395933614u); > > assert(T[3] == 10616058301262943899u); > > assert(T[4] == 14391824303596635982u); > > assert(T[5] == 5673590995284149638u); > > assert(T[6] == 15681872423764765508u); > > assert(T[7] == 11470206704342013341u); > > assert(T[8] == 8508807405493883168u); > > assert(T[9] == 9461805213344568570u); > > assert(T[10] == 8792313850970105187u); > > assert(T[11] == 13508586629627657374u); > > assert(T[12] == 5157283382205130943u); > > assert(T[13] == 375019647457809685u); > > assert(T[14] == 9294608398083155963u); > > assert(T[15] == 16923121173371064314u); > > assert(T[16] == 4737739424553008030u); > > assert(T[17] == 5823987023293412593u); > > assert(T[18] == 13908063749137376267u); > > assert(T[19] == 13781177305593198238u); > > assert(T[20] == 9673833001659673401u); > > assert(T[21] == 17282395057630454440u); > > assert(T[22] == 12906624984756985556u); > > assert(T[23] == 3081478361927354234u); > > assert(T[24] == 93297594635310132u); > > > > return 0; > > } > > ================> > > > Your help debugging and fixing this problem is appreciated! > > -- > > Regards, > > Uri Blumenthal Voice: (781) 981-1638 > > Secure Resilient Systems and Technologies Cell: (339) 223-5363 > > MIT Lincoln Laboratory > > 244 Wood Street, Lexington, MA > <https://www.google.com/maps/search/Wood+Street,+Lexington,+MA+02420-9108?entry=gmail&source=g> > 02420-9108 > <https://www.google.com/maps/search/Wood+Street,+Lexington,+MA+02420-9108?entry=gmail&source=g> > > > > > Web: https://www.ll.mit.edu/biographies/uri-blumenthal > > Root CA: https://www.ll.mit.edu/llrca2.pem > > > > *There are two ways to design a system. One is to make is so simple there > are obviously no deficiencies.* > > *The other is to make it so complex there are no obvious deficiencies.* > > * > - > C. A. R. Hoare* > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Jakub Kuderski -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210923/76ebe2a2/attachment-0001.html>