Stefan Kanthak via llvm-dev
2018-Nov-26 12:27 UTC
[llvm-dev] BUGS in code generated for target i386-win32
Hi @ll, LLVM/clang generates wrong code for the following program (see <https://godbolt.org/z/UZrrkG>): --- sample.c --- unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial) { __asm { add ecx, ecx ; ecx = argument << 1 sbb eax, eax ; eax = CF ? -1 : 0 and eax, edx ; eax = CF ? polynomial : 0 xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0) } } int main() { unsigned lfsr = 123456789; unsigned period = 0; do { period++; #ifdef MITIGATE lfsr = lfsr & 0x80000000 ? 0x04C11DB7 ^ (lfsr << 1) : lfsr << 1; #else lfsr = lfsr32(lfsr, 0x04C11DB7); #endif } while (lfsr != 123456789); return period; } --- EOF --- Compiled with -O2 -target i386-win32 this yields the following code: _main: # @main xor edx, edx LBB1_1: # =>This Inner Loop Header: Depth=1 add ecx, ecx sbb eax, eax and eax, edx xor eax, ecx inc edx cmp eax, 123456789 jne LBB1_1 mov eax, edx ret BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"! ~~~~~~~ It fails to load the result of the inlined function call, held in EAX, back into ECX, which holds the first argument with the __fastcall calling convention, inside the loop starting with LBB1_1: FIX #1: emit a "mov ecx, eax" immediately after LBB1_1: ~~~~~~~ BUG #2: the variable "lfsr" is NOT initialized! ~~~~~~~ The constant 123456789 is NOT loaded into whatever register holds "lfsr". FIX #2: emit a "mov eax, 123456789" immediately before LBB1_1: ~~~~~~~ BUG #3: the compiler allocates EDX for the variable "period"! ~~~~~~~ EDX is a volatile register, it is not preserved through function calls; additionally it holds the second argument with the __fastcall calling convention, so the constant 0x04C11DB7 MUST be loaded into EDX before label LBB1_1: The compiler MUST use another register (of course except ECX, which holds the first argument with __fastcall, and except EAX, which holds the return value), for example EBX instead of EDX. Stefan Kanthak PS: for comparision with another (likewise buggy) compiler, take a look at <https://skanthak.homepage.t-online.de/msvc.html#example9>
Tim Northover via llvm-dev
2018-Nov-26 14:03 UTC
[llvm-dev] BUGS in code generated for target i386-win32
Hi Stefan, On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev <llvm-dev at lists.llvm.org> wrote:> LLVM/clang generates wrong code for the following program > (see <https://godbolt.org/z/UZrrkG>):It looks like all of these issues come down to mismatched expectations on what goes into and out of an __asm block. If you look at the MSVC documentation (which I think Clang is trying to be compatible with), it warns against assuming known values are in particular registers at any point, and even using __fastcall at all for a function with an __asm block for that reason: https://docs.microsoft.com/en-gb/cpp/assembler/inline/using-and-preserving-registers-in-inline-assembly?view=vs-2017 I'll try to explain a little below how that one mismatch causes the issues you're seeing.> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"! > BUG #2: the variable "lfsr" is NOT initialized!Since the __asm isn't linked (as far as Clang is concerned) to either input for lfsr32, they're both unused. The compiler has actually inlined that function as you said, then noticed this lack of use and decided to completely eliminate the lfsr variable. You can see this in LLVM IR if you add "-emit-llvm -g0" to the godbolt command-line. The loop *only* contains a counter and the lonely asm block> BUG #3: the compiler allocates EDX for the variable "period"! > ~~~~~~~ > EDX is a volatile register, it is not preserved through function calls;There is no function call after inlining, of course, so it doesn't need to be preserved in this case. It looks like there might be an issue with an __asm block that actually *makes* a call though: __asm { call Wibble } doesn't seem to mark registers used by the call as clobbered from my testing. It's slightly unclear what MSVC's behaviour is in this situation from the documentation, but some vague poking suggests it does know that a call instruction might clobber edx.> additionally it holds the second argument with the __fastcall calling > convention, so the constant 0x04C11DB7 MUST be loaded into EDX before > label LBB1_1:This is the same unused-as-far-as-the-compiler-knows thing as the first two. Cheers. Tim.
Stefan Kanthak via llvm-dev
2018-Nov-26 17:59 UTC
[llvm-dev] BUGS in code generated for target i386-win32
"Tim Northover" <t.p.northover at gmail.com> wrote:> Hi Stefan, > > On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> LLVM/clang generates wrong code for the following program >> (see <https://godbolt.org/z/UZrrkG>): > > It looks like all of these issues come down to mismatched expectations > on what goes into and out of an __asm block. If you look at the MSVC > documentation (which I think Clang is trying to be compatible with), > it warns against assuming known values are in particular registers at > any point, and even using __fastcall at all for a function with an > __asm block for that reason: > https://docs.microsoft.com/en-gb/cpp/assembler/inline/using-and-preserving-registers-in-inline-assembly?view=vs-2017Trust me: I KNOW THIS DOCUMENTATION!> I'll try to explain a little below how that one mismatch causes the > issues you're seeing. > >> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"! >> BUG #2: the variable "lfsr" is NOT initialized! > > Since the __asm isn't linked (as far as Clang is concerned) to > either input for lfsr32, they're both unused.REALLY? Or better: OUCH! Is Clang NOT aware of the __fastcall calling convention and its register usage? Clang does NOT create prolog/epilog here, so it CLEARLY knows that "argument" is held in ECX and "polynomial" in EDX ... as documented for __fastcall JFTR: Clang (like MSVC too) knows VERY well which registers are used (clobbered) in an __asm block. See <https://godbolt.org/z/blDIzK>, which proves your assumption WRONG!> The compiler has actually inlined that function as you said, then > noticed this lack of use and decided to completely eliminate the > lfsr variable.1. see above! 2. add "__declspec(noinline)" to the function definition ... and notice that the compiler STILL does NOT setup ANY of EAX, ECX and EDX before the function call! See <https://godbolt.org/z/X9R9w7>, which again proves your assumption WRONG.> You can see this in LLVM IR if you add "-emit-llvm -g0" to the godbolt > command-line. The loop *only* contains a counter and the lonely asm > block > >> BUG #3: the compiler allocates EDX for the variable "period"! >> ~~~~~~~ >> EDX is a volatile register, it is not preserved through function calls; > > There is no function call after inlining, of course, so it doesn't > need to be preserved in this case. It looks like there might be an > issue with an __asm block that actually *makes* a call though: > > __asm { > call Wibble > } > > doesn't seem to mark registers used by the call as clobbered from my > testing. It's slightly unclear what MSVC's behaviour is in this > situation from the documentation, but some vague poking suggests it > does know that a call instruction might clobber edx.The x86 calling conventions used on Windows (and some more OS) are VERY WELL KNOWN: EAX, ECX and EDX are NOT preserved through function calls. [EDX:]EAX is used for the [long] long return value, and ECX can be clobbered. Again the addition of "__declspec(noinline)" proves your assumptions wrong: the function call is NOT inlined any more, but its arguments are STILL not set up at all ... and the variable lfsr STILL not initialized. If the function is declared only, but not defined, then the compiler sets the arguments up before the call: see <https://godbolt.org/z/4pmOlC>>> additionally it holds the second argument with the __fastcall calling >> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before >> label LBB1_1: > > This is the same unused-as-far-as-the-compiler-knows thing as the first two.OUCH! <https://godbolt.org/z/X9R9w7> proves this WRONG. LLVM is apparently REALLY confused by __asm, at least when used inside a __fastcall function. My expectation is that AT LEAST a warning message should be printed by the compiler. regards Stefan Kanthak PS: I can help myself when a compiler emits wrong code. Other people might just rely on the compiler...
Reasonably Related Threads
- BUGS in code generated for target i386-win32
- Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
- Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
- Where's the optimiser gone? (part 5.c): missed tail calls, and more...
- [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock