Craig Topper via llvm-dev
2018-Nov-25 20:26 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
I just compiled the two attached files in 32-bit mode and ran it. It printed efcdab8967452301. I verified via objdump that the my_bswap function contains the follow assembly which I believe matches the assembly you linked to on godbolt. _my_bswap: 1f70: 55 pushl %ebp 1f71: 89 e5 movl %esp, %ebp 1f73: 8b 55 08 movl 8(%ebp), %edx 1f76: 8b 45 0c movl 12(%ebp), %eax 1f79: 0f c8 bswapl %eax 1f7b: 0f ca bswapl %edx 1f7d: 5d popl %ebp 1f7e: c3 retl ~Craig On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <stefan.kanthak at nexgo.de> wrote:> "Craig Topper" <craig.topper at gmail.com> wrote: > > > bswapdi2 for i386 is correct > > OUCH! > > > Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into > > eax. Those are each bswapped. > > This exchanges the high byte of each 32-bit PART with its low byte, but > NOT the high byte of the whole 64-bit operand with its low byte! > > Please get a clue! > > > The ABI for the return is edx contains bits [63:32] and eax contains > > [31:0]. This is opposite of how the register were loaded. > > My post is NOT about swapping EDX with EAX, but the bytes WITHIN both. > > With the 64-bit argument loaded into EDX:EAX, the instruction sequence > > bswap edx > bswap eax > xchg eax, edx > > is NOT equivalent to > > bswap rdi > > with the 64-bit argument loaded into RDI. > > Just run the following code on x86-64: > > mov rdi, 0123456789abcdefh ; pass (fake) argument in RDI > ; split argument into high and low part > mov rdx, rdi > shr rdx, 32 ; high part in EDX > mov eax, rdi ; low part in EAX > ; perform __bswapdi2() as in 32-bit mode > xchg eax, edx ; swap parts, argument now loaded > ; like in 32-bit mode > bswap edx > bswap eax ; result like that in 32-bit mode > ; load result into 64-bit register > shl rdx, 32 > or rax, rdx > ; perform _bswapdi2() in native 64-bit mode > bswap rdi > ; compare results > xor rax, rdi > > not amused > Stefan Kanthak > > > On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> > > wrote: > > > >> bswapsi2 on the x86-64 isn't using the bswap instruction because > "unsigned > >> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc. > >> > >> Not sure about the bswapdi2 i386 case. > >> > >> > >> ~Craig > >> > >> > >> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev < > >> llvm-dev at lists.llvm.org> wrote: > >> > >>> Hi @ll, > >>> > >>> targetting i386, LLVM/clang generates wrong code for the following > >>> functions: > >>> > >>> unsigned long __bswapsi2 (unsigned long ul) > >>> { > >>> return (((ul) & 0xff000000ul) >> 3 * 8) > >>> | (((ul) & 0x00ff0000ul) >> 8) > >>> | (((ul) & 0x0000ff00ul) << 8) > >>> | (((ul) & 0x000000fful) << 3 * 8); > >>> } > >>> > >>> unsigned long long __bswapdi2(unsigned long long ull) > >>> { > >>> return ((ull & 0xff00000000000000ull) >> 7 * 8) > >>> | ((ull & 0x00ff000000000000ull) >> 5 * 8) > >>> | ((ull & 0x0000ff0000000000ull) >> 3 * 8) > >>> | ((ull & 0x000000ff00000000ull) >> 8) > >>> | ((ull & 0x00000000ff000000ull) << 8) > >>> | ((ull & 0x0000000000ff0000ull) << 3 * 8) > >>> | ((ull & 0x000000000000ff00ull) << 5 * 8) > >>> | ((ull & 0x00000000000000ffull) << 7 * 8); > >>> } > >>> > >>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c" > >>> and "compiler-rt/lib/builtins/bswapdi2.c", for example! > >>> > >>> > >>> Compiled with "-O3 -target i386" this yields the following code > >>> (see <https://godbolt.org/z/F4UIl4>): > >>> > >>> __bswapsi2: # @__bswapsi2 > >>> push ebp > >>> mov ebp, esp > >>> mov eax, dword ptr [ebp + 8] > >>> bswap eax > >>> pop ebp > >>> ret > >>> > >>> __bswapdi2: # @__bswapdi2 > >>> push ebp > >>> mov ebp, esp > >>> mov edx, dword ptr [ebp + 8] > >>> mov eax, dword ptr [ebp + 12] > >>> bswap eax > >>> bswap edx > >>> pop ebp > >>> ret > >>> > >>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the > >>> halves of a "long long" is OBVIOUSLY WRONG! > >>> > >>> From the C source, the expected result for the input value > >>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but > >>> produces 0x67452301EFCDAB89 > >>> > >>> > >>> And compiled for x86-64 this yields the following code (see > >>> <https://godbolt.org/z/uM9nvN>): > >>> > >>> __bswapsi2: # @__bswapsi2 > >>> mov eax, edi > >>> shr eax, 24 > >>> mov rcx, rdi > >>> shr rcx, 8 > >>> and ecx, 65280 > >>> or rax, rcx > >>> mov rcx, rdi > >>> shl rcx, 8 > >>> and ecx, 16711680 > >>> or rax, rcx > >>> and rdi, 255 > >>> shl rdi, 24 > >>> or rax, rdi > >>> ret > >>> > >>> __bswapdi2: # @__bswapdi2 > >>> bswap rdi > >>> mov rax, rdi > >>> ret > >>> > >>> Both are correct, but __bswapsi2() should of course use BSWAP too! > >>> > >>> > >>> Stefan Kanthak > >>> > >>> PS: for comparision with another compiler, take a look at > >>> <https://skanthak.homepage.t-online.de/msvc.html#example5> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> llvm-dev at lists.llvm.org > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >>> > >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.c Type: application/octet-stream Size: 186 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: my_bswap.c Type: application/octet-stream Size: 473 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment-0001.obj>
Stefan Kanthak via llvm-dev
2018-Nov-26 12:11 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
"Craig Topper" <craig.topper at gmail.com> wrote:>I just compiled the two attached files in 32-bit mode and ran it. > > It printed efcdab8967452301. > > I verified via objdump that the my_bswap function contains the follow > assembly which I believe matches the assembly you linked to on godbolt. > > _my_bswap: > 1f70: 55 pushl %ebp > 1f71: 89 e5 movl %esp, %ebp > 1f73: 8b 55 08 movl 8(%ebp), %edx > 1f76: 8b 45 0c movl 12(%ebp), %eax > 1f79: 0f c8 bswapl %eax > 1f7b: 0f ca bswapl %edx > 1f7d: 5d popl %ebp > 1f7e: c3 retlContrary to my initial WRONG claim this code is CORRECT: two 32-bit BSWAP instructions on the swapped halves of a 64-bit register are equivalent to a 64-bit BSWAP instruction. Sorry for the confusion, I should have slept another night before responding. The other part of my post but holds: the following function, compiled for x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>): unsigned long __bswapsi2 (unsigned long ul) { return (ul >> 3 * 8) & 0xff000000ul | (ul >> 8) & 0x00ff0000ul | (ul << 8) & 0x0000ff00ul | (ul << 3 * 8) & 0x000000fful; } The expected optimized code is __bswapsi2: # @__bswapsi2 bswap edi mov eax, edi ret regards Stefan Kanthak [ fullquote removed ]
Fabian Giesen via llvm-dev
2018-Nov-26 19:11 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
On 11/26/2018 4:11 AM, Stefan Kanthak via llvm-dev wrote:> The other part of my post but holds: the following function, compiled for > x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>): > > unsigned long __bswapsi2 (unsigned long ul) > { > return (ul >> 3 * 8) & 0xff000000ul > | (ul >> 8) & 0x00ff0000ul > | (ul << 8) & 0x0000ff00ul > | (ul << 3 * 8) & 0x000000fful; > }That looks like another instance of truncation in the wrong places causing the pattern matches to fail. There have been a couple of those in the past with other things like rotates. This is indeed a bug, but if you need a quick workaround, the pattern does work when you use a 32-bit (e.g. unsigned int or uint32_t) type. -Fabian