Craig Topper via llvm-dev
2018-Nov-25 18:39 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
bswapdi2 for i386 is correct Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into eax. Those are each bswapped. The ABI for the return is edx contains bits [63:32] and eax contains [31:0]. This is opposite of how the register were loaded. ~Craig On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> wrote:> bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned > long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc. > > Not sure about the bswapdi2 i386 case. > > > ~Craig > > > On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi @ll, >> >> targetting i386, LLVM/clang generates wrong code for the following >> functions: >> >> unsigned long __bswapsi2 (unsigned long ul) >> { >> return (((ul) & 0xff000000ul) >> 3 * 8) >> | (((ul) & 0x00ff0000ul) >> 8) >> | (((ul) & 0x0000ff00ul) << 8) >> | (((ul) & 0x000000fful) << 3 * 8); >> } >> >> unsigned long long __bswapdi2(unsigned long long ull) >> { >> return ((ull & 0xff00000000000000ull) >> 7 * 8) >> | ((ull & 0x00ff000000000000ull) >> 5 * 8) >> | ((ull & 0x0000ff0000000000ull) >> 3 * 8) >> | ((ull & 0x000000ff00000000ull) >> 8) >> | ((ull & 0x00000000ff000000ull) << 8) >> | ((ull & 0x0000000000ff0000ull) << 3 * 8) >> | ((ull & 0x000000000000ff00ull) << 5 * 8) >> | ((ull & 0x00000000000000ffull) << 7 * 8); >> } >> >> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c" >> and "compiler-rt/lib/builtins/bswapdi2.c", for example! >> >> >> Compiled with "-O3 -target i386" this yields the following code >> (see <https://godbolt.org/z/F4UIl4>): >> >> __bswapsi2: # @__bswapsi2 >> push ebp >> mov ebp, esp >> mov eax, dword ptr [ebp + 8] >> bswap eax >> pop ebp >> ret >> >> __bswapdi2: # @__bswapdi2 >> push ebp >> mov ebp, esp >> mov edx, dword ptr [ebp + 8] >> mov eax, dword ptr [ebp + 12] >> bswap eax >> bswap edx >> pop ebp >> ret >> >> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the >> halves of a "long long" is OBVIOUSLY WRONG! >> >> From the C source, the expected result for the input value >> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but >> produces 0x67452301EFCDAB89 >> >> >> And compiled for x86-64 this yields the following code (see >> <https://godbolt.org/z/uM9nvN>): >> >> __bswapsi2: # @__bswapsi2 >> mov eax, edi >> shr eax, 24 >> mov rcx, rdi >> shr rcx, 8 >> and ecx, 65280 >> or rax, rcx >> mov rcx, rdi >> shl rcx, 8 >> and ecx, 16711680 >> or rax, rcx >> and rdi, 255 >> shl rdi, 24 >> or rax, rdi >> ret >> >> __bswapdi2: # @__bswapdi2 >> bswap rdi >> mov rax, rdi >> ret >> >> Both are correct, but __bswapsi2() should of course use BSWAP too! >> >> >> Stefan Kanthak >> >> PS: for comparision with another compiler, take a look at >> <https://skanthak.homepage.t-online.de/msvc.html#example5> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/f8fa6869/attachment.html>
Stefan Kanthak via llvm-dev
2018-Nov-25 19:38 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
"Craig Topper" <craig.topper at gmail.com> wrote:> bswapdi2 for i386 is correctOUCH!> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into > eax. Those are each bswapped.This exchanges the high byte of each 32-bit PART with its low byte, but NOT the high byte of the whole 64-bit operand with its low byte! Please get a clue!> The ABI for the return is edx contains bits [63:32] and eax contains > [31:0]. This is opposite of how the register were loaded.My post is NOT about swapping EDX with EAX, but the bytes WITHIN both. With the 64-bit argument loaded into EDX:EAX, the instruction sequence bswap edx bswap eax xchg eax, edx is NOT equivalent to bswap rdi with the 64-bit argument loaded into RDI. Just run the following code on x86-64: mov rdi, 0123456789abcdefh ; pass (fake) argument in RDI ; split argument into high and low part mov rdx, rdi shr rdx, 32 ; high part in EDX mov eax, rdi ; low part in EAX ; perform __bswapdi2() as in 32-bit mode xchg eax, edx ; swap parts, argument now loaded ; like in 32-bit mode bswap edx bswap eax ; result like that in 32-bit mode ; load result into 64-bit register shl rdx, 32 or rax, rdx ; perform _bswapdi2() in native 64-bit mode bswap rdi ; compare results xor rax, rdi not amused Stefan Kanthak> On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> > wrote: > >> bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned >> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc. >> >> Not sure about the bswapdi2 i386 case. >> >> >> ~Craig >> >> >> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Hi @ll, >>> >>> targetting i386, LLVM/clang generates wrong code for the following >>> functions: >>> >>> unsigned long __bswapsi2 (unsigned long ul) >>> { >>> return (((ul) & 0xff000000ul) >> 3 * 8) >>> | (((ul) & 0x00ff0000ul) >> 8) >>> | (((ul) & 0x0000ff00ul) << 8) >>> | (((ul) & 0x000000fful) << 3 * 8); >>> } >>> >>> unsigned long long __bswapdi2(unsigned long long ull) >>> { >>> return ((ull & 0xff00000000000000ull) >> 7 * 8) >>> | ((ull & 0x00ff000000000000ull) >> 5 * 8) >>> | ((ull & 0x0000ff0000000000ull) >> 3 * 8) >>> | ((ull & 0x000000ff00000000ull) >> 8) >>> | ((ull & 0x00000000ff000000ull) << 8) >>> | ((ull & 0x0000000000ff0000ull) << 3 * 8) >>> | ((ull & 0x000000000000ff00ull) << 5 * 8) >>> | ((ull & 0x00000000000000ffull) << 7 * 8); >>> } >>> >>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c" >>> and "compiler-rt/lib/builtins/bswapdi2.c", for example! >>> >>> >>> Compiled with "-O3 -target i386" this yields the following code >>> (see <https://godbolt.org/z/F4UIl4>): >>> >>> __bswapsi2: # @__bswapsi2 >>> push ebp >>> mov ebp, esp >>> mov eax, dword ptr [ebp + 8] >>> bswap eax >>> pop ebp >>> ret >>> >>> __bswapdi2: # @__bswapdi2 >>> push ebp >>> mov ebp, esp >>> mov edx, dword ptr [ebp + 8] >>> mov eax, dword ptr [ebp + 12] >>> bswap eax >>> bswap edx >>> pop ebp >>> ret >>> >>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the >>> halves of a "long long" is OBVIOUSLY WRONG! >>> >>> From the C source, the expected result for the input value >>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but >>> produces 0x67452301EFCDAB89 >>> >>> >>> And compiled for x86-64 this yields the following code (see >>> <https://godbolt.org/z/uM9nvN>): >>> >>> __bswapsi2: # @__bswapsi2 >>> mov eax, edi >>> shr eax, 24 >>> mov rcx, rdi >>> shr rcx, 8 >>> and ecx, 65280 >>> or rax, rcx >>> mov rcx, rdi >>> shl rcx, 8 >>> and ecx, 16711680 >>> or rax, rcx >>> and rdi, 255 >>> shl rdi, 24 >>> or rax, rdi >>> ret >>> >>> __bswapdi2: # @__bswapdi2 >>> bswap rdi >>> mov rax, rdi >>> ret >>> >>> Both are correct, but __bswapsi2() should of course use BSWAP too! >>> >>> >>> Stefan Kanthak >>> >>> PS: for comparision with another compiler, take a look at >>> <https://skanthak.homepage.t-online.de/msvc.html#example5> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>
Craig Topper via llvm-dev
2018-Nov-25 20:26 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
I just compiled the two attached files in 32-bit mode and ran it. It printed efcdab8967452301. I verified via objdump that the my_bswap function contains the follow assembly which I believe matches the assembly you linked to on godbolt. _my_bswap: 1f70: 55 pushl %ebp 1f71: 89 e5 movl %esp, %ebp 1f73: 8b 55 08 movl 8(%ebp), %edx 1f76: 8b 45 0c movl 12(%ebp), %eax 1f79: 0f c8 bswapl %eax 1f7b: 0f ca bswapl %edx 1f7d: 5d popl %ebp 1f7e: c3 retl ~Craig On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <stefan.kanthak at nexgo.de> wrote:> "Craig Topper" <craig.topper at gmail.com> wrote: > > > bswapdi2 for i386 is correct > > OUCH! > > > Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into > > eax. Those are each bswapped. > > This exchanges the high byte of each 32-bit PART with its low byte, but > NOT the high byte of the whole 64-bit operand with its low byte! > > Please get a clue! > > > The ABI for the return is edx contains bits [63:32] and eax contains > > [31:0]. This is opposite of how the register were loaded. > > My post is NOT about swapping EDX with EAX, but the bytes WITHIN both. > > With the 64-bit argument loaded into EDX:EAX, the instruction sequence > > bswap edx > bswap eax > xchg eax, edx > > is NOT equivalent to > > bswap rdi > > with the 64-bit argument loaded into RDI. > > Just run the following code on x86-64: > > mov rdi, 0123456789abcdefh ; pass (fake) argument in RDI > ; split argument into high and low part > mov rdx, rdi > shr rdx, 32 ; high part in EDX > mov eax, rdi ; low part in EAX > ; perform __bswapdi2() as in 32-bit mode > xchg eax, edx ; swap parts, argument now loaded > ; like in 32-bit mode > bswap edx > bswap eax ; result like that in 32-bit mode > ; load result into 64-bit register > shl rdx, 32 > or rax, rdx > ; perform _bswapdi2() in native 64-bit mode > bswap rdi > ; compare results > xor rax, rdi > > not amused > Stefan Kanthak > > > On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> > > wrote: > > > >> bswapsi2 on the x86-64 isn't using the bswap instruction because > "unsigned > >> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc. > >> > >> Not sure about the bswapdi2 i386 case. > >> > >> > >> ~Craig > >> > >> > >> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev < > >> llvm-dev at lists.llvm.org> wrote: > >> > >>> Hi @ll, > >>> > >>> targetting i386, LLVM/clang generates wrong code for the following > >>> functions: > >>> > >>> unsigned long __bswapsi2 (unsigned long ul) > >>> { > >>> return (((ul) & 0xff000000ul) >> 3 * 8) > >>> | (((ul) & 0x00ff0000ul) >> 8) > >>> | (((ul) & 0x0000ff00ul) << 8) > >>> | (((ul) & 0x000000fful) << 3 * 8); > >>> } > >>> > >>> unsigned long long __bswapdi2(unsigned long long ull) > >>> { > >>> return ((ull & 0xff00000000000000ull) >> 7 * 8) > >>> | ((ull & 0x00ff000000000000ull) >> 5 * 8) > >>> | ((ull & 0x0000ff0000000000ull) >> 3 * 8) > >>> | ((ull & 0x000000ff00000000ull) >> 8) > >>> | ((ull & 0x00000000ff000000ull) << 8) > >>> | ((ull & 0x0000000000ff0000ull) << 3 * 8) > >>> | ((ull & 0x000000000000ff00ull) << 5 * 8) > >>> | ((ull & 0x00000000000000ffull) << 7 * 8); > >>> } > >>> > >>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c" > >>> and "compiler-rt/lib/builtins/bswapdi2.c", for example! > >>> > >>> > >>> Compiled with "-O3 -target i386" this yields the following code > >>> (see <https://godbolt.org/z/F4UIl4>): > >>> > >>> __bswapsi2: # @__bswapsi2 > >>> push ebp > >>> mov ebp, esp > >>> mov eax, dword ptr [ebp + 8] > >>> bswap eax > >>> pop ebp > >>> ret > >>> > >>> __bswapdi2: # @__bswapdi2 > >>> push ebp > >>> mov ebp, esp > >>> mov edx, dword ptr [ebp + 8] > >>> mov eax, dword ptr [ebp + 12] > >>> bswap eax > >>> bswap edx > >>> pop ebp > >>> ret > >>> > >>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the > >>> halves of a "long long" is OBVIOUSLY WRONG! > >>> > >>> From the C source, the expected result for the input value > >>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but > >>> produces 0x67452301EFCDAB89 > >>> > >>> > >>> And compiled for x86-64 this yields the following code (see > >>> <https://godbolt.org/z/uM9nvN>): > >>> > >>> __bswapsi2: # @__bswapsi2 > >>> mov eax, edi > >>> shr eax, 24 > >>> mov rcx, rdi > >>> shr rcx, 8 > >>> and ecx, 65280 > >>> or rax, rcx > >>> mov rcx, rdi > >>> shl rcx, 8 > >>> and ecx, 16711680 > >>> or rax, rcx > >>> and rdi, 255 > >>> shl rdi, 24 > >>> or rax, rdi > >>> ret > >>> > >>> __bswapdi2: # @__bswapdi2 > >>> bswap rdi > >>> mov rax, rdi > >>> ret > >>> > >>> Both are correct, but __bswapsi2() should of course use BSWAP too! > >>> > >>> > >>> Stefan Kanthak > >>> > >>> PS: for comparision with another compiler, take a look at > >>> <https://skanthak.homepage.t-online.de/msvc.html#example5> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> llvm-dev at lists.llvm.org > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >>> > >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: test.c Type: application/octet-stream Size: 186 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: my_bswap.c Type: application/octet-stream Size: 473 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment-0001.obj>
Fabian Giesen via llvm-dev
2018-Nov-25 21:38 UTC
[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
On 11/25/2018 11:38 AM, Stefan Kanthak via llvm-dev wrote:> "Craig Topper" <craig.topper at gmail.com> wrote: > >> bswapdi2 for i386 is correct > > OUCH! > >> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into >> eax. Those are each bswapped. > > This exchanges the high byte of each 32-bit PART with its low byte, but > NOT the high byte of the whole 64-bit operand with its low byte!Incorrect. This reverses the bytes within the low and high 32-bit halves, and also swaps the two halves, which _is_ equivalent to byte-reversing the entire 64-bit value. It's a classic divide-and-conquer approach: one way to reverse a sequence is to cut it into two contiguous pieces (doesn't matter how; in this case, split at the halfway point), reverse the two pieces individually (the two BSWAPs), and then combine the two pieces in reverse order (accomplished implicitly by loading the original top half into eax and the bottom half into edx).> Please get a clue!Throwing insults around will not help you get your problem resolved any quicker, it will just get you ignored.> Just run the following code on x86-64: > > mov rdi, 0123456789abcdefh ; pass (fake) argument in RDI > ; split argument into high and low part > mov rdx, rdi > shr rdx, 32 ; high part in EDX > mov eax, rdi ; low part in EAX > ; perform __bswapdi2() as in 32-bit mode > xchg eax, edx ; swap parts, argument now loaded > ; like in 32-bit mode > bswap edx > bswap eax ; result like that in 32-bit mode > ; load result into 64-bit register > shl rdx, 32 > or rax, rdx > ; perform _bswapdi2() in native 64-bit mode > bswap rdi > ; compare results > xor rax, rdiSyntax error aside (that should be "mov eax, edi" not "mov eax, rdi" up there), I get: (gdb) set disassembly-flavor intel (gdb) x/20i main 0x4004a0 <main>: movabs rdi,0x123456789abcdef 0x4004aa <main+10>: mov rdx,rdi 0x4004ad <main+13>: shr rdx,0x20 0x4004b1 <main+17>: mov eax,edi 0x4004b3 <main+19>: xchg edx,eax 0x4004b4 <main+20>: bswap edx 0x4004b6 <main+22>: bswap eax 0x4004b8 <main+24>: shl rdx,0x20 0x4004bc <main+28>: or rax,rdx 0x4004bf <main+31>: bswap rdi 0x4004c2 <main+34>: xor rax,rdi 0x4004c5 <main+37>: int3 0x4004c6 <main+38>: nop WORD PTR cs:[rax+rax*1+0x0] 0x4004d0 <__libc_csu_init>: push r15 0x4004d2 <__libc_csu_init+2>: mov r15,rdx 0x4004d5 <__libc_csu_init+5>: push r14 0x4004d7 <__libc_csu_init+7>: mov r14,rsi 0x4004da <__libc_csu_init+10>: push r13 0x4004dc <__libc_csu_init+12>: mov r13d,edi 0x4004df <__libc_csu_init+15>: push r12 (gdb) r Starting program: /home/fabiang/code/bswap/bswap Program received signal SIGTRAP, Trace/breakpoint trap. 0x00000000004004c6 in main () (gdb) x/i $pc => 0x4004c6 <main+38>: nop WORD PTR cs:[rax+rax*1+0x0] (gdb) p $rax $1 = 0 exactly as it should be. So your point is? -Fabian