thr3ads.net - llvm dev - [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling _

If this information is useful, please help other people find it:
Share via:

Craig Topper via llvm-dev

2018-Nov-25 18:39 UTC

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

bswapdi2 for i386 is correct

Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
eax. Those are each bswapped. The ABI for the return is edx contains bits
[63:32] and eax contains [31:0]. This is opposite of how the register were
loaded.

~Craig


On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com>
wrote:
> bswapsi2 on the x86-64 isn't using the bswap instruction because
"unsigned
> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.
>
> Not sure about the bswapdi2 i386 case.
>
>
> ~Craig
>
>
> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi @ll,
>>
>> targetting i386, LLVM/clang generates wrong code for the following
>> functions:
>>
>> unsigned long __bswapsi2 (unsigned long ul)
>> {
>>     return (((ul) & 0xff000000ul) >> 3 * 8)
>>          | (((ul) & 0x00ff0000ul) >>     8)
>>          | (((ul) & 0x0000ff00ul) <<     8)
>>          | (((ul) & 0x000000fful) << 3 * 8);
>> }
>>
>> unsigned long long __bswapdi2(unsigned long long ull)
>> {
>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
>>          | ((ull & 0x000000ff00000000ull) >>     8)
>>          | ((ull & 0x00000000ff000000ull) <<     8)
>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
>> }
>>
>> You can find these sources in
"compiler-rt/lib/builtins/bswapsi2.c"
>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!
>>
>>
>> Compiled with "-O3 -target i386" this yields the following
code
>> (see <https://godbolt.org/z/F4UIl4>):
>>
>> __bswapsi2: # @__bswapsi2
>>     push  ebp
>>     mov   ebp, esp
>>     mov   eax, dword ptr [ebp + 8]
>>     bswap eax
>>     pop   ebp
>>     ret
>>
>> __bswapdi2: # @__bswapdi2
>>     push  ebp
>>     mov   ebp, esp
>>     mov   edx, dword ptr [ebp + 8]
>>     mov   eax, dword ptr [ebp + 12]
>>     bswap eax
>>     bswap edx
>>     pop   ebp
>>     ret
>>
>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
>> halves of a "long long" is OBVIOUSLY WRONG!
>>
>> From the C source, the expected result for the input value
>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
>> produces 0x67452301EFCDAB89
>>
>>
>> And compiled for x86-64 this yields the following code (see
>> <https://godbolt.org/z/uM9nvN>):
>>
>> __bswapsi2: # @__bswapsi2
>>     mov   eax, edi
>>     shr   eax, 24
>>     mov   rcx, rdi
>>     shr   rcx, 8
>>     and   ecx, 65280
>>     or    rax, rcx
>>     mov   rcx, rdi
>>     shl   rcx, 8
>>     and   ecx, 16711680
>>     or    rax, rcx
>>     and   rdi, 255
>>     shl   rdi, 24
>>     or    rax, rdi
>>     ret
>>
>> __bswapdi2: # @__bswapdi2
>>     bswap rdi
>>     mov   rax, rdi
>>     ret
>>
>> Both are correct, but __bswapsi2() should of course use BSWAP too!
>>
>>
>> Stefan Kanthak
>>
>> PS: for comparision with another compiler, take a look at
>>     <https://skanthak.homepage.t-online.de/msvc.html#example5>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/f8fa6869/attachment.html>

Stefan Kanthak via llvm-dev

2018-Nov-25 19:38 UTC

head link

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

"Craig Topper" <craig.topper at gmail.com> wrote:
> bswapdi2 for i386 is correct
OUCH!
> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
> eax. Those are each bswapped.
This exchanges the high byte of each 32-bit PART with its low byte, but
NOT the high byte of the whole 64-bit operand with its low byte!

Please get a clue!
> The ABI for the return is edx contains bits [63:32] and eax contains
> [31:0]. This is opposite of how the register were loaded.
My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.

With the 64-bit argument loaded into EDX:EAX, the instruction sequence

    bswap  edx
    bswap  eax
    xchg   eax, edx

is NOT equivalent to

    bswap    rdi

with the 64-bit argument loaded into RDI.

Just run the following code on x86-64:

    mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
; split argument into high and low part
    mov    rdx, rdi
    shr    rdx, 32                   ; high part in EDX
    mov    eax, rdi                  ; low part in EAX
; perform __bswapdi2() as in 32-bit mode
    xchg   eax, edx                  ; swap parts, argument now loaded
                                     ;  like in 32-bit mode
    bswap  edx
    bswap  eax                       ; result like that in 32-bit mode
; load result into 64-bit register
    shl    rdx, 32
    or     rax, rdx
; perform _bswapdi2() in native 64-bit mode
    bswap  rdi
; compare results
    xor    rax, rdi

not amused
Stefan Kanthak
> On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at
gmail.com>
> wrote:
> 
>> bswapsi2 on the x86-64 isn't using the bswap instruction because
"unsigned
>> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.
>>
>> Not sure about the bswapdi2 i386 case.
>>
>>
>> ~Craig
>>
>>
>> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi @ll,
>>>
>>> targetting i386, LLVM/clang generates wrong code for the following
>>> functions:
>>>
>>> unsigned long __bswapsi2 (unsigned long ul)
>>> {
>>>     return (((ul) & 0xff000000ul) >> 3 * 8)
>>>          | (((ul) & 0x00ff0000ul) >>     8)
>>>          | (((ul) & 0x0000ff00ul) <<     8)
>>>          | (((ul) & 0x000000fful) << 3 * 8);
>>> }
>>>
>>> unsigned long long __bswapdi2(unsigned long long ull)
>>> {
>>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
>>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
>>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
>>>          | ((ull & 0x000000ff00000000ull) >>     8)
>>>          | ((ull & 0x00000000ff000000ull) <<     8)
>>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
>>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
>>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
>>> }
>>>
>>> You can find these sources in
"compiler-rt/lib/builtins/bswapsi2.c"
>>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!
>>>
>>>
>>> Compiled with "-O3 -target i386" this yields the
following code
>>> (see <https://godbolt.org/z/F4UIl4>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   eax, dword ptr [ebp + 8]
>>>     bswap eax
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   edx, dword ptr [ebp + 8]
>>>     mov   eax, dword ptr [ebp + 12]
>>>     bswap eax
>>>     bswap edx
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
>>> halves of a "long long" is OBVIOUSLY WRONG!
>>>
>>> From the C source, the expected result for the input value
>>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
>>> produces 0x67452301EFCDAB89
>>>
>>>
>>> And compiled for x86-64 this yields the following code (see
>>> <https://godbolt.org/z/uM9nvN>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     mov   eax, edi
>>>     shr   eax, 24
>>>     mov   rcx, rdi
>>>     shr   rcx, 8
>>>     and   ecx, 65280
>>>     or    rax, rcx
>>>     mov   rcx, rdi
>>>     shl   rcx, 8
>>>     and   ecx, 16711680
>>>     or    rax, rcx
>>>     and   rdi, 255
>>>     shl   rdi, 24
>>>     or    rax, rdi
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     bswap rdi
>>>     mov   rax, rdi
>>>     ret
>>>
>>> Both are correct, but __bswapsi2() should of course use BSWAP too!
>>>
>>>
>>> Stefan Kanthak
>>>
>>> PS: for comparision with another compiler, take a look at
>>>    
<https://skanthak.homepage.t-online.de/msvc.html#example5>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>

Craig Topper via llvm-dev

2018-Nov-25 20:26 UTC

head link

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

I just compiled the two attached files in 32-bit mode and ran it.

It printed efcdab8967452301.

I verified via objdump that the my_bswap function contains the follow
assembly which I believe matches the assembly you linked to on godbolt.

_my_bswap:
    1f70: 55 pushl %ebp
    1f71: 89 e5 movl %esp, %ebp
    1f73: 8b 55 08 movl 8(%ebp), %edx
    1f76: 8b 45 0c movl 12(%ebp), %eax
    1f79: 0f c8 bswapl %eax
    1f7b: 0f ca bswapl %edx
    1f7d: 5d popl %ebp
    1f7e: c3 retl


~Craig


On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <stefan.kanthak at
nexgo.de>
wrote:
> "Craig Topper" <craig.topper at gmail.com> wrote:
>
> > bswapdi2 for i386 is correct
>
> OUCH!
>
> > Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded
into
> > eax. Those are each bswapped.
>
> This exchanges the high byte of each 32-bit PART with its low byte, but
> NOT the high byte of the whole 64-bit operand with its low byte!
>
> Please get a clue!
>
> > The ABI for the return is edx contains bits [63:32] and eax contains
> > [31:0]. This is opposite of how the register were loaded.
>
> My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.
>
> With the 64-bit argument loaded into EDX:EAX, the instruction sequence
>
>     bswap  edx
>     bswap  eax
>     xchg   eax, edx
>
> is NOT equivalent to
>
>     bswap    rdi
>
> with the 64-bit argument loaded into RDI.
>
> Just run the following code on x86-64:
>
>     mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
> ; split argument into high and low part
>     mov    rdx, rdi
>     shr    rdx, 32                   ; high part in EDX
>     mov    eax, rdi                  ; low part in EAX
> ; perform __bswapdi2() as in 32-bit mode
>     xchg   eax, edx                  ; swap parts, argument now loaded
>                                      ;  like in 32-bit mode
>     bswap  edx
>     bswap  eax                       ; result like that in 32-bit mode
> ; load result into 64-bit register
>     shl    rdx, 32
>     or     rax, rdx
> ; perform _bswapdi2() in native 64-bit mode
>     bswap  rdi
> ; compare results
>     xor    rax, rdi
>
> not amused
> Stefan Kanthak
>
> > On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at
gmail.com>
> > wrote:
> >
> >> bswapsi2 on the x86-64 isn't using the bswap instruction
because
> "unsigned
> >> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64
msvc.
> >>
> >> Not sure about the bswapdi2 i386 case.
> >>
> >>
> >> ~Craig
> >>
> >>
> >> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
> >> llvm-dev at lists.llvm.org> wrote:
> >>
> >>> Hi @ll,
> >>>
> >>> targetting i386, LLVM/clang generates wrong code for the
following
> >>> functions:
> >>>
> >>> unsigned long __bswapsi2 (unsigned long ul)
> >>> {
> >>>     return (((ul) & 0xff000000ul) >> 3 * 8)
> >>>          | (((ul) & 0x00ff0000ul) >>     8)
> >>>          | (((ul) & 0x0000ff00ul) <<     8)
> >>>          | (((ul) & 0x000000fful) << 3 * 8);
> >>> }
> >>>
> >>> unsigned long long __bswapdi2(unsigned long long ull)
> >>> {
> >>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
> >>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
> >>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
> >>>          | ((ull & 0x000000ff00000000ull) >>     8)
> >>>          | ((ull & 0x00000000ff000000ull) <<     8)
> >>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
> >>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
> >>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
> >>> }
> >>>
> >>> You can find these sources in
"compiler-rt/lib/builtins/bswapsi2.c"
> >>> and "compiler-rt/lib/builtins/bswapdi2.c", for
example!
> >>>
> >>>
> >>> Compiled with "-O3 -target i386" this yields the
following code
> >>> (see <https://godbolt.org/z/F4UIl4>):
> >>>
> >>> __bswapsi2: # @__bswapsi2
> >>>     push  ebp
> >>>     mov   ebp, esp
> >>>     mov   eax, dword ptr [ebp + 8]
> >>>     bswap eax
> >>>     pop   ebp
> >>>     ret
> >>>
> >>> __bswapdi2: # @__bswapdi2
> >>>     push  ebp
> >>>     mov   ebp, esp
> >>>     mov   edx, dword ptr [ebp + 8]
> >>>     mov   eax, dword ptr [ebp + 12]
> >>>     bswap eax
> >>>     bswap edx
> >>>     pop   ebp
> >>>     ret
> >>>
> >>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just
the
> >>> halves of a "long long" is OBVIOUSLY WRONG!
> >>>
> >>> From the C source, the expected result for the input value
> >>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code
but
> >>> produces 0x67452301EFCDAB89
> >>>
> >>>
> >>> And compiled for x86-64 this yields the following code (see
> >>> <https://godbolt.org/z/uM9nvN>):
> >>>
> >>> __bswapsi2: # @__bswapsi2
> >>>     mov   eax, edi
> >>>     shr   eax, 24
> >>>     mov   rcx, rdi
> >>>     shr   rcx, 8
> >>>     and   ecx, 65280
> >>>     or    rax, rcx
> >>>     mov   rcx, rdi
> >>>     shl   rcx, 8
> >>>     and   ecx, 16711680
> >>>     or    rax, rcx
> >>>     and   rdi, 255
> >>>     shl   rdi, 24
> >>>     or    rax, rdi
> >>>     ret
> >>>
> >>> __bswapdi2: # @__bswapdi2
> >>>     bswap rdi
> >>>     mov   rax, rdi
> >>>     ret
> >>>
> >>> Both are correct, but __bswapsi2() should of course use BSWAP
too!
> >>>
> >>>
> >>> Stefan Kanthak
> >>>
> >>> PS: for comparision with another compiler, take a look at
> >>>    
<https://skanthak.homepage.t-online.de/msvc.html#example5>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: application/octet-stream
Size: 186 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_bswap.c
Type: application/octet-stream
Size: 473 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment-0001.obj>

Fabian Giesen via llvm-dev

2018-Nov-25 21:38 UTC

head link

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

On 11/25/2018 11:38 AM, Stefan Kanthak via llvm-dev
wrote:> "Craig Topper" <craig.topper at gmail.com> wrote:
> 
>> bswapdi2 for i386 is correct
> 
> OUCH!
> 
>> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
>> eax. Those are each bswapped.
> 
> This exchanges the high byte of each 32-bit PART with its low byte, but
> NOT the high byte of the whole 64-bit operand with its low byte!
Incorrect.

This reverses the bytes within the low and high 32-bit halves, and also 
swaps the two halves, which _is_ equivalent to byte-reversing the entire 
64-bit value.

It's a classic divide-and-conquer approach: one way to reverse a 
sequence is to cut it into two contiguous pieces (doesn't matter how; in 
this case, split at the halfway point), reverse the two pieces 
individually (the two BSWAPs), and then combine the two pieces in 
reverse order (accomplished implicitly by loading the original top half 
into eax and the bottom half into edx).
> Please get a clue!
Throwing insults around will not help you get your problem resolved any 
quicker, it will just get you ignored.
> Just run the following code on x86-64:
> 
>      mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
> ; split argument into high and low part
>      mov    rdx, rdi
>      shr    rdx, 32                   ; high part in EDX
>      mov    eax, rdi                  ; low part in EAX
> ; perform __bswapdi2() as in 32-bit mode
>      xchg   eax, edx                  ; swap parts, argument now loaded
>                                       ;  like in 32-bit mode
>      bswap  edx
>      bswap  eax                       ; result like that in 32-bit mode
> ; load result into 64-bit register
>      shl    rdx, 32
>      or     rax, rdx
> ; perform _bswapdi2() in native 64-bit mode
>      bswap  rdi
> ; compare results
>      xor    rax, rdi
Syntax error aside (that should be "mov eax, edi" not "mov eax,
rdi" up
there), I get:

(gdb) set disassembly-flavor intel
(gdb) x/20i main
    0x4004a0 <main>:     movabs rdi,0x123456789abcdef
    0x4004aa <main+10>:  mov    rdx,rdi
    0x4004ad <main+13>:  shr    rdx,0x20
    0x4004b1 <main+17>:  mov    eax,edi
    0x4004b3 <main+19>:  xchg   edx,eax
    0x4004b4 <main+20>:  bswap  edx
    0x4004b6 <main+22>:  bswap  eax
    0x4004b8 <main+24>:  shl    rdx,0x20
    0x4004bc <main+28>:  or     rax,rdx
    0x4004bf <main+31>:  bswap  rdi
    0x4004c2 <main+34>:  xor    rax,rdi
    0x4004c5 <main+37>:  int3
    0x4004c6 <main+38>:  nop    WORD PTR cs:[rax+rax*1+0x0]
    0x4004d0 <__libc_csu_init>:  push   r15
    0x4004d2 <__libc_csu_init+2>:        mov    r15,rdx
    0x4004d5 <__libc_csu_init+5>:        push   r14
    0x4004d7 <__libc_csu_init+7>:        mov    r14,rsi
    0x4004da <__libc_csu_init+10>:       push   r13
    0x4004dc <__libc_csu_init+12>:       mov    r13d,edi
    0x4004df <__libc_csu_init+15>:       push   r12
(gdb) r
Starting program: /home/fabiang/code/bswap/bswap

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004004c6 in main ()
(gdb) x/i $pc
=> 0x4004c6 <main+38>:  nop    WORD PTR cs:[rax+rax*1+0x0]
(gdb) p $rax
$1 = 0

exactly as it should be. So your point is?

-Fabian

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Nov 2018 - BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Maybe Matching Threads

llvm dev - Nov 2018 - BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()