thr3ads.net - llvm dev - [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling _

If this information is useful, please help other people find it:
Share via:

Craig Topper via llvm-dev

2018-Nov-25 20:26 UTC

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

I just compiled the two attached files in 32-bit mode and ran it.

It printed efcdab8967452301.

I verified via objdump that the my_bswap function contains the follow
assembly which I believe matches the assembly you linked to on godbolt.

_my_bswap:
    1f70: 55 pushl %ebp
    1f71: 89 e5 movl %esp, %ebp
    1f73: 8b 55 08 movl 8(%ebp), %edx
    1f76: 8b 45 0c movl 12(%ebp), %eax
    1f79: 0f c8 bswapl %eax
    1f7b: 0f ca bswapl %edx
    1f7d: 5d popl %ebp
    1f7e: c3 retl


~Craig


On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <stefan.kanthak at
nexgo.de>
wrote:
> "Craig Topper" <craig.topper at gmail.com> wrote:
>
> > bswapdi2 for i386 is correct
>
> OUCH!
>
> > Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded
into
> > eax. Those are each bswapped.
>
> This exchanges the high byte of each 32-bit PART with its low byte, but
> NOT the high byte of the whole 64-bit operand with its low byte!
>
> Please get a clue!
>
> > The ABI for the return is edx contains bits [63:32] and eax contains
> > [31:0]. This is opposite of how the register were loaded.
>
> My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.
>
> With the 64-bit argument loaded into EDX:EAX, the instruction sequence
>
>     bswap  edx
>     bswap  eax
>     xchg   eax, edx
>
> is NOT equivalent to
>
>     bswap    rdi
>
> with the 64-bit argument loaded into RDI.
>
> Just run the following code on x86-64:
>
>     mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
> ; split argument into high and low part
>     mov    rdx, rdi
>     shr    rdx, 32                   ; high part in EDX
>     mov    eax, rdi                  ; low part in EAX
> ; perform __bswapdi2() as in 32-bit mode
>     xchg   eax, edx                  ; swap parts, argument now loaded
>                                      ;  like in 32-bit mode
>     bswap  edx
>     bswap  eax                       ; result like that in 32-bit mode
> ; load result into 64-bit register
>     shl    rdx, 32
>     or     rax, rdx
> ; perform _bswapdi2() in native 64-bit mode
>     bswap  rdi
> ; compare results
>     xor    rax, rdi
>
> not amused
> Stefan Kanthak
>
> > On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at
gmail.com>
> > wrote:
> >
> >> bswapsi2 on the x86-64 isn't using the bswap instruction
because
> "unsigned
> >> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64
msvc.
> >>
> >> Not sure about the bswapdi2 i386 case.
> >>
> >>
> >> ~Craig
> >>
> >>
> >> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
> >> llvm-dev at lists.llvm.org> wrote:
> >>
> >>> Hi @ll,
> >>>
> >>> targetting i386, LLVM/clang generates wrong code for the
following
> >>> functions:
> >>>
> >>> unsigned long __bswapsi2 (unsigned long ul)
> >>> {
> >>>     return (((ul) & 0xff000000ul) >> 3 * 8)
> >>>          | (((ul) & 0x00ff0000ul) >>     8)
> >>>          | (((ul) & 0x0000ff00ul) <<     8)
> >>>          | (((ul) & 0x000000fful) << 3 * 8);
> >>> }
> >>>
> >>> unsigned long long __bswapdi2(unsigned long long ull)
> >>> {
> >>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
> >>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
> >>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
> >>>          | ((ull & 0x000000ff00000000ull) >>     8)
> >>>          | ((ull & 0x00000000ff000000ull) <<     8)
> >>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
> >>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
> >>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
> >>> }
> >>>
> >>> You can find these sources in
"compiler-rt/lib/builtins/bswapsi2.c"
> >>> and "compiler-rt/lib/builtins/bswapdi2.c", for
example!
> >>>
> >>>
> >>> Compiled with "-O3 -target i386" this yields the
following code
> >>> (see <https://godbolt.org/z/F4UIl4>):
> >>>
> >>> __bswapsi2: # @__bswapsi2
> >>>     push  ebp
> >>>     mov   ebp, esp
> >>>     mov   eax, dword ptr [ebp + 8]
> >>>     bswap eax
> >>>     pop   ebp
> >>>     ret
> >>>
> >>> __bswapdi2: # @__bswapdi2
> >>>     push  ebp
> >>>     mov   ebp, esp
> >>>     mov   edx, dword ptr [ebp + 8]
> >>>     mov   eax, dword ptr [ebp + 12]
> >>>     bswap eax
> >>>     bswap edx
> >>>     pop   ebp
> >>>     ret
> >>>
> >>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just
the
> >>> halves of a "long long" is OBVIOUSLY WRONG!
> >>>
> >>> From the C source, the expected result for the input value
> >>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code
but
> >>> produces 0x67452301EFCDAB89
> >>>
> >>>
> >>> And compiled for x86-64 this yields the following code (see
> >>> <https://godbolt.org/z/uM9nvN>):
> >>>
> >>> __bswapsi2: # @__bswapsi2
> >>>     mov   eax, edi
> >>>     shr   eax, 24
> >>>     mov   rcx, rdi
> >>>     shr   rcx, 8
> >>>     and   ecx, 65280
> >>>     or    rax, rcx
> >>>     mov   rcx, rdi
> >>>     shl   rcx, 8
> >>>     and   ecx, 16711680
> >>>     or    rax, rcx
> >>>     and   rdi, 255
> >>>     shl   rdi, 24
> >>>     or    rax, rdi
> >>>     ret
> >>>
> >>> __bswapdi2: # @__bswapdi2
> >>>     bswap rdi
> >>>     mov   rax, rdi
> >>>     ret
> >>>
> >>> Both are correct, but __bswapsi2() should of course use BSWAP
too!
> >>>
> >>>
> >>> Stefan Kanthak
> >>>
> >>> PS: for comparision with another compiler, take a look at
> >>>    
<https://skanthak.homepage.t-online.de/msvc.html#example5>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>
> >>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: application/octet-stream
Size: 186 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_bswap.c
Type: application/octet-stream
Size: 473 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20181125/c8f5b6fa/attachment-0001.obj>

Stefan Kanthak via llvm-dev

2018-Nov-26 12:11 UTC

head link

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

"Craig Topper" <craig.topper at gmail.com> wrote:
>I just compiled the two attached files in 32-bit mode and ran it.
> 
> It printed efcdab8967452301.
> 
> I verified via objdump that the my_bswap function contains the follow
> assembly which I believe matches the assembly you linked to on godbolt.
> 
> _my_bswap:
>    1f70: 55 pushl %ebp
>    1f71: 89 e5 movl %esp, %ebp
>    1f73: 8b 55 08 movl 8(%ebp), %edx
>    1f76: 8b 45 0c movl 12(%ebp), %eax
>    1f79: 0f c8 bswapl %eax
>    1f7b: 0f ca bswapl %edx
>    1f7d: 5d popl %ebp
>    1f7e: c3 retl
Contrary to my initial WRONG claim this code is CORRECT: two 32-bit BSWAP
instructions on the swapped halves of a 64-bit register are equivalent to
a 64-bit BSWAP instruction.
Sorry for the confusion, I should have slept another night before responding.

The other part of my post but holds: the following function, compiled for
x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>):

unsigned long __bswapsi2 (unsigned long ul)
{
    return (ul >> 3 * 8) & 0xff000000ul
         | (ul >>     8) & 0x00ff0000ul
         | (ul <<     8) & 0x0000ff00ul
         | (ul << 3 * 8) & 0x000000fful;
}

The expected optimized code is

__bswapsi2: # @__bswapsi2
    bswap edi
    mov   eax, edi
    ret

regards
Stefan Kanthak

[ fullquote removed ]

Fabian Giesen via llvm-dev

2018-Nov-26 19:11 UTC

head link

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

On 11/26/2018 4:11 AM, Stefan Kanthak via llvm-dev
wrote:> The other part of my post but holds: the following function, compiled for
> x86-64, is NOT properly optimized (see
<https://godbolt.org/z/uM9nvN>):
> 
> unsigned long __bswapsi2 (unsigned long ul)
> {
>      return (ul >> 3 * 8) & 0xff000000ul
>           | (ul >>     8) & 0x00ff0000ul
>           | (ul <<     8) & 0x0000ff00ul
>           | (ul << 3 * 8) & 0x000000fful;
> }
That looks like another instance of truncation in the wrong places 
causing the pattern matches to fail. There have been a couple of those 
in the past with other things like rotates.

This is indeed a bug, but if you need a quick workaround, the pattern 
does work when you use a 32-bit (e.g. unsigned int or uint32_t) type.

-Fabian

llvm dev - Nov 2018 - BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

llvm dev - Nov 2018 - BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()

[llvm-dev] BUGS n code generated for target i386 compiling bswapdi3, and for target x86-64 compiling bswapsi2()