thr3ads.net - similar to: "BUGS in code generated for target i386-win32"

BUGS in code generated for target i386-win32

2018 Nov 26

2

"Tim Northover" <t.p.northover at gmail.com> wrote: > Hi Stefan, > > On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> LLVM/clang generates wrong code for the following program >> (see <https://godbolt.org/z/UZrrkG>): > > It looks like all of these issues come down to mismatched

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 06

4

Hi @ll, while clang/LLVM recognizes common bit-twiddling idioms/expressions like unsigned int rotate(unsigned int x, unsigned int n) { return (x << n) | (x >> (32 - n)); } and typically generates "rotate" machine instructions for this expression, it fails to recognize other also common bit-twiddling idioms/expressions. The standard IEEE CRC-32 for "big

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 27

2

"Sanjay Patel" <spatel at rotateright.com> wrote: > IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like > this: > unsigned int foo(unsigned int crc) { > if (crc & 0x80000000) > crc <<= 1, crc ^= 0xEDB88320; > else > crc <<= 1; > return crc; > } To document this for x86 too: rewrite the function

Where's the optimiser gone? (part 5.c): missed tail calls, and more...

2018 Dec 01

2

Compile the following functions with "-O3 -target i386-win32" (see <https://godbolt.org/z/exmjWY>): __int64 __fastcall div(__int64 foo, __int64 bar) { return foo / bar; } On the left the generated code; on the right the expected, properly optimised code: push dword ptr [esp + 16] | push dword ptr [esp + 16] | push dword ptr [esp + 16] |

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

6

On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -0,0 +1,321 @@
> +#ifndef _GEN_PV_LOCK_SLOWPATH
> +#error "do not include this file"
> +#endif
> +
> +/*
> + * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
> + * of spinning them.
> + *
> + * This relies on the

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Mar 19

4

On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote: > So I was now thinking of hashing the lock pointer; let me go and quickly > put something together. A little something like so; ideally we'd allocate the hashtable since NR_CPUS is kinda bloated, but it shows the idea I think. And while this has loops in (the rehashing thing) their fwd progress does not depend on other

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 13

1

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
> >>+void __init __pv_init_lock_hash(void)
> >>+{
> >>+ int pv_hash_size = 4 * num_possible_cpus();
> >>+
> >>+ if (pv_hash_size< (1U<< LFSR_MIN_BITS))
> >>+ pv_hash_size = (1U<< LFSR_MIN_BITS);
> >>+ /*
> >>+ * Allocate space from bootmem which

Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)

2018 Nov 28

2

On Wed, Nov 28, 2018 at 7:11 AM Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Thanks for reporting this and other perf opportunities. As I mentioned > before, if you could file bug reports for these, that's probably the only > way they're ever going to get fixed (unless you're planning to fix them > yourself). It's not an ideal situation, but

BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

2018 Nov 25

3

Hi @ll, targeting i386, LLVM/clang generates wrong code for the following functions:
unsigned long __bswapsi2 (unsigned long ul)
{
    return (((ul) & 0xff000000ul) >> 3 * 8)
         | (((ul) & 0x00ff0000ul) >> 8)
         | (((ul) & 0x0000ff00ul) << 8)
         | (((ul) & 0x000000fful) << 3 * 8);
}
unsigned long long __bswapdi2(unsigned long

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

2018 Nov 30

2

"Friedman, Eli" <efriedma at codeaurora.org> wrote: > On 11/30/2018 8:31 AM, Stefan Kanthak via llvm-dev wrote: >> Hi @ll, >> >> compiler-rt implements (for example) the MSVC (really Windows) >> specific routines compiler-rt/lib/builtins/i386/chkstk.S and >> compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() >> See

BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

2018 Nov 25

2

I just compiled the two attached files in 32-bit mode and ran it. It printed efcdab8967452301. I verified via objdump that the my_bswap function contains the following assembly, which I believe matches the assembly you linked to on godbolt:
_my_bswap:
    1f70: 55        pushl %ebp
    1f71: 89 e5     movl  %esp, %ebp
    1f73: 8b 55 08  movl  8(%ebp), %edx
    1f76: 8b 45 0c  movl  12(%ebp), %eax
    1f79: 0f c8

BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

2018 Nov 25

3

bswapdi2 for i386 is correct. Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into eax. Those are each bswapped. The ABI for the return is: edx contains bits [63:32] and eax contains [31:0]. This is opposite of how the registers were loaded.
~Craig
On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> wrote:
> bswapsi2 on the x86-64 isn't using

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

0

On 04/09/2015 02:13 PM, Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -0,0 +1,321 @@
>> +#ifndef _GEN_PV_LOCK_SLOWPATH
>> +#error "do not include this file"
>> +#endif
>> +
>> +/*
>> + * Implement paravirt qspinlocks; the general idea is to halt the

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

0

On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
> > +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
> > +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
> > +{
> > + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);

[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015 Apr 07

18

v14->v15:
 - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimizations.
 - Simplify the Xen PV qspinlock code as suggested by David Vrabel <david.vrabel at citrix.com>.
 - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test

(Question regarding the) incomplete "builtins library" of "Compiler-RT"

2018 Nov 30

3

Hi @ll, compiler-rt implements (for example) the MSVC (really Windows) specific routines compiler-rt/lib/builtins/i386/chkstk.S and compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms() See <http://msdn.microsoft.com/en-us/library/ms648426.aspx> Is there any special reason why compiler-rt doesn't implement other MSVC specific functions (alias builtins or "compiler
