Displaying 20 results from an estimated 2000 matches similar to: "BUGS in code generated for target i386-win32"
2018 Nov 26
2
BUGS in code generated for target i386-win32
"Tim Northover" <t.p.northover at gmail.com> wrote:
> Hi Stefan,
>
> On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> LLVM/clang generates wrong code for the following program
>> (see <https://godbolt.org/z/UZrrkG>):
>
> It looks like all of these issues come down to mismatched
2018 Nov 06
4
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
Hi @ll,
while clang/LLVM recognizes common bit-twiddling idioms/expressions
like
unsigned int rotate(unsigned int x, unsigned int n)
{
    return (x << n) | (x >> (32 - n));
}
and typically generates "rotate" machine instructions for this
expression, it fails to recognize other, equally common bit-twiddling
idioms/expressions.
The standard IEEE CRC-32 for "big
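A side note on the rotate idiom quoted above: as written, it shifts right by 32 when n is 0, which is undefined behaviour in C for a 32-bit operand. A minimal sketch of a variant that masks the shift counts, assuming a 32-bit operand (the name rotl32 is illustrative and does not come from the thread):

```c
#include <stdint.h>

/* Rotate x left by n bits. Both shift counts are masked to 0..31,
 * so n == 0 (and any n >= 32) never produces a shift by the full
 * operand width, which C leaves undefined. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For example, rotl32(0x12345678, 8) moves the top byte to the bottom, and rotl32(x, 0) returns x unchanged.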
2018 Nov 27
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
"Sanjay Patel" <spatel at rotateright.com> wrote:
> IIUC, you want to use x86-specific bit-hacks (sbb masking) in cases like
> this:
> unsigned int foo(unsigned int crc) {
>     if (crc & 0x80000000)
>         crc <<= 1, crc ^= 0xEDB88320;
>     else
>         crc <<= 1;
>     return crc;
> }
To document this for x86 too: rewrite the function
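The quoted function can also be expressed without a branch in portable C by turning the top bit into an all-ones or all-zeros mask. This is only a sketch under the assumption of a 32-bit CRC register; it is not necessarily the rewrite the poster had in mind (the snippet above is cut off), and the names crc_step_branch and crc_step_mask are illustrative:

```c
#include <stdint.h>

/* Branching form, equivalent to the quoted foo(). */
static uint32_t crc_step_branch(uint32_t crc)
{
    if (crc & 0x80000000u)
        return (crc << 1) ^ 0xEDB88320u;
    return crc << 1;
}

/* Branch-free form: mask is 0xFFFFFFFF when the top bit of crc is set,
 * and 0 otherwise, so the constant is XORed in only when needed. */
static uint32_t crc_step_mask(uint32_t crc)
{
    uint32_t mask = 0u - (crc >> 31);
    return (crc << 1) ^ (mask & 0xEDB88320u);
}
```

Both functions return the same value for every input; which machine instructions a given compiler emits for either form is a separate question.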
2018 Dec 01
2
Where's the optimiser gone? (part 5.c): missed tail calls, and more...
Compile the following functions with "-O3 -target i386-win32"
(see <https://godbolt.org/z/exmjWY>):
__int64 __fastcall div(__int64 foo, __int64 bar)
{
    return foo / bar;
}
On the left the generated code; on the right the expected,
properly optimised code:
push dword ptr [esp + 16] |
push dword ptr [esp + 16] |
push dword ptr [esp + 16] |
2015 Apr 09
6
[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -0,0 +1,321 @@
> +#ifndef _GEN_PV_LOCK_SLOWPATH
> +#error "do not include this file"
> +#endif
> +
> +/*
> + * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
> + * of spinning them.
> + *
> + * This relies on the
2015 Mar 19
4
[PATCH 8/9] qspinlock: Generic paravirt support
On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote:
> So I was now thinking of hashing the lock pointer; let me go and quickly
> put something together.
A little something like so; ideally we'd allocate the hashtable since
NR_CPUS is kinda bloated, but it shows the idea I think.
And while this has loops in (the rehashing thing) their fwd progress
does not depend on other
2015 Apr 13
1
[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
> >>+void __init __pv_init_lock_hash(void)
> >>+{
> >>+ int pv_hash_size = 4 * num_possible_cpus();
> >>+
> >>+ if (pv_hash_size< (1U<< LFSR_MIN_BITS))
> >>+ pv_hash_size = (1U<< LFSR_MIN_BITS);
> >>+ /*
> >>+ * Allocate space from bootmem which
2018 Nov 28
2
Rather poor code optimisation of current clang/LLVM targeting Intel x86 (both -64 and -32)
On Wed, Nov 28, 2018 at 7:11 AM Sanjay Patel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks for reporting this and other perf opportunities. As I mentioned
> before, if you could file bug reports for these, that's probably the only
> way they're ever going to get fixed (unless you're planning to fix them
> yourself). It's not an ideal situation, but
2018 Nov 25
3
BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
Hi @ll,
targeting i386, LLVM/clang generates wrong code for the following
functions:
unsigned long __bswapsi2 (unsigned long ul)
{
    return (((ul) & 0xff000000ul) >> 3 * 8)
         | (((ul) & 0x00ff0000ul) >> 8)
         | (((ul) & 0x0000ff00ul) << 8)
         | (((ul) & 0x000000fful) << 3 * 8);
}
unsigned long long __bswapdi2(unsigned long
2018 Nov 30
2
(Question regarding the) incomplete "builtins library" of "Compiler-RT"
"Friedman, Eli" <efriedma at codeaurora.org> wrote:
> On 11/30/2018 8:31 AM, Stefan Kanthak via llvm-dev wrote:
>> Hi @ll,
>>
>> compiler-rt implements (for example) the MSVC (really Windows)
>> specific routines compiler-rt/lib/builtins/i386/chkstk.S and
>> compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms()
>> See
2018 Nov 25
2
BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
I just compiled the two attached files in 32-bit mode and ran it.
It printed efcdab8967452301.
I verified via objdump that the my_bswap function contains the following
assembly, which I believe matches the assembly you linked to on godbolt.
_my_bswap:
1f70: 55 pushl %ebp
1f71: 89 e5 movl %esp, %ebp
1f73: 8b 55 08 movl 8(%ebp), %edx
1f76: 8b 45 0c movl 12(%ebp), %eax
1f79: 0f c8
2018 Nov 25
3
BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()
bswapdi2 for i386 is correct
Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
eax. Each is then bswapped. For the return value, the ABI puts bits
[63:32] in edx and bits [31:0] in eax, which is the opposite of how the
registers were loaded.
~Craig
On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com>
wrote:
> bswapsi2 on the x86-64 isn't using
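The register swap described above can be mirrored in plain C: a 64-bit byte swap is two 32-bit byte swaps whose results trade places. A minimal sketch, using fixed-width types as an assumption to keep the operand sizes unambiguous (the names bswap32 and bswap64 are illustrative, not the compiler-rt routines):

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return ((v & 0xff000000u) >> 24)
         | ((v & 0x00ff0000u) >> 8)
         | ((v & 0x0000ff00u) << 8)
         | ((v & 0x000000ffu) << 24);
}

/* Reverse the eight bytes of a 64-bit value: swap each half,
 * then let the old low half become the new high half. */
static uint64_t bswap64(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    return ((uint64_t)bswap32(lo) << 32) | bswap32(hi);
}
```

With the input 0x0123456789abcdef this yields 0xefcdab8967452301, the value reported by the test run quoted earlier in this thread.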
2015 Apr 09
0
[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On 04/09/2015 02:13 PM, Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -0,0 +1,321 @@
>> +#ifndef _GEN_PV_LOCK_SLOWPATH
>> +#error "do not include this file"
>> +#endif
>> +
>> +/*
>> + * Implement paravirt qspinlocks; the general idea is to halt the
2015 Apr 09
0
[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
> > +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
> > +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
> > +{
> > + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits);
2015 Apr 07
18
[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support
v14->v15:
- Incorporate PeterZ's v15 qspinlock patch and improve upon the PV
qspinlock code by dynamically allocating the hash table as well
as some other performance optimization.
- Simplified the Xen PV qspinlock code as suggested by David Vrabel
<david.vrabel at citrix.com>.
- Add benchmarking data for 3.19 kernel to compare the performance
of a spinlock heavy test
2018 Nov 30
3
(Question regarding the) incomplete "builtins library" of "Compiler-RT"
Hi @ll,
compiler-rt implements (for example) the MSVC (really Windows)
specific routines compiler-rt/lib/builtins/i386/chkstk.S and
compiler-rt/lib/builtins/x86_64/chkstk.S as __chkstk_ms()
See <http://msdn.microsoft.com/en-us/library/ms648426.aspx>
Is there any special reason why compiler-rt doesn't implement
other MSVC specific functions (alias builtins or "compiler