thr3ads.net - search: "lfsr"

BUGS in code generated for target i386-win32

2018 Nov 26

3

Hi @ll, LLVM/clang generates wrong code for the following program (see <https://godbolt.org/z/UZrrkG>):

--- sample.c ---
unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
{
    __asm {
        add ecx, ecx ; ecx = argument << 1
        sbb eax, eax ; eax = CF ? -1 : 0
        and eax, edx ; eax = CF ? polynomial : 0
        xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
    }
}

int main() {...
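The comments in the inline assembly above describe one step of a left-shifting Galois LFSR: shift the argument left by one and, if a bit was carried out, XOR in the polynomial. A minimal portable C sketch of that same computation follows. The name lfsr32_step and the use of uint32_t are illustrative assumptions, not code from the thread.

```c
#include <stdint.h>

/*
 * One left-shifting Galois LFSR step, mirroring the asm comments:
 *   result = (argument << 1) ^ (carry ? polynomial : 0)
 * lfsr32_step is a hypothetical name chosen for this sketch.
 */
uint32_t lfsr32_step(uint32_t argument, uint32_t polynomial)
{
    uint32_t carry = argument >> 31;        /* bit shifted out (CF after "add ecx, ecx") */
    uint32_t mask  = (uint32_t)0 - carry;   /* all ones if carry, else 0 ("sbb eax, eax") */

    return (argument << 1) ^ (mask & polynomial);
}
```

Because it uses no inline assembly, this version does not depend on how a compiler maps __fastcall arguments to ECX and EDX, which is the point under dispute in the thread.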

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

6

...; +
> +enum vcpu_state {
> +	vcpu_running = 0,
> +	vcpu_halted,
> +};
> +
> +struct pv_node {
> +	struct mcs_spinlock mcs;
> +	struct mcs_spinlock __res[3];
> +
> +	int cpu;
> +	u8 state;
> +};
> +
> +/*
> + * Hash table using open addressing with an LFSR probe sequence.
> + *
> + * Since we should not be holding locks from NMI context (very rare indeed) the
> + * max load factor is 0.75, which is around the point where open addressing
> + * breaks down.
> + *
> + * Instead of probing just the immediate bucket we probe all buckets...
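The quoted comment mentions open addressing with an LFSR probe sequence. The idea can be sketched in plain, single-threaded C as below. This is not the kernel code: the 16-bucket table, the tap mask 0x9 (which cycles through all 15 nonzero 4-bit values), the XOR of the probe offset with the home bucket, and every name here are assumptions made for illustration. The cacheline-wide probing and the 0.75 load factor from the comment are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 4u
#define TABLE_SIZE (1u << TABLE_BITS)
#define LFSR_TAPS  0x9u  /* assumed tap mask: visits all 15 nonzero 4-bit values */

struct entry {
    const void *key;     /* NULL marks an empty bucket */
    int value;
};

/* Advance the probe offset: 0 (the home bucket) is followed by the LFSR cycle starting at 1. */
static uint32_t next_offset(uint32_t s)
{
    uint32_t lsb;

    if (s == 0)
        return 1u;
    lsb = s & 1u;
    s >>= 1;
    return lsb ? (s ^ LFSR_TAPS) : s;
}

/* Insert or update a non-NULL key; returns the bucket index, or -1 if the table is full. */
int table_insert(struct entry *table, uint32_t hash, const void *key, int value)
{
    uint32_t home = hash & (TABLE_SIZE - 1u);
    uint32_t s = 0;
    uint32_t i;

    for (i = 0; i < TABLE_SIZE; i++, s = next_offset(s)) {
        struct entry *e = &table[home ^ s];   /* offsets 0..15 cover every bucket once */

        if (e->key == NULL || e->key == key) {
            e->key = key;
            e->value = value;
            return (int)(home ^ s);
        }
    }
    return -1;
}

/* Look up a key; returns the bucket index, or -1 if absent (no deletions in this sketch). */
int table_find(const struct entry *table, uint32_t hash, const void *key)
{
    uint32_t home = hash & (TABLE_SIZE - 1u);
    uint32_t s = 0;
    uint32_t i;

    for (i = 0; i < TABLE_SIZE; i++, s = next_offset(s)) {
        const struct entry *e = &table[home ^ s];

        if (e->key == key)
            return (int)(home ^ s);
        if (e->key == NULL)
            return -1;
    }
    return -1;
}
```

Unlike linear probing, consecutive probes land on scattered buckets, so keys that collide on the same home bucket do not build one contiguous cluster.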

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Mar 19

4

...NR_CPUS is kinda bloated, but it shows the idea I think.

And while this has loops in (the rehashing thing) their fwd progress does not depend on other CPUs.

And I suspect that for the typical lock contention scenarios its unlikely we ever really get into long rehashing chains.

---
 include/linux/lfsr.h                |  49 ++++++++++++
 kernel/locking/qspinlock_paravirt.h | 143 ++++++++++++++++++++++++++++++++----
 2 files changed, 178 insertions(+), 14 deletions(-)

--- /dev/null
+++ b/include/linux/lfsr.h
@@ -0,0 +1,49 @@
+#ifndef _LINUX_LFSR_H
+#define _LINUX_LFSR_H
+
+/*
+ * Simple Binary...

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

0

...>> +	vcpu_halted,
>> +};
>> +
>> +struct pv_node {
>> +	struct mcs_spinlock mcs;
>> +	struct mcs_spinlock __res[3];
>> +
>> +	int cpu;
>> +	u8 state;
>> +};
>> +
>> +/*
>> + * Hash table using open addressing with an LFSR probe sequence.
>> + *
>> + * Since we should not be holding locks from NMI context (very rare indeed) the
>> + * max load factor is 0.75, which is around the point where open addressing
>> + * breaks down.
>> + *
>> + * Instead of probing just the immediate buck...

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 09

0

...; + /*
> > + * We haven't set the _Q_SLOW_VAL yet. So
> > + * the order of writing doesn't matter.
> > + */
> > + smp_wmb(); /* matches rmb from pv_hash_find */
> > + goto done;
> > + }
> > + }
> > +
> > + hash = lfsr(hash, pv_lock_hash_bits, 0);
>
> Since pv_lock_hash_bits is a variable, you end up running through that
> massive if() forest to find the corresponding tap every single time. It
> cannot compile-time optimize it.
>
> Hence:
> hash = lfsr(hash, pv_taps);
>
> (I don...
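The review comment above suggests resolving the tap mask once rather than on every step. A rough sketch of that split, under stated assumptions: the function names, the three supported widths, and the tap values (commonly cited maximal-length masks for a right-shifting Galois LFSR) are illustrative and are not taken from the patch's lfsr.h, which is elided here.

```c
#include <stdint.h>

/*
 * Hypothetical one-time lookup: tap mask for a right-shifting Galois LFSR
 * of the given width. Returns 0 for widths this sketch does not cover.
 */
uint32_t lfsr_taps_for_bits(unsigned bits)
{
    switch (bits) {
    case 4:  return 0x9u;
    case 8:  return 0xB8u;
    case 16: return 0xB400u;
    default: return 0u;
    }
}

/* Hot path: one step with the taps already resolved, so no per-call width dispatch. */
static inline uint32_t lfsr_step(uint32_t state, uint32_t taps)
{
    uint32_t lsb = state & 1u;

    state >>= 1;
    return lsb ? (state ^ taps) : state;
}

/*
 * Helper for checking a tap mask: number of steps until the state returns
 * to 1, or 0 if that does not happen within max_steps.
 */
uint32_t lfsr_period(uint32_t taps, uint32_t max_steps)
{
    uint32_t s = 1u;
    uint32_t n;

    for (n = 1; n <= max_steps; n++) {
        s = lfsr_step(s, taps);
        if (s == 1u)
            return n;
    }
    return 0u;
}
```

A caller would store the result of lfsr_taps_for_bits() in a variable at initialization time (the role pv_taps plays in the suggestion) and pass it to lfsr_step() on each probe.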

BUGS in code generated for target i386-win32

2018 Nov 26

2

...inline/using-and-preserving-registers-in-inline-assembly?view=vs-2017

Trust me: I KNOW THIS DOCUMENTATION!

> I'll try to explain a little below how that one mismatch causes the
> issues you're seeing.
>
>> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
>> BUG #2: the variable "lfsr" is NOT initialized!
>
> Since the __asm isn't linked (as far as Clang is concerned) to
> either input for lfsr32, they're both unused.

REALLY? Or better: OUCH! Is Clang NOT aware of the __fastcall calling convention and its re...

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 13

1

On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
> >>+void __init __pv_init_lock_hash(void)
> >>+{
> >>+	int pv_hash_size = 4 * num_possible_cpus();
> >>+
> >>+	if (pv_hash_size< (1U<< LFSR_MIN_BITS))
> >>+		pv_hash_size = (1U<< LFSR_MIN_BITS);
> >>+	/*
> >>+	 * Allocate space from bootmem which should be page-size aligned
> >>+	 * and hence cacheline aligned.
> >>+	 */
> >>+	pv_lock_hash = alloc_large_system_hash("PV q...

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Mar 18

2

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
> Implement simple paravirt support for the qspinlock.
>
> Provide a separate (second) version of the spin_lock_slowpath for
> paravirt along with a special unlock path.
>
> The second slowpath is generated by adding a few pv hooks to the
> normal slowpath, but where those will compile away for the native
> case, they expand

[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015 Apr 07

18

...pervisors
  pvqspinlock: Implement the paravirt qspinlock for x86

Waiman Long (11):
  qspinlock: A simple generic 4-byte queue spinlock
  qspinlock, x86: Enable x86-64 to use queue spinlock
  qspinlock: Extract out code snippets for the next patch
  qspinlock: Use a simple write to grab the lock
  lfsr: a simple binary Galois linear feedback shift register
  pvqspinlock: Implement simple paravirt support for the qspinlock
  pvqspinlock, x86: Enable PV qspinlock for KVM
  pvqspinlock, x86: Enable PV qspinlock for Xen
  pvqspinlock: Only kick CPU at unlock time
  pvqspinlock: Improve slowpath perfo...

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015 Apr 07

0

...ue_spin_unlock().
+ */
+
+#define _Q_SLOW_VAL (3U << _Q_LOCKED_OFFSET)
+
+enum vcpu_state {
+	vcpu_running = 0,
+	vcpu_halted,
+};
+
+struct pv_node {
+	struct mcs_spinlock mcs;
+	struct mcs_spinlock __res[3];
+
+	int cpu;
+	u8 state;
+};
+
+/*
+ * Hash table using open addressing with an LFSR probe sequence.
+ *
+ * Since we should not be holding locks from NMI context (very rare indeed) the
+ * max load factor is 0.75, which is around the point where open addressing
+ * breaks down.
+ *
+ * Instead of probing just the immediate bucket we probe all buckets in the
+ * same cacheline.
+ *...

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Apr 01

0

...idea I think.
>
> And while this has loops in (the rehashing thing) their fwd progress
> does not depend on other CPUs.
>
> And I suspect that for the typical lock contention scenarios its
> unlikely we ever really get into long rehashing chains.
>
> ---
> include/linux/lfsr.h | 49 ++++++++++++
> kernel/locking/qspinlock_paravirt.h | 143 ++++++++++++++++++++++++++++++++----
> 2 files changed, 178 insertions(+), 14 deletions(-)
>
> --- /dev/null
>
> +
> +static int pv_hash_find(struct qspinlock *lock)
> +{
> +	u64 hash =...

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Apr 02

3

On Thu, Apr 02, 2015 at 12:28:30PM -0400, Waiman Long wrote:
> On 04/01/2015 05:03 PM, Peter Zijlstra wrote:
> >On Wed, Apr 01, 2015 at 03:58:58PM -0400, Waiman Long wrote:
> >>On 04/01/2015 02:48 PM, Peter Zijlstra wrote:
> >>I am sorry that I don't quite get what you mean here. My point is that in
> >>the hashing step, a cpu will need to scan an empty

questions from a 10GbE driver author

2008 May 07

7

Hi, I maintain a driver for a 10GbE nic which supports multiple hardware tx/rx rings. We can steer rx packets into rings using the "standard" NDIS6 Toeplitz hashing on TCP port numbers, IP addresses, etc. We can also steer packets based on MAC address. Would this NIC be considered to be capable of supporting crossbow? Also, can crossbow do things like steer outgoing packets to the

[PATCH 8/9] qspinlock: Generic paravirt support

2015 Apr 02

0

...&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
>
> VS
>
> __pv_queue_spin_unlock():
>
> if (xchg(&l->locked, 0) != _Q_SLOW_VAL)
> return;
>
> /* MB as per xchg */
> pv_hash_find(lock);
>

Something like so.. compile tested only.

I took out the LFSR because that was likely over engineering from my side :-)

--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -2,6 +2,8 @@
 #error "do not include this file"
 #endif
 
+#include <linux/hash.h>
+
 /*
  * Implement paravirt qspinlocks; the general i...
