search for: mcs_nodes

Displaying 13 results from an estimated 68 matches for "mcs_nodes".

2014 Jun 17
5
[PATCH 03/11] qspinlock: Add pending bit
...Zijlstra wrote: > Because the qspinlock needs to touch a second cacheline; add a pending > bit and allow a single in-word spinner before we punt to the second > cacheline. Could you add this in the description please: And by second cacheline we mean the local 'node'. That is the: mcs_nodes[0] and mcs_nodes[idx] Perhaps it might be better, then, to split this into the header file, as this is trying not to be slowpath code - but rather a pre-slow-path, let's-try-if-we-can-do-another-cmpxchg in case the unlocker has just unlocked itself. So something like: diff --git a/include/asm-gener...
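
To make the "pre-slow-path" idea concrete, here is a minimal sketch of a pending-bit trylock in portable C11. The _Q_* constants and the word layout (locked bit, pending bit, tail in the upper bits) follow the qspinlock convention, but this is an illustration of the discussion above, not the actual patch:

#include <stdatomic.h>

#define _Q_LOCKED_VAL   (1U << 0)
#define _Q_PENDING_VAL  (1U << 8)

struct qspinlock { _Atomic unsigned int val; };

static int pending_trylock(struct qspinlock *lock)
{
	unsigned int old = 0;

	/* Uncontended: 0 -> locked with one cmpxchg, no extra cacheline. */
	if (atomic_compare_exchange_strong(&lock->val, &old, _Q_LOCKED_VAL))
		return 1;

	/* Only proceed if there is exactly one owner: no pending, no tail. */
	if (old != _Q_LOCKED_VAL)
		return 0;	/* caller falls back to the mcs_nodes[] queue */

	/* Become the single in-word spinner. */
	if (!atomic_compare_exchange_strong(&lock->val, &old,
					    _Q_LOCKED_VAL | _Q_PENDING_VAL))
		return 0;

	/* Spin on the lock word only; mcs_nodes[] is never touched. */
	while (atomic_load(&lock->val) & _Q_LOCKED_VAL)
		;

	/* Owner released: pending -> locked, preserving any tail bits. */
	atomic_fetch_add(&lock->val, _Q_LOCKED_VAL - _Q_PENDING_VAL);
	return 1;
}
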
2014 May 07
0
[PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization
...have to be defined and used here. + */ +struct qnode { + struct mcs_spinlock mcs; +}; + +/* * Per-CPU queue node structures; we can never have more than 4 nested * contexts: task, softirq, hardirq, nmi. * * Exactly fits one cacheline. */ -static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]); +static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[4]); /* * We must be able to distinguish between no-tail and the tail at 0:0, @@ -79,12 +88,12 @@ static inline u32 encode_tail(int cpu, int idx) return tail; } -static inline struct mcs_spinlock *decode_tail(u32 tail) +static inlin...
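
The point of the wrapper in this patch is that struct qnode can later grow virtualization-only fields without disturbing the generic mcs_spinlock. A condensed sketch of the result (mcs_spinlock is restated here for self-containment; any concrete pv field would be hypothetical):

#include <linux/percpu.h>

struct mcs_spinlock {
	struct mcs_spinlock *next;	/* next waiter in the queue */
	int locked;			/* 1 when the lock is handed to us */
	int count;			/* nesting level bookkeeping */
};

struct qnode {
	struct mcs_spinlock mcs;
	/* pv-specific state can be added here without touching mcs_spinlock */
};

/*
 * Per-CPU queue nodes; at most 4 nesting contexts can ever queue:
 * task, softirq, hardirq, nmi.  Exactly fits one cacheline.
 */
static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[4]);
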
2014 Jun 17
3
[PATCH 03/11] qspinlock: Add pending bit
...touch a second cacheline; add a pending > >>bit and allow a single in-word spinner before we punt to the second > >>cacheline. > >Could you add this in the description please: > > > >And by second cacheline we mean the local 'node'. That is the: > >mcs_nodes[0] and mcs_nodes[idx] > > > >Perhaps it might be better then to split this in the header file > >as this is trying to not be a slowpath code - but rather - a > >pre-slow-path-lets-try-if-we can do another cmpxchg in case > >the unlocker has just unlocked itself. > &...
2015 Mar 16
0
[PATCH 8/9] qspinlock: Generic paravirt support
...ath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage available and reverts to the per-cpu lock entry similar to the current kvm code. We can do a single entry because any nesting will wake the vcpu and cause the lower loop to retry. Signed-off...
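
A rough sketch of what "extra storage ... to keep track of state" and "directed wakeups" mean in practice. Field and helper names here are illustrative: pv_wait() stands for the halt hypercall and pv_kick() for the wakeup hypercall the changelog alludes to, and SPIN_THRESHOLD bounds the spin phase:

enum vcpu_state { vcpu_running = 0, vcpu_halted };

struct pv_node {
	struct mcs_spinlock	mcs;	/* normal MCS queue linkage */
	int			cpu;	/* whom the previous waiter kicks */
	u8			state;	/* vcpu_running or vcpu_halted */
};

/* Queued waiter: spin a bounded time, then halt until explicitly kicked. */
static void pv_wait_node(struct pv_node *pn)
{
	int loop;

	for (loop = SPIN_THRESHOLD; loop; loop--) {
		if (READ_ONCE(pn->mcs.locked))
			return;		/* predecessor handed us the lock */
		cpu_relax();
	}
	WRITE_ONCE(pn->state, vcpu_halted);
	pv_wait(&pn->state, vcpu_halted);	/* halt; the waker will pv_kick(pn->cpu) */
}
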
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...lot when the thing is contended. Below is the (rather messy) qspinlock slow path code (the only thing that really differs between our versions). I'll try and slot your version in tomorrow. --- /* * Exactly fills one cacheline on 64bit. */ static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]); static inline u32 encode_tail(int cpu, int idx) { u32 code; code = (cpu + 1) << _Q_TAIL_CPU_OFFSET; code |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */ return code; } static inline struct mcs_spinlock *decode_tail(u32 code) { int cpu = (code >> _Q_TAIL_CPU...
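
The excerpt is cut off inside decode_tail(); given the visible encode_tail(), its inverse is presumably along these lines (a reconstruction, assuming the mcs_nodes[] array declared just above and the usual _Q_TAIL_* mask definitions):

static inline struct mcs_spinlock *decode_tail(u32 code)
{
	int cpu = (code >> _Q_TAIL_CPU_OFFSET) - 1;	/* undo the +1 from encode_tail() */
	int idx = (code & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;

	return per_cpu_ptr(&mcs_nodes[idx], cpu);
}
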
2015 Mar 16
0
[PATCH 3/9] qspinlock: Add pending bit
From: Peter Zijlstra <peterz at infradead.org> Because the qspinlock needs to touch a second cacheline (the per-cpu mcs_nodes[]); add a pending bit and allow a single in-word spinner before we punt to the second cacheline. It is possible to observe the pending bit without the locked bit when the last owner has just released but the pending owner has not yet taken ownership. In this case we would normally queue -- becaus...
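
The transient state described in that changelog -- pending set, locked already clear -- can be waited out instead of queued on. A sketch of the wait (constant names follow the qspinlock convention; the loop is illustrative, not the patch text):

static u32 wait_out_pending_handover(struct qspinlock *lock)
{
	u32 val = atomic_read(&lock->val);

	/*
	 * pending=1, locked=0, tail=0: the previous owner just released
	 * and the pending waiter is about to take ownership.  Spinning
	 * here briefly is cheaper than touching the mcs_nodes[] cacheline.
	 */
	while (val == _Q_PENDING_VAL)
		val = atomic_read(&lock->val);

	return val;
}
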
2014 Jun 16
4
[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock
...; + > +#include "mcs_spinlock.h" > + > +/* > + * Per-CPU queue node structures; we can never have more than 4 nested > + * contexts: task, softirq, hardirq, nmi. > + * > + * Exactly fits one cacheline. > + */ > +static DEFINE_PER_CPU_ALIGNED(struct mcs_spinlock, mcs_nodes[4]); > + > +/* > + * We must be able to distinguish between no-tail and the tail at 0:0, > + * therefore increment the cpu number by one. > + */ > + > +static inline u32 encode_tail(int cpu, int idx) > +{ > + u32 tail; > + > + tail = (cpu + 1) << _Q_TAIL_CPU_...
2015 Apr 07
0
[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
...ath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage directly visible to the unlocker. So the unlocker searches a hash table with open addressing using a simple binary Galois linear feedback shift register. Signed-off-by: Waiman Long <Waim...
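
For the unfamiliar, a binary Galois LFSR gives a cheap pseudo-random probe sequence that visits every non-zero index exactly once before repeating, which is what makes it usable as an open-addressing walk here. A sketch; the table size, taps, and pv_hash_entry layout are illustrative assumptions, not the patch's actual values:

#define PV_HASH_BITS	10			/* 1024-slot table, illustrative */
#define LFSR_TAPS	0x240			/* x^10 + x^7 + 1, maximal length */

struct pv_hash_entry {
	struct qspinlock *lock;			/* key: the contended lock */
	int cpu;				/* value: whom to kick on unlock */
};
static struct pv_hash_entry hashtable[1 << PV_HASH_BITS];

/* One Galois LFSR step: shift right, xor in the taps if a 1 fell out. */
static inline u32 lfsr_step(u32 x)
{
	u32 lsb = x & 1;

	x >>= 1;
	if (lsb)
		x ^= LFSR_TAPS;
	return x;
}

/* Open-addressing lookup: hash the lock pointer, then walk the LFSR. */
static struct pv_hash_entry *pv_hash_find(struct qspinlock *lock)
{
	u32 slot = hash_ptr(lock, PV_HASH_BITS) ?: 1;	/* the LFSR never visits 0 */

	while (READ_ONCE(hashtable[slot].lock) != lock)
		slot = lfsr_step(slot);
	return &hashtable[slot];
}
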
2015 May 04
1
[PATCH v16 08/14] pvqspinlock: Implement simple paravirt support for the qspinlock
...ath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage directly visible to the unlocker. So the unlocker searches a hash table with open addressing using a simple binary Galois linear feedback shift register. Cc: Raghavendra K T <raghavendra...
2015 Apr 24
0
[PATCH v16 08/14] pvqspinlock: Implement simple paravirt support for the qspinlock
...ath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv version. The actual MCS queue can use extra storage in the mcs_nodes[] array to keep track of state and therefore uses directed wakeups. The head contender has no such storage directly visible to the unlocker. So the unlocker searches a hash table with open addressing using a simple binary Galois linear feedback shift register. Signed-off-by: Waiman Long <Waim...
2014 Feb 28
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote: > >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > >>+ > >>+ if (old == 0) { > >>+ /* > >>+ * Got the lock, can clear the waiting bit now > >>+ */ > >>+ smp_u8_store_release(&qlock->wait, 0); > > > >So we just did an
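
For context, the code under review belongs to Waiman's two-task optimization, which packs a lock byte and a wait byte into one 16-bit lock_wait word so both bits can be claimed with a single xchg. A plausible reconstruction of the surrounding logic, built only from the lines visible above (the fallback path is paraphrased, not quoted):

old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING | _QSPINLOCK_LOCKED);

if (old == 0) {
	/*
	 * The word was free: the xchg both acquired the lock and set
	 * the waiting bit, so the waiting bit must be cleared again.
	 */
	smp_u8_store_release(&qlock->wait, 0);
	return;
}

if (old == _QSPINLOCK_LOCKED) {
	/* One owner, no waiter: we are the single in-word waiter. */
	while (READ_ONCE(qlock->lock))
		cpu_relax();
	/* Owner released: convert waiting into ownership. */
	qlock->lock = _QSPINLOCK_LOCKED;
	smp_u8_store_release(&qlock->wait, 0);
	return;
}

/* A waiter already exists: fall back to the MCS queue slow path. */
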
2014 Jun 17
0
[PATCH 03/11] qspinlock: Add pending bit
...se the qspinlock needs to touch a second cacheline; add a pending >> bit and allow a single in-word spinner before we punt to the second >> cacheline. > Could you add this in the description please: > > And by second cacheline we mean the local 'node'. That is the: > mcs_nodes[0] and mcs_nodes[idx] > > Perhaps it might be better then to split this in the header file > as this is trying to not be a slowpath code - but rather - a > pre-slow-path-lets-try-if-we can do another cmpxchg in case > the unlocker has just unlocked itself. > > So something like...
2014 Jun 17
0
[PATCH 03/11] qspinlock: Add pending bit
...a pending > > >>bit and allow a single in-word spinner before we punt to the second > > >>cacheline. > > >Could you add this in the description please: > > > > > >And by second cacheline we mean the local 'node'. That is the: > > >mcs_nodes[0] and mcs_nodes[idx] > > > > > >Perhaps it might be better then to split this in the header file > > >as this is trying to not be a slowpath code - but rather - a > > >pre-slow-path-lets-try-if-we can do another cmpxchg in case > > >the unlocker has jus...