search for: xchg_tail

Displaying 20 results from an estimated 51 matches for "xchg_tail".

2023 Sep 11
0
[PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
On 9/10/23 04:28, guoren at kernel.org wrote:
> From: Guo Ren <guoren at linux.alibaba.com>
>
> The target of xchg_tail is to write the tail to the lock value, so
> adding prefetchw could help the next cmpxchg step, which may
> reduce the number of cmpxchg retries in xchg_tail. Some processors may
> utilize this feature to give a forward guarantee, e.g., RISC-V
> XuanTie processors would block the snoop chan...
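A minimal sketch of the suggestion, assuming the generic cmpxchg-loop form of xchg_tail() in kernel/locking/qspinlock.c (the variant used when the tail cannot fit in 16 bits); the prefetchw() hint ahead of the loop is the whole change:

    #include <linux/prefetch.h>

    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
            u32 old, new, val;

            prefetchw(&lock->val);          /* pull the cache line in for write */
            val = atomic_read(&lock->val);

            for (;;) {
                    /* preserve the locked and pending bits, swap in our tail */
                    new = (val & _Q_LOCKED_PENDING_MASK) | tail;
                    /* relaxed is enough: the caller publishes its MCS node first */
                    old = atomic_cmpxchg_relaxed(&lock->val, val, new);
                    if (old == val)
                            break;

                    val = old;
            }
            return old;
    }

Fewer retries means less cache-line bouncing on the lock word under heavy contention, which is where the claimed forward-progress benefit on XuanTie parts would come from.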
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
> From: Waiman Long <Waiman.Long at hp.com>
>
> This patch extracts the logic for the exchange of new and previous tail
> code words into a new xchg_tail() function which can be optimized in a
> later patch.

And also adds a third try on acquiring the lock. That I think should be a separate patch. And instead of saying 'later patch' you should spell out the name of the patch, especially as this might not be obvious from somebody doing gi...
2014 Jun 15
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra <peterz at infradead.org>

When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change...
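For reference, the optimization the patch describes comes down to this (a sketch assuming the series' struct __qspinlock overlay and a little-endian layout; a big-endian kernel would order the fields differently):

    struct __qspinlock {
            union {
                    atomic_t val;
                    struct {                /* little-endian layout assumed */
                            u8  locked;
                            u8  pending;
                            u16 tail;       /* fits because NR_CPUS < 2^14 */
                    };
            };
    };

    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
            struct __qspinlock *l = (void *)lock;

            /* one xchg16 on the tail halfword replaces the cmpxchg retry loop */
            return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }

Because the tail now occupies its own halfword, the exchange can no longer race with concurrent updates of the locked/pending bytes, which is what makes the unconditional form safe.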
2014 Jun 15
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
From: Waiman Long <Waiman.Long at hp.com>

This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch.

Signed-off-by: Waiman Long <Waiman.Long at hp.com>
Signed-off-by: Peter Zijlstra <peterz at infradead.org>
---
 include/asm-generic/qspinlock_types.h |  2 +
 kernel/locking/qspinlock.c            | 58 +++++++++++++++++++++---------...
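For orientation, the "tail code word" being exchanged encodes the queueing CPU number plus its MCS node index; the encoding helper looks roughly like this (reconstructed from the tree of that era, not quoted from the patch):

    static inline u32 encode_tail(int cpu, int idx)
    {
            u32 tail;

            tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;   /* +1 so 0 means "no tail" */
            tail |= idx << _Q_TAIL_IDX_OFFSET;         /* nesting level */

            return tail;
    }

xchg_tail() then returns the previous lock word, from which the caller recovers its predecessor's MCS node via the matching decode step.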
2014 Apr 17
0
[PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch.

Signed-off-by: Waiman Long <Waiman.Long at hp.com>
---
 include/asm-generic/qspinlock_types.h |  2 +
 kernel/locking/qspinlock.c            | 61 +++++++++++++++++++++------------
 2 files changed, 41 insertions(+), 22 deletions(-)

diff...
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Sun, Jun 15, 2014 at 02:47:02PM +0200, Peter Zijlstra wrote:
> From: Peter Zijlstra <peterz at infradead.org>
>
> When we allow for a max NR_CPUS < 2^14 we can optimize the pending
> wait-acquire and the xchg_tail() operations.
>
> By growing the pending bit to a byte, we reduce the tail to 16 bits.
> This means we can use xchg16 for the tail part and do away with all
> the repeated cmpxchg() operations.
>
> This in turn allows us to unconditionally acquire; the locked state
> as observ...
2014 Apr 17
0
[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change...
2014 Jun 18
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
...17/06/2014 22:55, Konrad Rzeszutek Wilk wrote:
> On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
>> From: Waiman Long <Waiman.Long at hp.com>
>>
>> This patch extracts the logic for the exchange of new and previous tail
>> code words into a new xchg_tail() function which can be optimized in a
>> later patch.
>
> And also adds a third try on acquiring the lock. That I think should
> be a separate patch.

It doesn't really add a new try; the old code is:

-	for (;;) {
-		new = _Q_LOCKED_VAL;
-		if (val)
-			new = tail | (val &...
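Completing the truncated quote from surrounding context (a reconstruction; the mask name in particular is assumed): the pre-patch loop both attempted the lock and published the tail in a single cmpxchg, so extracting xchg_tail() does not introduce a genuinely new acquisition attempt:

    for (;;) {
            new = _Q_LOCKED_VAL;            /* lock free: try to take it */
            if (val)                        /* otherwise publish our tail */
                    new = tail | (val & _Q_LOCKED_PENDING_MASK);

            old = atomic_cmpxchg(&lock->val, val, new);
            if (old == val)
                    break;

            val = old;
    }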
2014 Jul 07
2
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On 07/07/2014 16:35, Peter Zijlstra wrote:
> On Wed, Jun 18, 2014 at 01:39:52PM +0200, Paolo Bonzini wrote:
>> On 15/06/2014 14:47, Peter Zijlstra wrote:
>>>
>>> -	for (;;) {
>>> -		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
>>> -
>>> -		old = atomic_cmpxchg(&lock->val, val, new);
>>> -		if (old == val)
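The loop deleted in this hunk is replaced, later in the same patch, by an unconditional store: once a CPU owns the pending bit, nobody else can modify the locked/pending bytes, so no retry is needed. A sketch of the replacement, assuming a 16-bit locked_pending field in the struct __qspinlock overlay:

    static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
    {
            struct __qspinlock *l = (void *)lock;

            /* we own the pending bit: clear it and set locked in one store */
            ACCESS_ONCE(l->locked_pending) = _Q_LOCKED_VAL;
    }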
2014 Jun 15
28
[PATCH 00/11] qspinlock with paravirt support
Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt (kvm) cases boot on simple smp guests. I've not done Xen, but the patch should be
2014 Apr 17
2
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
...+++++++++++++++++++++++++--
>  1 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 55601b4..497da24 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -216,6 +216,7 @@ xchg_tail(struct qspinlock *lock, u32 tail, u32 *pval)
>  static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
>  {
>  	u32 old, new, val = *pval;
> +	int retry = 1;
>
>  	/*
>  	 * trylock || pending
> @@ -225,11 +226,38 @@ static inline int trylock_pending(struct q...
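The hunk is too truncated to quote in full; the shape of the change, sketched rather than copied from the patch (the retry bound and the exact exit conditions here are illustrative): trylock_pending() now lingers for a bounded number of rounds when it sees both the locked and pending bits set, instead of bailing out to the MCS queue immediately:

    int retry = 1;                  /* bound is illustrative, not the patch's */

    while ((val & _Q_LOCKED_PENDING_MASK) ==
           (_Q_LOCKED_VAL | _Q_PENDING_VAL)) {
            if (!retry)
                    return 0;       /* spun long enough: fall back to the queue */
            retry--;
            cpu_relax();
            val = atomic_read(&lock->val);
    }

The bet is that the lock holder and the pending waiter will drain quickly, so a short wait in the cheap pending path beats the cost of queuing.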
2014 Jun 18
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
...nrad Rzeszutek Wilk wrote:
> >On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
> >>From: Waiman Long <Waiman.Long at hp.com>
> >>
> >>This patch extracts the logic for the exchange of new and previous tail
> >>code words into a new xchg_tail() function which can be optimized in a
> >>later patch.
> >
> >And also adds a third try on acquiring the lock. That I think should
> >be a separate patch.
>
> It doesn't really add a new try, the old code is:
>
> -	for (;;) {
> -		new = _Q_LOCK...
2014 May 21
0
[RFC 08/07] qspinlock: integrate pending bit into queue
...ine void set_pending(struct qspinlock *lock, u8 pending)
+{
+	struct __qspinlock *l = (void *)lock;
+
+	// take a look if this is necessary, and if we don't have an
+	// abstraction already
+	barrier();
+	ACCESS_ONCE(l->pending) = pending;
+	barrier();
+}
+
+// and here
+static inline u32 cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 newtail)
+// API-incompatible with set_pending and the shifting is ugly, so I'd rather
+// refactor this one, xchg_tail() and encode_tail() ... another day
+{
+	struct __qspinlock *l = (void *)lock;
+
+	return (u32)cmpxchg(&l->tail, tail >> _Q_...
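The excerpt cuts off mid-expression; by symmetry with xchg_tail() in the same RFC, the complete helper is presumably close to this (a reconstruction, not the RFC verbatim):

    static inline u32 cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 newtail)
    {
            struct __qspinlock *l = (void *)lock;

            /* operate on the 16-bit tail halfword, shifting in and out */
            return (u32)cmpxchg(&l->tail, tail >> _Q_TAIL_OFFSET,
                                newtail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }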
2014 May 30
19
[PATCH v11 00/16] qspinlock: a 4-byte queue spinlock with PV support
v10->v11: - Use a simple test-and-set unfair lock to simplify the code, but performance may suffer a bit for large guest with many CPUs. - Take out Raghavendra KT's test results as the unfair lock changes may render some of his results invalid. - Add PV support without increasing the size of the core queue node structure. - Other minor changes to address some of the