search for: xchg_tail

Displaying 20 results from an estimated 51 matches for "xchg_tail".

2023 Sep 11
0
[PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
On 9/10/23 04:28, guoren at kernel.org wrote:
> From: Guo Ren <guoren at linux.alibaba.com>
>
> The target of xchg_tail is to write the tail to the lock value, so
> adding prefetchw could help the next cmpxchg step, which may
> reduce the number of cmpxchg retries in xchg_tail. Some processors may
> utilize this feature to give a forward guarantee, e.g., RISC-V
> XuanTie processors would block the snoop chan...
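A minimal sketch of the suggestion, assuming the generic cmpxchg-loop form of xchg_tail() in kernel/locking/qspinlock.c (the variant used when the tail cannot fit in 16 bits); the prefetchw() hint ahead of the loop is the whole change:

    #include <linux/prefetch.h>

    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
            u32 old, new, val;

            prefetchw(&lock->val);          /* pull the cache line in for write */
            val = atomic_read(&lock->val);

            for (;;) {
                    /* preserve the locked and pending bits, swap in our tail */
                    new = (val & _Q_LOCKED_PENDING_MASK) | tail;
                    /* relaxed is enough: the caller publishes its MCS node first */
                    old = atomic_cmpxchg_relaxed(&lock->val, val, new);
                    if (old == val)
                            break;

                    val = old;
            }
            return old;
    }

Fewer retries means less cache-line bouncing on the lock word under heavy contention, which is where the claimed forward-progress benefit on XuanTie parts would come from.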
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
> From: Waiman Long <Waiman.Long at hp.com>
>
> This patch extracts the logic for the exchange of new and previous tail
> code words into a new xchg_tail() function which can be optimized in a
> later patch.

And also adds a third try on acquiring the lock. That I think should be a separate patch. And instead of saying 'later patch' you should spell out the name of the patch, especially as this might not be obvious from somebody doing gi...
2014 Jun 15
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra <peterz at infradead.org>

When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change...
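For reference, the optimization the patch describes comes down to this (a sketch assuming the series' struct __qspinlock overlay and a little-endian layout; a big-endian kernel would order the fields differently):

    struct __qspinlock {
            union {
                    atomic_t val;
                    struct {                /* little-endian layout assumed */
                            u8  locked;
                            u8  pending;
                            u16 tail;       /* fits because NR_CPUS < 2^14 */
                    };
            };
    };

    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
            struct __qspinlock *l = (void *)lock;

            /* one xchg16 on the tail halfword replaces the cmpxchg retry loop */
            return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }

Because the tail now occupies its own halfword, the exchange can no longer race with concurrent updates of the locked/pending bytes, which is what makes the unconditional form safe.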
2014 Jun 15
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
From: Waiman Long <Waiman.Long at hp.com>

This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch.

Signed-off-by: Waiman Long <Waiman.Long at hp.com>
Signed-off-by: Peter Zijlstra <peterz at infradead.org>
---
 include/asm-generic/qspinlock_types.h |  2 +
 kernel/locking/qspinlock.c            | 58 +++++++++++++++++++++---------...
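For orientation, the "tail code word" being exchanged encodes the queueing CPU number plus its MCS node index; the encoding helper looks roughly like this (reconstructed from the tree of that era, not quoted from the patch):

    static inline u32 encode_tail(int cpu, int idx)
    {
            u32 tail;

            tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;   /* +1 so 0 means "no tail" */
            tail |= idx << _Q_TAIL_IDX_OFFSET;         /* nesting level */

            return tail;
    }

xchg_tail() then returns the previous lock word, from which the caller recovers its predecessor's MCS node via the matching decode step.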
2014 Apr 17
0
[PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch.

Signed-off-by: Waiman Long <Waiman.Long at hp.com>
---
 include/asm-generic/qspinlock_types.h |  2 +
 kernel/locking/qspinlock.c            | 61 +++++++++++++++++++++------------
 2 files changed, 41 insertions(+), 22 deletions(-)

diff...
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Sun, Jun 15, 2014 at 02:47:02PM +0200, Peter Zijlstra wrote:
> From: Peter Zijlstra <peterz at infradead.org>
>
> When we allow for a max NR_CPUS < 2^14 we can optimize the pending
> wait-acquire and the xchg_tail() operations.
>
> By growing the pending bit to a byte, we reduce the tail to 16 bits.
> This means we can use xchg16 for the tail part and do away with all
> the repeated cmpxchg() operations.
>
> This in turn allows us to unconditionally acquire; the locked state
> as observ...
2014 Apr 17
0
[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change...
2014 Jun 18
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
...17/06/2014 22:55, Konrad Rzeszutek Wilk wrote:
> On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
>> From: Waiman Long <Waiman.Long at hp.com>
>>
>> This patch extracts the logic for the exchange of new and previous tail
>> code words into a new xchg_tail() function which can be optimized in a
>> later patch.
>
> And also adds a third try on acquiring the lock. That I think should
> be a separate patch.

It doesn't really add a new try; the old code is:

-	for (;;) {
-		new = _Q_LOCKED_VAL;
-		if (val)
-			new = tail | (val &...
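Completing the truncated quote from surrounding context (a reconstruction; the mask name in particular is assumed): the pre-patch loop both attempted the lock and published the tail in a single cmpxchg, so extracting xchg_tail() does not introduce a genuinely new acquisition attempt:

    for (;;) {
            new = _Q_LOCKED_VAL;            /* lock free: try to take it */
            if (val)                        /* otherwise publish our tail */
                    new = tail | (val & _Q_LOCKED_PENDING_MASK);

            old = atomic_cmpxchg(&lock->val, val, new);
            if (old == val)
                    break;

            val = old;
    }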
2014 Jul 07
2
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On 07/07/2014 16:35, Peter Zijlstra wrote:
> On Wed, Jun 18, 2014 at 01:39:52PM +0200, Paolo Bonzini wrote:
>> On 15/06/2014 14:47, Peter Zijlstra wrote:
>>>
>>> -	for (;;) {
>>> -		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
>>> -
>>> -		old = atomic_cmpxchg(&lock->val, val, new);
>>> -		if (old == val)
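The loop deleted in this hunk is replaced, later in the same patch, by an unconditional store: once a CPU owns the pending bit, nobody else can modify the locked/pending bytes, so no retry is needed. A sketch of the replacement, assuming a 16-bit locked_pending field in the struct __qspinlock overlay:

    static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
    {
            struct __qspinlock *l = (void *)lock;

            /* we own the pending bit: clear it and set locked in one store */
            ACCESS_ONCE(l->locked_pending) = _Q_LOCKED_VAL;
    }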
2014 Jun 15
28
[PATCH 00/11] qspinlock with paravirt support
Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt (kvm) cases boot on simple smp guests. I've not done Xen, but the patch should be
2014 Apr 17
2
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
...+++++++++++++++++++++++++--
>  1 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 55601b4..497da24 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -216,6 +216,7 @@ xchg_tail(struct qspinlock *lock, u32 tail, u32 *pval)
>  static inline int trylock_pending(struct qspinlock *lock, u32 *pval)
>  {
>  	u32 old, new, val = *pval;
> +	int retry = 1;
>
>  	/*
>  	 * trylock || pending
> @@ -225,11 +226,38 @@ static inline int trylock_pending(struct q...
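The hunk is too truncated to quote in full; the shape of the change, sketched rather than copied from the patch (the retry bound and the exact exit conditions here are illustrative): trylock_pending() now lingers for a bounded number of rounds when it sees both the locked and pending bits set, instead of bailing out to the MCS queue immediately:

    int retry = 1;                  /* bound is illustrative, not the patch's */

    while ((val & _Q_LOCKED_PENDING_MASK) ==
           (_Q_LOCKED_VAL | _Q_PENDING_VAL)) {
            if (!retry)
                    return 0;       /* spun long enough: fall back to the queue */
            retry--;
            cpu_relax();
            val = atomic_read(&lock->val);
    }

The bet is that the lock holder and the pending waiter will drain quickly, so a short wait in the cheap pending path beats the cost of queuing.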
2014 Jun 18
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
...nrad Rzeszutek Wilk wrote:
> >On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
> >>From: Waiman Long <Waiman.Long at hp.com>
> >>
> >>This patch extracts the logic for the exchange of new and previous tail
> >>code words into a new xchg_tail() function which can be optimized in a
> >>later patch.
> >
> >And also adds a third try on acquiring the lock. That I think should
> >be a separate patch.
>
> It doesn't really add a new try, the old code is:
>
> -	for (;;) {
> -		new = _Q_LOCK...
2014 May 21
0
[RFC 08/07] qspinlock: integrate pending bit into queue
...ine void set_pending(struct qspinlock *lock, u8 pending)
+{
+	struct __qspinlock *l = (void *)lock;
+
+	// take a look if this is necessary, and if we don't have an
+	// abstraction already
+	barrier();
+	ACCESS_ONCE(l->pending) = pending;
+	barrier();
+}
+
+// and here
+static inline u32 cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 newtail)
+// API-incompatible with set_pending and the shifting is ugly, so I'd rather
+// refactor this one, xchg_tail() and encode_tail() ... another day
+{
+	struct __qspinlock *l = (void *)lock;
+
+	return (u32)cmpxchg(&l->tail, tail >> _Q_...
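The excerpt cuts off mid-expression; by symmetry with xchg_tail() in the same RFC, the complete helper is presumably close to this (a reconstruction, not the RFC verbatim):

    static inline u32 cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 newtail)
    {
            struct __qspinlock *l = (void *)lock;

            /* operate on the 16-bit tail halfword, shifting in and out */
            return (u32)cmpxchg(&l->tail, tail >> _Q_TAIL_OFFSET,
                                newtail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }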
2014 May 30
19
[PATCH v11 00/16] qspinlock: a 4-byte queue spinlock with PV support
v10->v11: - Use a simple test-and-set unfair lock to simplify the code, but performance may suffer a bit for large guest with many CPUs. - Take out Raghavendra KT's test results as the unfair lock changes may render some of his results invalid. - Add PV support without increasing the size of the core queue node structure. - Other minor changes to address some of the