Displaying 20 results from an estimated 56 matches for "arch_mutex_cpu_relax".
2014 Apr 17
2
[PATCH v9 03/19] qspinlock: Add pending bit
...> +	 * we won the trylock
> +	 */
> +	if (new == _Q_LOCKED_VAL)
> +		return 1;
> +
> +	/*
> +	 * we're pending, wait for the owner to go away.
> +	 *
> +	 * *,1,1 -> *,1,0
> +	 */
> +	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
> +		arch_mutex_cpu_relax();
That was a cpu_relax().
> +
> +	/*
> +	 * take ownership and clear the pending bit.
> +	 *
> +	 * *,1,0 -> *,0,1
> +	 */
> +	for (;;) {
> +		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
> +
> +		old = atomic_cmpxchg(&lock->val, val, new);
> +...
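A minimal user-space model of the pending-bit handoff quoted in this excerpt, using C11 atomics instead of the kernel's atomic_t/cmpxchg helpers; the two bit positions are purely illustrative and do not match the real qspinlock layout:

#include <stdatomic.h>

#define LOCKED  0x01U		/* illustrative bits, not _Q_LOCKED_VAL */
#define PENDING 0x02U		/* or _Q_PENDING_MASK */

static inline void relax(void)
{
	/* stand-in for arch_mutex_cpu_relax()/cpu_relax() */
	atomic_signal_fence(memory_order_seq_cst);
}

/* Caller already owns PENDING; wait out the lock holder, then take over. */
static void pending_to_locked(atomic_uint *lock)
{
	unsigned int val, old, new;

	/* *,1,1 -> *,1,0 : wait for the current owner to drop LOCKED */
	while ((val = atomic_load(lock)) & LOCKED)
		relax();

	/* *,1,0 -> *,0,1 : clear PENDING and set LOCKED in a single CAS */
	for (;;) {
		new = (val & ~PENDING) | LOCKED;
		old = val;
		if (atomic_compare_exchange_strong(lock, &old, new))
			break;
		val = old;	/* the word changed under us; retry on the fresh value */
	}
}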
2014 Apr 18
1
[PATCH v9 03/19] qspinlock: Add pending bit
On Thu, Apr 17, 2014 at 05:20:31PM -0400, Waiman Long wrote:
> >>+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
> >>+		arch_mutex_cpu_relax();
> >That was a cpu_relax().
>
> Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86.
Yeah, so why bother typing more?
Let the s390 people sort that if/when they try and make this work for
them.
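For context on why the two names existed at all: in kernels of that era, include/linux/mutex.h made arch_mutex_cpu_relax() fall back to cpu_relax() unless the architecture opted out, and s390 opted out because its cpu_relax() could be an expensive hypervisor yield. Roughly, from memory (a sketch, not a verbatim copy):

/* include/linux/mutex.h, circa 3.14: the default mapping */
#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
#define arch_mutex_cpu_relax()	cpu_relax()
#endif

/* arch/s390/include/asm/mutex.h: s390 selects the config symbol and uses a
 * plain compiler barrier, because its cpu_relax() may yield the virtual CPU
 * to the hypervisor and is far too heavy for a short spin. */
#define arch_mutex_cpu_relax()	barrier()

Later kernels folded this distinction into cpu_relax_lowlatency() and eventually back into cpu_relax(), if memory serves.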
2014 Mar 12
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
.../*
+		 * Wait bit was set already, try again after some delay
+		 * as the waiter will probably get the lock & clear
+		 * the wait bit.
+		 *
+		 * There are 2 cpu_relax() calls to make sure that
+		 * the wait is longer than that of the
+		 * smp_load_acquire() loop below.
+		 */
+		arch_mutex_cpu_relax();
+		arch_mutex_cpu_relax();
+		qsval = atomic_read(&lock->qlcode);
+		continue;
+	}
+
+	/*
+	 * Now wait until the lock bit is cleared
+	 */
+	while (smp_load_acquire(&qlock->qlcode) & _QSPINLOCK_LOCKED)
+		arch_mutex_cpu_relax();
+
+	/*
+	 * Set the lock bit & cl...
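To make the comment about the two back-to-back relax calls concrete, the idea looks roughly like the following self-contained C11 sketch (bit names and values are illustrative, not the patch's _QSPINLOCK_* constants):

#include <stdatomic.h>

#define LOCKED_BIT 0x1U		/* illustrative */
#define WAIT_BIT   0x2U		/* illustrative */

static inline void relax(void) { atomic_signal_fence(memory_order_seq_cst); }

/* The contender that found the wait bit already set backs off with two
 * relax calls per pass, a deliberately longer delay per iteration... */
static void back_off_while_wait_set(atomic_uint *qlcode)
{
	while (atomic_load_explicit(qlcode, memory_order_relaxed) & WAIT_BIT) {
		relax();
		relax();
	}
}

/* ...than this tighter acquire loop, so the waiter that already owns the
 * wait bit is the one most likely to observe the unlock first and win. */
static void spin_until_unlocked(atomic_uint *qlcode)
{
	while (atomic_load_explicit(qlcode, memory_order_acquire) & LOCKED_BIT)
		relax();
}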
2014 May 07
0
[PATCH v10 09/19] qspinlock: Prepare for unfair lock support
...NULL;
/*
@@ -391,7 +394,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
prev = decode_tail(old);
ACCESS_ONCE(prev->mcs.next) = (struct mcs_spinlock *)node;
- while (!smp_load_acquire(&node->mcs.locked))
+ while (!smp_load_acquire(&node->qhead))
arch_mutex_cpu_relax();
}
@@ -403,6 +406,7 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
*
* *,x,y -> *,0,0
*/
+retry_queue_wait:
while ((val = smp_load_acquire(&lock->val.counter))
& _Q_LOCKED_PENDING_MASK)
arch_mutex_cpu_relax();
@@ -419,12 +423,20 @@ vo...
2014 Apr 17
0
[PATCH v9 07/19] qspinlock: Use a simple write to grab the lock, if applicable
...ere because the get_qlock()
+ * function below may not be a full memory barrier.
*
* *,x,y -> *,0,0
*/
- while ((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK)
+ while ((val = smp_load_acquire(&lock->val.counter))
+ & _Q_LOCKED_PENDING_MASK)
arch_mutex_cpu_relax();
/*
@@ -378,15 +403,19 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
*
* n,0,0 -> 0,0,1 : lock, uncontended
* *,0,0 -> *,0,1 : lock, contended
+ *
+ * If the queue head is the only one in the queue (lock value == tail),
+ * clear the tail code and grab th...
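The new comment in this hunk is easier to follow with both transitions written out. A simplified, self-contained C11 model of the queue head's acquisition (the layout, mask, and helper names are assumptions; the real patch writes the locked byte directly through a struct overlay, modelled here with an atomic OR):

#include <stdatomic.h>

#define LOCKED_VAL		1U
#define LOCKED_PENDING_MASK	0xFFFFU		/* illustrative: locked + pending */

static inline void relax(void) { atomic_signal_fence(memory_order_seq_cst); }

/* Queue head taking the lock once both locked and pending have cleared. */
static void head_acquire(atomic_uint *lock, unsigned int my_tail)
{
	unsigned int val;

	/* *,x,y -> *,0,0 : wait for the owner and any pending CPU */
	while ((val = atomic_load_explicit(lock, memory_order_acquire)) &
	       LOCKED_PENDING_MASK)
		relax();

	for (;;) {
		if (val != my_tail) {
			/* *,0,0 -> *,0,1 : others are queued behind us, so
			 * the tail must survive; setting the locked byte
			 * alone is enough to take the lock. */
			atomic_fetch_or_explicit(lock, LOCKED_VAL,
						 memory_order_acquire);
			return;
		}
		/* n,0,0 -> 0,0,1 : we are the only queued CPU; clear the
		 * tail and grab the lock in a single CAS. */
		if (atomic_compare_exchange_strong(lock, &val, LOCKED_VAL))
			return;
		/* CAS failed: val now holds the fresh value; a new waiter
		 * must have changed the tail, so loop and re-check. */
	}
}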
2014 May 07
0
[PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable
...ere because the get_qlock()
+ * function below may not be a full memory barrier.
*
* *,x,y -> *,0,0
*/
- while ((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK)
+ while ((val = smp_load_acquire(&lock->val.counter))
+ & _Q_LOCKED_PENDING_MASK)
arch_mutex_cpu_relax();
/*
@@ -377,15 +402,19 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
*
* n,0,0 -> 0,0,1 : lock, uncontended
* *,0,0 -> *,0,1 : lock, contended
+ *
+ * If the queue head is the only one in the queue (lock value == tail),
+ * clear the tail code and grab th...
2014 Mar 02
1
[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation
Forgot to ask...
On 02/26, Waiman Long wrote:
>
> +notify_next:
> +	/*
> +	 * Wait, if needed, until the next one in queue set up the next field
> +	 */
> +	while (!(next = ACCESS_ONCE(node->next)))
> +		arch_mutex_cpu_relax();
> +	/*
> +	 * The next one in queue is now at the head
> +	 */
> +	smp_store_release(&next->wait, false);
Do we really need smp_store_release()? It seems that we can rely on the
control dependency here. And afaics there is no need to serialise this
store with other changes in...
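For readers following the memory-ordering question, the two variants being weighed can be written out in a self-contained C11 model (the struct and names are stand-ins for the MCS node; note that plain C11 relaxed atomics do not formally promise control-dependency ordering, that guarantee comes from the kernel's own memory-model rules, which is part of what makes the question interesting):

#include <stdatomic.h>
#include <stdbool.h>

struct qnode_model {			/* illustrative stand-in for the queue node */
	_Atomic(struct qnode_model *) next;
	atomic_bool wait;
};

static inline void relax(void) { atomic_signal_fence(memory_order_seq_cst); }

/* Variant A (as posted): the release store orders every write made before
 * the handoff ahead of the moment the spinning waiter sees wait == false. */
static void notify_next_release(struct qnode_model *node)
{
	struct qnode_model *next;

	while (!(next = atomic_load_explicit(&node->next, memory_order_relaxed)))
		relax();
	atomic_store_explicit(&next->wait, false, memory_order_release);
}

/* Variant B (the question): a plain store. The kernel argument is that the
 * store is control-dependent on the load that produced "next", so it cannot
 * be issued ahead of that load, but earlier, unrelated writes are not
 * ordered by it. Whether that is sufficient here is exactly what is being
 * asked. */
static void notify_next_relaxed(struct qnode_model *node)
{
	struct qnode_model *next;

	while (!(next = atomic_load_explicit(&node->next, memory_order_relaxed)))
		relax();
	atomic_store_explicit(&next->wait, false, memory_order_relaxed);
}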
2014 Apr 17
0
[PATCH v9 03/19] qspinlock: Add pending bit
...>> +	if (new == _Q_LOCKED_VAL)
>> +		return 1;
>> +
>> +	/*
>> +	 * we're pending, wait for the owner to go away.
>> +	 *
>> +	 * *,1,1 -> *,1,0
>> +	 */
>> +	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
>> +		arch_mutex_cpu_relax();
> That was a cpu_relax().
Yes, but arch_mutex_cpu_relax() is the same as cpu_relax() for x86.
-Longman
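Concretely, on x86 of that era both names end up as the PAUSE instruction, which is why the two calls are interchangeable there. Roughly, from memory (a sketch, not a verbatim copy):

/* arch/x86/include/asm/processor.h: */
static inline void rep_nop(void)
{
	asm volatile("rep; nop" ::: "memory");	/* the PAUSE instruction */
}
#define cpu_relax()	rep_nop()

/* and, with no x86-specific override, include/linux/mutex.h supplies: */
#define arch_mutex_cpu_relax()	cpu_relax()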
2014 May 07
0
[PATCH v10 08/19] qspinlock: Make a new qnode structure to support virtualization
...lock, u32 val)
*/
if (old & _Q_TAIL_MASK) {
prev = decode_tail(old);
- ACCESS_ONCE(prev->next) = node;
+ ACCESS_ONCE(prev->mcs.next) = (struct mcs_spinlock *)node;
- arch_mcs_spin_lock_contended(&node->locked);
+ while (!smp_load_acquire(&node->mcs.locked))
+ arch_mutex_cpu_relax();
}
/*
@@ -422,15 +432,15 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
/*
* contended path; wait for next, release.
*/
- while (!(next = ACCESS_ONCE(node->next)))
+ while (!(next = (struct qnode *)ACCESS_ONCE(node->mcs.next)))
arch_mutex_cpu_relax();
-...
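The casts in this hunk only work if the new qnode embeds the generic MCS node as its first member. The sketch below shows that implied layout; the exact field set is a guess, since the real patch adds PV-specific state in place of the placeholder comment:

struct mcs_spinlock {
	struct mcs_spinlock *next;	/* next waiter in the queue */
	int locked;			/* 1 once the lock is handed to this node */
};

struct qnode {
	struct mcs_spinlock mcs;	/* must stay first so the casts between
					 * struct qnode * and struct
					 * mcs_spinlock * above are valid */
	/* ... virtualization-specific fields added by the patch ... */
};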
2014 May 21
0
[RFC 08/07] qspinlock: integrate pending bit into queue
...a get_pending(lock, &val) helper
+ while ((val = smp_load_acquire(&lock->val.counter)) & _Q_PENDING_MASK)
+ // would longer body ease cacheline contention?
+ // would it be better to use monitor/mwait instead?
+ // (we can tolerate some delay because we aren't pending ...)
arch_mutex_cpu_relax();
/*
- * claim the lock:
+ * The pending bit is free, take it.
*
- * n,0,0 -> 0,0,1 : lock, uncontended
- * *,0,0 -> *,0,1 : lock, contended
+ * *,0,* -> *,1,*
+ */
+ // might add &val param and do |= _Q_PENDING_VAL when refactoring ...
+ set_pending(lock, 1);
+
+ /*
+ *...
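The RFC calls get_pending()/set_pending() helpers without showing them. One plausible shape, operating on the whole lock word with an atomic OR/AND, is sketched below; the helpers, bit position, and memory orderings are all hypothetical and chosen only for illustration:

#include <stdatomic.h>

#define Q_PENDING_VAL	(1U << 8)	/* illustrative pending-bit position */

static inline void set_pending(atomic_uint *lock, int on)
{
	if (on)
		atomic_fetch_or_explicit(lock, Q_PENDING_VAL,
					 memory_order_acquire);
	else
		atomic_fetch_and_explicit(lock, ~Q_PENDING_VAL,
					  memory_order_release);
}

static inline unsigned int get_pending(atomic_uint *lock, unsigned int *val)
{
	*val = atomic_load_explicit(lock, memory_order_acquire);
	return *val & Q_PENDING_VAL;
}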
2014 Apr 17
0
[PATCH v9 03/19] qspinlock: Add pending bit
...(old == val)
+			break;
+
+		*pval = val = old;
+	}
+
+	/*
+	 * we won the trylock
+	 */
+	if (new == _Q_LOCKED_VAL)
+		return 1;
+
+	/*
+	 * we're pending, wait for the owner to go away.
+	 *
+	 * *,1,1 -> *,1,0
+	 */
+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
+		arch_mutex_cpu_relax();
+
+	/*
+	 * take ownership and clear the pending bit.
+	 *
+	 * *,1,0 -> *,0,1
+	 */
+	for (;;) {
+		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
+
+		old = atomic_cmpxchg(&lock->val, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+	return 1;
+}
+
/**
* queue_sp...
2014 May 07
0
[PATCH v10 03/19] qspinlock: Add pending bit
...(old == val)
+			break;
+
+		*pval = val = old;
+	}
+
+	/*
+	 * we won the trylock
+	 */
+	if (new == _Q_LOCKED_VAL)
+		return 1;
+
+	/*
+	 * we're pending, wait for the owner to go away.
+	 *
+	 * *,1,1 -> *,1,0
+	 */
+	while ((val = atomic_read(&lock->val)) & _Q_LOCKED_MASK)
+		arch_mutex_cpu_relax();
+
+	/*
+	 * take ownership and clear the pending bit.
+	 *
+	 * *,1,0 -> *,0,1
+	 */
+	for (;;) {
+		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
+
+		old = atomic_cmpxchg(&lock->val, val, new);
+		if (old == val)
+			break;
+
+		val = old;
+	}
+	return 1;
+}
+
/**
* queue_sp...
2014 Apr 17
33
[PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support
v8->v9:
- Integrate PeterZ's version of the queue spinlock patch with some
modification:
http://lkml.kernel.org/r/20140310154236.038181843@infradead.org
- Break the more complex patches into smaller ones to ease review effort.
- Fix a race condition in the PV qspinlock code.
v7->v8:
- Remove one unneeded atomic operation from the slowpath, thus
improving
2014 May 07
32
[PATCH v10 00/19] qspinlock: a 4-byte queue spinlock with PV support
v9->v10:
- Make some minor changes to qspinlock.c to accommodate review feedback.
- Change author to PeterZ for 2 of the patches.
- Include Raghavendra KT's test results in patch 18.
v8->v9:
- Integrate PeterZ's version of the queue spinlock patch with some
modification:
http://lkml.kernel.org/r/20140310154236.038181843@infradead.org
- Break the more complex
2014 May 30
19
[PATCH v11 00/16] qspinlock: a 4-byte queue spinlock with PV support
v10->v11:
- Use a simple test-and-set unfair lock to simplify the code,
but performance may suffer a bit for large guests with many CPUs.
- Take out Raghavendra KT's test results as the unfair lock changes
may render some of his results invalid.
- Add PV support without increasing the size of the core queue node
structure.
- Other minor changes to address some of the