Displaying 19 results from an estimated 19 matches for "queue_spin_setlock".
2014 Mar 02
1
[PATCH v5 2/8] qspinlock, x86: Enable x86-64 to use queue spinlock
...aiman Long wrote:
>
> +#define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
> +
> +/*
> + * x86-64 specific queue spinlock union structure
> + */
> +union arch_qspinlock {
> + struct qspinlock slock;
> + u8 lock; /* Lock bit */
> +};
And this enables the optimized version of queue_spin_setlock().
But why does it check ACCESS_ONCE(qlock->lock) == 0? It is called
right after queue_get_lock_qcode() returns 0, so the lock should
likely still be unlocked.
Oleg.
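For reference, the optimized helper Oleg is asking about is essentially a byte-wide cmpxchg() on that u8 lock field. A minimal sketch, reconstructed from the excerpts in these threads (anything not visible in the quoted code is an assumption):

static __always_inline int queue_spin_setlock(struct qspinlock *lock)
{
	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;

	/* the ACCESS_ONCE() test in question: skip the cmpxchg when the
	 * lock byte is already set */
	if (!ACCESS_ONCE(qlock->lock) &&
	    (cmpxchg(&qlock->lock, 0, _QSPINLOCK_LOCKED) == 0))
		return 1;
	return 0;
}

Oleg's point is that the ACCESS_ONCE() pre-check buys little on this path, since the caller has just observed the lock word as unlocked.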
2014 Mar 02
1
[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation
...or 3/8, which adds the optimized version of
queue_get_lock_qcode(), so perhaps this "retval < 0" block can go into 3/8
as well.
> + else if (qcode != my_qcode) {
> + /*
> + * Just get the lock with other spinners waiting
> + * in the queue.
> + */
> + if (queue_spin_setlock(lock))
> + goto notify_next;
OTOH, at least the generic (non-optimized) version of queue_spin_setlock()
could probably accept "qcode" and avoid the atomic_read() + _QSPINLOCK_LOCKED
check.
But once again, please feel free to ignore.
Oleg.
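A rough sketch of the change being suggested here, i.e. letting the generic helper reuse the qcode the caller has already read instead of re-reading the lock word (this is only the suggestion, not code from the posted series; the _QCODE_OFFSET shift is borrowed from another excerpt below):

static __always_inline int queue_spin_setlock(struct qspinlock *lock, u32 qcode)
{
	/* expected lock-word value: queue code in the upper bits,
	 * lock bit clear */
	int old = qcode << _QCODE_OFFSET;

	return atomic_cmpxchg(&lock->qlcode, old,
			      old | _QSPINLOCK_LOCKED) == old;
}

The caller would pass in the qcode it obtained from queue_get_lock_qcode(), so the atomic_read() and the _QSPINLOCK_LOCKED test disappear from this path.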
2014 Mar 02
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26, Waiman Long wrote:
>
> @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
> int qlcode = atomic_read(lock->qlcode);
>
> if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode,
> - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode))
> + qlcode, code|_QSPINLOCK_LOCKED) == qlcode))
Hmm. didn't r...
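Reassembled from the quoted hunk (the "-" side), the generic helper being edited reads roughly as follows; the & in atomic_read(&lock->qlcode) is assumed, since atomic_read() takes a pointer, and note that the "+" line above replaces qlcode with code inside the cmpxchg:

static __always_inline int queue_spin_setlock(struct qspinlock *lock)
{
	int qlcode = atomic_read(&lock->qlcode);

	/* set the lock bit only if no one holds the lock right now */
	if (!(qlcode & _QSPINLOCK_LOCKED) &&
	    (atomic_cmpxchg(&lock->qlcode, qlcode,
			    qlcode | _QSPINLOCK_LOCKED) == qlcode))
		return 1;
	return 0;
}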
2014 Feb 26
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...+ * o wait - the waiting byte *
+ * o qcode - the queue node code *
+ * o lock_wait - the combined lock and waiting bytes *
* *
************************************************************************
*/
@@ -129,6 +132,176 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
return 1;
return 0;
}
+
+#ifndef _Q_MANY_CPUS
+/*
+ * With less than 16K CPUs, the following optimizations are possible with
+ * the x86 architecture:
+ * 1) The 2nd byte of the 32-bit lock word can be used as a pending bit
+ * for waiting lock acquirer so that it...
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...+ * o wait - the waiting byte *
+ * o qcode - the queue node code *
+ * o lock_wait - the combined lock and waiting bytes *
* *
************************************************************************
*/
@@ -129,6 +132,176 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
return 1;
return 0;
}
+
+#ifndef _Q_MANY_CPUS
+/*
+ * With less than 16K CPUs, the following optimizations are possible with
+ * the x86 architecture:
+ * 1) The 2nd byte of the 32-bit lock word can be used as a pending bit
+ * for waiting lock acquirer so that it...
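To make the field names in these excerpts concrete: for the sub-16K-CPU case the 32-bit lock word is split into a lock byte, a waiting byte and a 16-bit queue node code. A sketch of the union (only the field names come from the excerpt; order, types and the little-endian layout are assumptions):

union arch_qspinlock {
	struct qspinlock slock;	/* the qspinlock structure		*/
	struct {
		u8  lock;	/* lock byte				*/
		u8  wait;	/* waiting byte				*/
		u16 qcode;	/* queue node code			*/
	};
	u16 lock_wait;		/* combined lock and waiting bytes	*/
	int qlcode;		/* full 32-bit lock word		*/
};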
2014 Feb 27
14
[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support
v4->v5:
- Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
- Address some of the style-related comments by PeterZ.
- Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
- Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue
2014 Feb 26
0
[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation
...dual *
+ * fields of the qspinlock structure, including: *
+ * o slock - the qspinlock structure *
+ * o lock - the lock byte *
+ * *
+ ************************************************************************
+ */
+#ifdef _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
+/**
+ * queue_spin_setlock - try to acquire the lock by setting the lock bit
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if lock bit set successfully, 0 if failed
+ */
+static __always_inline int queue_spin_setlock(struct qspinlock *lock)
+{
+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
+
+ i...
2014 Feb 27
0
[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation
...dual *
+ * fields of the qspinlock structure, including: *
+ * o slock - the qspinlock structure *
+ * o lock - the lock byte *
+ * *
+ ************************************************************************
+ */
+#ifdef _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
+/**
+ * queue_spin_setlock - try to acquire the lock by setting the lock bit
+ * @lock: Pointer to queue spinlock structure
+ * Return: 1 if lock bit set successfully, 0 if failed
+ */
+static __always_inline int queue_spin_setlock(struct qspinlock *lock)
+{
+ union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
+
+ i...
2014 Mar 12
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
...spinlock *lock, u32 *qcode, u32 mycode)
u32 qlcode = (u32)atomic_read(&lock->qlcode);
*qcode = qlcode >> _QCODE_OFFSET;
- return qlcode & _QSPINLOCK_LOCKED;
+ return qlcode & _QSPINLOCK_LWMASK;
}
#endif /* _Q_MANY_CPUS */
@@ -185,7 +307,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
{
union arch_qspinlock *qlock = (union arch_qspinlock *)lock;
- return cmpxchg(&qlock->lock, 0, _QSPINLOCK_LOCKED) == 0;
+ return cmpxchg(&qlock->lock_wait, 0, _QSPINLOCK_LOCKED) == 0;
}
#else /* _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS */
/*
@@ -214,6 +33...
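Reassembled from the "+" lines of this hunk, the v6 helper widens the cmpxchg from the lock byte to the combined 16-bit lock_wait halfword, so it now succeeds only when both the lock byte and the waiting byte are clear:

static __always_inline int queue_spin_setlock(struct qspinlock *lock)
{
	union arch_qspinlock *qlock = (union arch_qspinlock *)lock;

	/* 0 in both the lock and the waiting bytes -> take the lock */
	return cmpxchg(&qlock->lock_wait, 0, _QSPINLOCK_LOCKED) == 0;
}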
2014 Mar 19
15
[PATCH v7 00/11] qspinlock: a 4-byte queue spinlock with PV support
v6->v7:
- Remove an atomic operation from the 2-task contending code
- Shorten the names of some macros
- Make the queue waiter attempt to steal the lock when the unfair lock is
enabled.
- Remove lock holder kick from the PV code and fix a race condition
- Run the unfair lock & PV code on overcommitted KVM guests to collect
performance data.
v5->v6:
- Change the optimized
2014 Mar 12
17
[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support
v5->v6:
- Change the optimized 2-task contending code to make it fairer at the
expense of a bit of performance.
- Add a patch to support unfair queue spinlock for Xen.
- Modify the PV qspinlock code to follow what was done in the PV
ticketlock.
- Add performance data for the unfair lock as well as the PV
support code.
v4->v5:
- Move the optimized 2-task contending code to the
2014 Feb 26
22
[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support
v4->v5:
- Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
- Address some of the style-related comments by PeterZ.
- Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
- Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue