thr3ads.net - search: "qlcode"

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 02

1

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

On 02/26, Waiman Long wrote: > > @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock) > int qlcode = atomic_read(lock->qlcode); > > if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode, > - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode)) > + qlcode, code|_QSPINLOCK_LOCKED) == qlcode)) Hmm. didn't read the patch, but this change looks like...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 02

1

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

On 02/26, Waiman Long wrote: > > @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock) > int qlcode = atomic_read(lock->qlcode); > > if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode, > - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode)) > + qlcode, code|_QSPINLOCK_LOCKED) == qlcode)) Hmm. didn't read the patch, but this change looks like...

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

2014 Mar 12

0

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

...lude/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -21,9 +21,10 @@ union arch_qspinlock { struct qspinlock slock; struct { u8 lock; /* Lock bit */ - u8 reserved; + u8 wait; /* Waiting bit */ u16 qcode; /* Queue code */ }; + u16 lock_wait; /* Lock and wait bits */ u32 qlcode; /* Complete lock word */ }; diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index 52d3580..0030fad 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -112,6 +112,8 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode_set, qnset) = { { { 0 } }, 0 }; *...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...eue_spin_unlock queue_spin_unlock diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index df981d0..3a02a9e 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -48,7 +48,13 @@ typedef struct qspinlock { atomic_t qlcode; /* Lock + queue code */ } arch_spinlock_t; -#define _QCODE_OFFSET 8 +#if CONFIG_NR_CPUS >= (1 << 14) +# define _Q_MANY_CPUS +# define _QCODE_OFFSET 8 +#else +# define _QCODE_OFFSET 16 +#endif + #define _QSPINLOCK_LOCKED 1U #define _QSPINLOCK_LOCK_MASK 0xff diff --git a/kernel/lock...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 27

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...eue_spin_unlock queue_spin_unlock diff --git a/include/asm-generic/qspinlock_types.h b/include/asm-generic/qspinlock_types.h index df981d0..3a02a9e 100644 --- a/include/asm-generic/qspinlock_types.h +++ b/include/asm-generic/qspinlock_types.h @@ -48,7 +48,13 @@ typedef struct qspinlock { atomic_t qlcode; /* Lock + queue code */ } arch_spinlock_t; -#define _QCODE_OFFSET 8 +#if CONFIG_NR_CPUS >= (1 << 14) +# define _Q_MANY_CPUS +# define _QCODE_OFFSET 8 +#else +# define _QCODE_OFFSET 16 +#endif + #define _QSPINLOCK_LOCKED 1U #define _QSPINLOCK_LOCK_MASK 0xff diff --git a/kernel/lock...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 04

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...4434 9876 7 6304 5176 11901 8 7736 5955 14551 Below is the code that I used: static inline u32 queue_code_xchg(struct qspinlock *lock, u32 *ocode, u32 ncode) { while (true) { u32 qlcode = atomic_read(&lock->qlcode); if (qlcode == 0) { /* * Try to get the lock */ if (atomic_cmpxchg(&lock->qlcode, 0, _QSPINL...

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Feb 26

0

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...struct qspinlock *lock, int qsval); + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(&lock->qlcode) & _QSPINLOCK_LOCKED; +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !(atomic_read(&l...

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Feb 27

0

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...struct qspinlock *lock, int qsval); + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(&lock->qlcode) & _QSPINLOCK_LOCKED; +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !(atomic_read(&l...

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 27

14

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 27

14

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014 Apr 02

0

[PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

...struct qspinlock *lock, int qsval); + +/** + * queue_spin_is_locked - is the spinlock locked? + * @lock: Pointer to queue spinlock structure + * Return: 1 if it is locked, 0 otherwise + */ +static __always_inline int queue_spin_is_locked(struct qspinlock *lock) +{ + return atomic_read(&lock->qlcode) & _QLOCK_LOCK_MASK; +} + +/** + * queue_spin_value_unlocked - is the spinlock structure unlocked? + * @lock: queue spinlock structure + * Return: 1 if it is unlocked, 0 otherwise + */ +static __always_inline int queue_spin_value_unlocked(struct qspinlock lock) +{ + return !(atomic_read(&lo...

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Mar 12

17

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

v5->v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4->v5: - Move the optimized 2-task contending code to the

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Mar 12

17

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

v5->v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4->v5: - Move the optimized 2-task contending code to the

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Mar 02

1

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...And I am wondering how much this "qsval >> _QCODE_OFFSET" check can help. Note that this is the only usage of this arg, perhaps it would be better to simply remove it and shrink the caller's code a bit? It is also used in 3/8, but we can read the "fresh" value of ->qlcode (trylock does this anyway), and perhaps it can actually help if it is already unlocked. > + prev_qcode = atomic_xchg(&lock->qlcode, my_qcode); > + /* > + * It is possible that we may accidentally steal the lock. If this is > + * the case, we need to either release it if not th...

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Mar 02

1

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...And I am wondering how much this "qsval >> _QCODE_OFFSET" check can help. Note that this is the only usage of this arg, perhaps it would be better to simply remove it and shrink the caller's code a bit? It is also used in 3/8, but we can read the "fresh" value of ->qlcode (trylock does this anyway), and perhaps it can actually help if it is already unlocked. > + prev_qcode = atomic_xchg(&lock->qlcode, my_qcode); > + /* > + * It is possible that we may accidentally steal the lock. If this is > + * the case, we need to either release it if not th...

[PATCH v7 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Mar 19

15

[PATCH v7 00/11] qspinlock: a 4-byte queue spinlock with PV support

v6->v7: - Remove an atomic operation from the 2-task contending code - Shorten the names of some macros - Make the queue waiter to attempt to steal lock when unfair lock is enabled. - Remove lock holder kick from the PV code and fix a race condition - Run the unfair lock & PV code on overcommitted KVM guests to collect performance data. v5->v6: - Change the optimized

[PATCH v7 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Mar 19

15

[PATCH v7 00/11] qspinlock: a 4-byte queue spinlock with PV support

v6->v7: - Remove an atomic operation from the 2-task contending code - Shorten the names of some macros - Make the queue waiter to attempt to steal lock when unfair lock is enabled. - Remove lock holder kick from the PV code and fix a race condition - Run the unfair lock & PV code on overcommitted KVM guests to collect performance data. v5->v6: - Change the optimized

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 03

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spend in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 03

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spend in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 26

22

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

search for: qlcode