thr3ads.net - search: "lock

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

2014 Mar 12

2

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

...k->lock_wait) = _QSPINLOCK_LOCKED; > + * > + * It is not currently clear why this happens. A workaround > + * is to use atomic instruction to store the new value. > + */ > + { > + u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED); > + BUG_ON(lw != _QSPINLOCK_WAITING); > + } > + return 1; > It was found that when I used a direct memory store instead of an atomic op, the following kernel crash might happen at filesystem dismount time: Red Hat Enterprise Linux Server 7.0 (Maipo) Kernel 3.14.0-rc6-qlock on an x86_64 h11-kvm20 login: [ 1529.934047] B...

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

2014 Mar 12

0

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

...tes. + * 2) The 2nd byte of the 32-bit lock word can be used as a pending bit + * for waiting lock acquirer so that it won't need to go through the + * MCS style locking queuing which has a higher overhead. */ +#define _QSPINLOCK_WAIT_SHIFT 8 /* Waiting bit position */ +#define _QSPINLOCK_WAITING (1 << _QSPINLOCK_WAIT_SHIFT) +/* Masks for lock & wait bits */ +#define _QSPINLOCK_LWMASK (_QSPINLOCK_WAITING | _QSPINLOCK_LOCKED) + #define queue_encode_qcode(cpu, idx) (((cpu) + 1) << 2 | (idx)) +#define queue_get_qcode(lock) (atomic_read(&(lock)->qlcode) >> _QCODE...
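The lock-word layout described in this snippet can be sketched in user-space C as follows. This is only an illustration, not the patch itself: the value of the lock bit (`1`) and the queue-code offset (`16`) are assumptions, since the snippet truncates both, and the helpers `encode_qcode`, `get_qcode`, and `make_word` are hypothetical stand-ins for the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

#define QSPINLOCK_LOCKED     1U                          /* assumed: lock bit in byte 0 */
#define QSPINLOCK_WAIT_SHIFT 8                           /* from the snippet */
#define QSPINLOCK_WAITING    (1U << QSPINLOCK_WAIT_SHIFT)
#define QSPINLOCK_LWMASK     (QSPINLOCK_WAITING | QSPINLOCK_LOCKED)
#define QCODE_OFFSET         16                          /* assumed: truncated in the snippet */

/* Mirrors queue_encode_qcode(): CPU number is stored +1 so that 0 means "no queue". */
static inline uint32_t encode_qcode(uint32_t cpu, uint32_t idx)
{
    return ((cpu + 1) << 2) | idx;
}

/* Extract the queue code from a plain 32-bit lock word. */
static inline uint32_t get_qcode(uint32_t word)
{
    return word >> QCODE_OFFSET;
}

/* Hypothetical helper: combine a queue code with the lock and wait bits. */
static inline uint32_t make_word(uint32_t qcode, uint32_t lw)
{
    return (qcode << QCODE_OFFSET) | (lw & QSPINLOCK_LWMASK);
}
```

Under these assumptions, the lock and wait bits occupy the low two bytes and can be masked out with `QSPINLOCK_LWMASK` without touching the queue code.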

[Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015 Feb 16

1

[Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

...ts); > + u8 old = READ_ONCE(zero_stats); > if (unlikely(old)) { > ret = cmpxchg(&zero_stats, old, 0); > /* This ensures only one fellow resets the stat */ > @@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want) > struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting); > int cpu = smp_processor_id(); > u64 start; > + __ticket_t head; > unsigned long flags; > > /* If kicker interrupts not initialized yet, just spin */ > @@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock *...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

2

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

..._qspinlock *qlock = (union arch_qspinlock *)lock; > + u16 old; > + > + /* > + * Fall into the quick spinning code path only if no one is waiting > + * or the lock is available. > + */ > + if (unlikely((qsval != _QSPINLOCK_LOCKED) && > + (qsval != _QSPINLOCK_WAITING))) > + return 0; > + > + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > + > + if (old == 0) { > + /* > + * Got the lock, can clear the waiting bit now > + */ > + smp_u8_store_release(&qlock->wait, 0); So we just did an atomic...
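The visible part of this quick path can be approximated with C11 atomics. This is a rough sketch under stated assumptions: the lock bit value `0x001` is assumed, `atomic_fetch_and` is swapped in for the kernel's byte-wide `smp_u8_store_release`, and the `QUICK_MUST_WAIT` outcome is a placeholder for the part of the function the snippet cuts off. The names `trylock_quick_sketch` and `enum quick_result` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QSPINLOCK_LOCKED  0x001U  /* assumed: lock bit in byte 0 */
#define QSPINLOCK_WAITING 0x100U  /* waiting bit in byte 1 */

enum quick_result { QUICK_SKIP, QUICK_ACQUIRED, QUICK_MUST_WAIT };

static enum quick_result trylock_quick_sketch(_Atomic uint16_t *lock_wait,
                                              uint32_t qsval)
{
    uint16_t old;

    /* Take the quick path only if the observed word is exactly "locked"
     * or exactly "waiting"; anything else has queued waiters. */
    if (qsval != QSPINLOCK_LOCKED && qsval != QSPINLOCK_WAITING)
        return QUICK_SKIP;

    /* Atomically claim both the lock and the waiting bit. */
    old = atomic_exchange(lock_wait, QSPINLOCK_WAITING | QSPINLOCK_LOCKED);

    if (old == 0) {
        /* Got the lock: clear the waiting bit with release ordering. */
        atomic_fetch_and_explicit(lock_wait, (uint16_t)~QSPINLOCK_WAITING,
                                  memory_order_release);
        return QUICK_ACQUIRED;
    }

    /* The snippet is truncated here; the remaining cases are not shown. */
    return QUICK_MUST_WAIT;
}
```

Note that the sketch pays for two atomic operations on the success path, which appears to be the cost the reviewer's comment at the end of the snippet starts to question.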

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

2014 Mar 13

0

[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks

...K_LOCKED; > >+ * > >+ * It is not currently clear why this happens. A workaround > >+ * is to use atomic instruction to store the new value. > >+ */ > >+ { > >+ u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED); > >+ BUG_ON(lw != _QSPINLOCK_WAITING); > >+ } > It was found that when I used a direct memory store instead of an atomic op, > the following kernel crash might happen at filesystem dismount time: > > [ 1529.936714] Call Trace: > [ 1529.936714] [<ffffffff811c2d03>] d_walk+0xc3/0x260 > [ 1529.936714] [...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...t + * for waiting lock acquirer so that it won't need to go through the + * MCS style locking queuing which has a higher overhead. + * 2) The 16-bit queue code can be accessed or modified directly as a + * 16-bit short value without disturbing the first 2 bytes. + */ +#define _QSPINLOCK_WAITING 0x100U /* Waiting bit in 2nd byte */ +#define _QSPINLOCK_LWMASK 0xffff /* Mask for lock & wait bits */ + +#define queue_encode_qcode(cpu, idx) (((cpu) + 1) << 2 | (idx)) + +#define queue_spin_trylock_quick queue_spin_trylock_quick +/** + * queue_spin_trylock_quick - fast spinning on the...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 27

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...t + * for waiting lock acquirer so that it won't need to go through the + * MCS style locking queuing which has a higher overhead. + * 2) The 16-bit queue code can be accessed or modified directly as a + * 16-bit short value without disturbing the first 2 bytes. + */ +#define _QSPINLOCK_WAITING 0x100U /* Waiting bit in 2nd byte */ +#define _QSPINLOCK_LWMASK 0xffff /* Mask for lock & wait bits */ + +#define queue_encode_qcode(cpu, idx) (((cpu) + 1) << 2 | (idx)) + +#define queue_spin_trylock_quick queue_spin_trylock_quick +/** + * queue_spin_trylock_quick - fast spinning on the...

[PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014 Feb 27

0

[PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

...ut this may result in some ping ponging? Actually, I think the qspinlock can work roughly the same as the pvticketlock, using the same lock_spinning and unlock_lock hooks. The x86-specific codepath can use bit 1 in the ->wait byte as "I have halted, please kick me". value = _QSPINLOCK_WAITING; i = 0; do cpu_relax(); while (ACCESS_ONCE(slock->lock) && i++ < BUSY_WAIT); if (ACCESS_ONCE(slock->lock)) { value |= _QSPINLOCK_HALTED; xchg(&slock->wait, value >> 8); if (ACCESS_ONCE(slock->lock)) { ... call lock_spinning hook ... } } /* * Se...
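The pseudocode in this message can be turned into a compilable user-space sketch. Several things here are assumptions rather than part of the proposal: `QSPINLOCK_HALTED` is taken to be `0x200` (bit 1 of the wait byte, as the message describes), `BUSY_WAIT` is an arbitrary spin budget, the empty loop body stands in for `cpu_relax()`, and `count_hook` is a hypothetical replacement for the `lock_spinning` hook.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QSPINLOCK_WAITING 0x100U /* bit 0 of the wait byte */
#define QSPINLOCK_HALTED  0x200U /* assumed: bit 1 of the wait byte */
#define BUSY_WAIT         1000   /* assumed spin budget */

struct slock_sketch {
    _Atomic uint8_t lock;
    _Atomic uint8_t wait;
};

typedef void (*lock_spinning_hook)(struct slock_sketch *);

static int hook_calls; /* hypothetical: counts hook invocations */

static void count_hook(struct slock_sketch *s)
{
    (void)s;
    hook_calls++;
}

/* Spin briefly; if the lock is still held, advertise "halted" and call the hook.
 * Returns 1 if the hook was invoked, 0 otherwise. */
static int wait_then_halt(struct slock_sketch *s, lock_spinning_hook hook)
{
    unsigned value = QSPINLOCK_WAITING;
    int i = 0;

    do {
        /* cpu_relax() in the kernel */
    } while (atomic_load(&s->lock) && i++ < BUSY_WAIT);

    if (atomic_load(&s->lock)) {
        value |= QSPINLOCK_HALTED;
        atomic_exchange(&s->wait, (uint8_t)(value >> 8));
        /* Re-check after publishing the halted bit to avoid a lost wakeup. */
        if (atomic_load(&s->lock)) {
            hook(s);
            return 1;
        }
    }
    return 0;
}
```

The second check of `lock` after the `xchg` matters: without it, the holder could release the lock between the first check and the store of the halted bit, and the waiter would halt with nobody left to kick it.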

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 27

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

..._qspinlock *)lock; >> + u16 old; >> + >> + /* >> + * Fall into the quick spinning code path only if no one is waiting >> + * or the lock is available. >> + */ >> + if (unlikely((qsval != _QSPINLOCK_LOCKED)&& >> + (qsval != _QSPINLOCK_WAITING))) >> + return 0; >> + >> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); >> + >> + if (old == 0) { >> + /* >> + * Got the lock, can clear the waiting bit now >> + */ >> + smp_u8_store_release(&qlock->...

[PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014 Feb 27

3

[PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

On 02/27/2014 08:15 PM, Paolo Bonzini wrote: [...] >> But neither of the VCPUs being kicked here are halted -- they're either >> running or runnable (descheduled by the hypervisor). > > /me actually looks at Waiman's code... > > Right, this is really different from pvticketlocks, where the *unlock* > primitive wakes up a sleeping VCPU. It is more similar to PLE

[PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015 Feb 15

7

[PATCH V5] x86 spinlock: Fix memory corruption on completing completions

...ock->tickets.head); + if (__tickets_equal(head, want)) { add_stats(TAKEN_SLOW_PICKUP, 1); goto out; } @@ -803,8 +805,8 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) add_stats(RELEASED_SLOW, 1); for_each_cpu(cpu, &waiting_cpus) { const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu); - if (ACCESS_ONCE(w->lock) == lock && - ACCESS_ONCE(w->want) == ticket) { + if (READ_ONCE(w->lock) == lock && + READ_ONCE(w->want) == ticket) { add_stats(RELEASED_SLOW_KICKED, 1); kvm_kick_cpu(cpu); break; di...
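The kick loop in `kvm_unlock_kick` scans the waiting CPUs for the one that recorded this lock and this ticket, kicks it, and stops. A single-threaded sketch of that lookup, with a fixed array standing in for the per-CPU `klock_waiting` data and the `waiting_cpus` mask, might look like this. The struct, the array, and the function name are all hypothetical, plain reads replace `READ_ONCE`, and the function returns the CPU index instead of calling `kvm_kick_cpu`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */

struct lock_waiting_sketch {
    const void *lock; /* lock this CPU is blocked on */
    uint16_t want;    /* ticket it is waiting for */
    int waiting;      /* stands in for membership in waiting_cpus */
};

static struct lock_waiting_sketch waiters[NR_CPUS_SKETCH];

/* Return the CPU that should be kicked for (lock, ticket), or -1 if none. */
static int unlock_kick_sketch(const void *lock, uint16_t ticket)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        const struct lock_waiting_sketch *w = &waiters[cpu];

        if (!w->waiting)
            continue;
        /* Both fields must match: same lock and the ticket now being served. */
        if (w->lock == lock && w->want == ticket)
            return cpu;
    }
    return -1;
}
```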

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Mar 12

17

[PATCH v6 00/11] qspinlock: a 4-byte queue spinlock with PV support

v5->v6: - Change the optimized 2-task contending code to make it fairer at the expense of a bit of performance. - Add a patch to support unfair queue spinlock for Xen. - Modify the PV qspinlock code to follow what was done in the PV ticketlock. - Add performance data for the unfair lock as well as the PV support code. v4->v5: - Move the optimized 2-task contending code to the

[PATCH V12 0/14] Paravirtualized ticket spinlocks

2013 Aug 06

16

[PATCH V12 0/14] Paravirtualized ticket spinlocks

This series replaces the existing paravirtualized spinlock mechanism with a paravirtualized ticketlock mechanism. The series provides implementation for both Xen and KVM. The current set of patches are for Xen/x86 spinlock/KVM guest side, to be included against -tip. I'll be sending a separate patchset for KVM host based on kvm tree. Please note I have added the below performance result

search for: lock_waiting