thr3ads.net - search: "xchg16"

[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS

2014 Jun 18

1

...rote:
> From: Peter Zijlstra <peterz at infradead.org>
>
> When we allow for a max NR_CPUS < 2^14 we can optimize the pending
> wait-acquire and the xchg_tail() operations.
>
> By growing the pending bit to a byte, we reduce the tail to 16 bits.
> This means we can use xchg16 for the tail part and do away with all
> the repeated cmpxchg() operations.
>
> This in turn allows us to unconditionally acquire; the locked state
> as observed by the wait loops cannot change. And because both locked
> and pending are now a full byte we can use simple stores for...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 04

1

...=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh
4: 50783.509653
8: 146295.875715
16: 332942.964709
4: 51033.341441
8: 146320.656285
16: 332586.355194

And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg loops with unconditional ops (xchg8 and xchg16). And I'd think that 4 CPUs x 4 Nodes would be heavy contention. I'll have another poke tomorrow; including verifying asm tomorrow, need to go sleep now.

[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS

2014 Jun 15

0

From: Peter Zijlstra <peterz at infradead.org>

When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obv...
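The commit message above describes replacing a compare-and-exchange loop on the whole lock word with a single 16-bit exchange on the tail. What follows is a minimal userspace sketch of that idea in C, not the kernel code. The `struct qsl` layout, the function names `xchg_tail_cmpxchg()` and `xchg_tail_xchg16()`, and the `encode_tail()` helper (2 index bits plus CPU number + 1, which is one way to arrive at the NR_CPUS < 2^14 limit) are assumptions made for illustration. The sketch also assumes GCC or Clang `__atomic` builtins and a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit lock word (little-endian assumed):
 * bits 0-7 locked, bits 8-15 pending, bits 16-31 tail. */
struct qsl {
    union {
        uint32_t val;
        struct {
            uint8_t  locked;
            uint8_t  pending;
            uint16_t tail;
        };
    };
};

#define TAIL_OFFSET 16
#define TAIL_MASK   0xffff0000u

/* Assumed encoding: 2 bits of per-CPU node index, then cpu + 1 in the
 * remaining 14 bits, so a tail of 0 means "no queue". */
static uint32_t encode_tail(uint32_t cpu, uint32_t idx)
{
    assert(cpu + 1 < (1u << 14));
    assert(idx < 4);
    return ((cpu + 1) << (TAIL_OFFSET + 2)) | (idx << TAIL_OFFSET);
}

/* Variant A: the tail is swapped with a cmpxchg loop on the whole word.
 * Every concurrent change to locked/pending forces a retry. */
static uint32_t xchg_tail_cmpxchg(struct qsl *l, uint32_t tail)
{
    uint32_t old = __atomic_load_n(&l->val, __ATOMIC_RELAXED);
    uint32_t new_val;

    do {
        new_val = (old & ~TAIL_MASK) | tail;
    } while (!__atomic_compare_exchange_n(&l->val, &old, new_val, 0,
                                          __ATOMIC_ACQ_REL,
                                          __ATOMIC_RELAXED));
    return old & TAIL_MASK;
}

/* Variant B: the tail has its own 16-bit halfword, so one unconditional
 * exchange is enough and locked/pending are never touched. */
static uint32_t xchg_tail_xchg16(struct qsl *l, uint32_t tail)
{
    uint16_t old = __atomic_exchange_n(&l->tail,
                                       (uint16_t)(tail >> TAIL_OFFSET),
                                       __ATOMIC_ACQ_REL);
    return (uint32_t)old << TAIL_OFFSET;
}
```

Both variants return the previous tail in its in-word position, so a caller could switch between them without other changes. The second variant only works because no other field shares the tail's halfword, which is the point of growing pending to a full byte.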

[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014 Apr 17

0

When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg() operations.

This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obv...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 03

5

Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spent in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:

[PATCH 00/11] qspinlock with paravirt support

2014 Jun 15

28

Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt (kvm) cases boot on simple smp guests. I've not done Xen, but the patch should be

[PATCH v12 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Oct 16

15

v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10->v11: - Use a

[PATCH v14 00/11] qspinlock: a 4-byte queue spinlock with PV support

2015 Jan 20

13

v13->v14: - Patches 1 & 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a race condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a

[PATCH 0/9] qspinlock stuff -v15

2015 Mar 16

19

Hi Waiman, As promised; here is the paravirt stuff I did during the trip to BOS last week. All the !paravirt patches are more or less the same as before (the only real change is the copyright lines in the first patch). The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more convoluted and I've no real way to test that but it should be straightforward to make work.

[PATCH v13 00/11] qspinlock: a 4-byte queue spinlock with PV support

2014 Oct 29

15

v12->v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo

[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015 Apr 07

18

v14->v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimizations. - Simplified the Xen PV qspinlock code as suggested by David Vrabel <david.vrabel at citrix.com>. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test
