search for: xchg16

Displaying 20 results from an estimated 28 matches for "xchg16".

Did you mean: xchg
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
...rote: > From: Peter Zijlstra <peterz at infradead.org> > > When we allow for a max NR_CPUS < 2^14 we can optimize the pending > wait-acquire and the xchg_tail() operations. > > By growing the pending bit to a byte, we reduce the tail to 16bit. > This means we can use xchg16 for the tail part and do away with all > the repeated compxchg() operations. > > This in turn allows us to unconditionally acquire; the locked state > as observed by the wait loops cannot change. And because both locked > and pending are now a full byte we can use simple stores for...
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
...rote: > From: Peter Zijlstra <peterz at infradead.org> > > When we allow for a max NR_CPUS < 2^14 we can optimize the pending > wait-acquire and the xchg_tail() operations. > > By growing the pending bit to a byte, we reduce the tail to 16bit. > This means we can use xchg16 for the tail part and do away with all > the repeated compxchg() operations. > > This in turn allows us to unconditionally acquire; the locked state > as observed by the wait loops cannot change. And because both locked > and pending are now a full byte we can use simple stores for...
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh 4: 50783.509653 8: 146295.875715 16: 332942.964709 4: 51033.341441 8: 146320.656285 16: 332586.355194 And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg loops with unconditional ops (xchg8 and xchg16). And I'd think that 4 CPUs x 4 Nodes would be heavy contention. I'll have another poke tomorrow; including verifying asm tomorrow, need to go sleep now.
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh 4: 50783.509653 8: 146295.875715 16: 332942.964709 4: 51033.341441 8: 146320.656285 16: 332586.355194 And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg loops with unconditional ops (xchg8 and xchg16). And I'd think that 4 CPUs x 4 Nodes would be heavy contention. I'll have another poke tomorrow; including verifying asm tomorrow, need to go sleep now.
2014 Jun 15
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra <peterz at infradead.org> When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obv...
2014 Apr 17
0
[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change. And because both locked and pending are now a full byte we can use simple stores for the state transition, obv...
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spend in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi, Here are some numbers for my version -- also attached is the test code. I found that booting big machines is tediously slow so I lifted the whole lot to userspace. I measure the cycles spend in arch_spin_lock() + arch_spin_unlock(). The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node (2 socket) Intel Westmere-EP. AMD (ticket) AMD (qspinlock + pending + opt) Local:
2014 Jun 15
28
[PATCH 00/11] qspinlock with paravirt support
Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt (kvm) cases boot on simple smp guests. I've not done Xen, but the patch should be
2014 Jun 15
28
[PATCH 00/11] qspinlock with paravirt support
Since Waiman seems incapable of doing simple things; here's my take on the paravirt crap. The first few patches are taken from Waiman's latest series, but the virt support is completely new. Its primary aim is to not mess up the native code. I've not stress tested it, but the virt and paravirt (kvm) cases boot on simple smp guests. I've not done Xen, but the patch should be
2014 Oct 16
15
[PATCH v12 00/11] qspinlock: a 4-byte queue spinlock with PV support
v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10->v11: - Use a
2014 Oct 16
15
[PATCH v12 00/11] qspinlock: a 4-byte queue spinlock with PV support
v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo Bonzini. - The pvqspinlock code is largely from my previous version with PeterZ's way of going from queue tail to head and his idea of using callee saved calls to KVM and XEN codes. v10->v11: - Use a
2015 Jan 20
13
[PATCH v14 00/11] qspinlock: a 4-byte queue spinlock with PV support
v13->v14: - Patches 1 & 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a racing condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a
2015 Jan 20
13
[PATCH v14 00/11] qspinlock: a 4-byte queue spinlock with PV support
v13->v14: - Patches 1 & 2: Add queue_spin_unlock_wait() to accommodate commit 78bff1c86 from Oleg Nesterov. - Fix the system hang problem when using PV qspinlock in an over-committed guest due to a racing condition in the pv_set_head_in_tail() function. - Increase the MAYHALT_THRESHOLD from 10 to 1024. - Change kick_cpu into a regular function pointer instead of a
2015 Mar 16
19
[PATCH 0/9] qspinlock stuff -v15
Hi Waiman, As promised; here is the paravirt stuff I did during the trip to BOS last week. All the !paravirt patches are more or less the same as before (the only real change is the copyright lines in the first patch). The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more convoluted and I've no real way to test that but it should be stright fwd to make work.
2015 Mar 16
19
[PATCH 0/9] qspinlock stuff -v15
Hi Waiman, As promised; here is the paravirt stuff I did during the trip to BOS last week. All the !paravirt patches are more or less the same as before (the only real change is the copyright lines in the first patch). The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more convoluted and I've no real way to test that but it should be stright fwd to make work.
2014 Oct 29
15
[PATCH v13 00/11] qspinlock: a 4-byte queue spinlock with PV support
v12->v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo
2014 Oct 29
15
[PATCH v13 00/11] qspinlock: a 4-byte queue spinlock with PV support
v12->v13: - Change patch 9 to generate separate versions of the queue_spin_lock_slowpath functions for bare metal and PV guest. This reduces the performance impact of the PV code on bare metal systems. v11->v12: - Based on PeterZ's version of the qspinlock patch (https://lkml.org/lkml/2014/6/15/63). - Incorporated many of the review comments from Konrad Wilk and Paolo
2015 Apr 07
18
[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support
v14->v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimization. - Simplified the Xen PV qspinlock code as suggested by David Vrabel <david.vrabel at citrix.com>. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test
2015 Apr 07
18
[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support
v14->v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimization. - Simplified the Xen PV qspinlock code as suggested by David Vrabel <david.vrabel at citrix.com>. - Add benchmarking data for 3.19 kernel to compare the performance of a spinlock heavy test