Displaying 20 results from an estimated 28 matches for "xchg16".
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
...rote:
> From: Peter Zijlstra <peterz at infradead.org>
>
> When we allow for a max NR_CPUS < 2^14 we can optimize the pending
> wait-acquire and the xchg_tail() operations.
>
> By growing the pending bit to a byte, we reduce the tail to 16bit.
> This means we can use xchg16 for the tail part and do away with all
> the repeated cmpxchg() operations.
>
> This in turn allows us to unconditionally acquire; the locked state
> as observed by the wait loops cannot change. And because both locked
> and pending are now a full byte we can use simple stores for...
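A minimal userspace sketch of what the quoted description implies for xchg_tail(). It assumes a little-endian lock word laid out as a locked byte, a pending byte, then a 16-bit tail holding a 2-bit node index and a 14-bit "CPU number + 1". The struct, field names, and constants are illustrative rather than the kernel's definitions, and GCC/Clang __atomic builtins stand in for the kernel's cmpxchg()/xchg(). With a 1-bit pending field the tail shares its word with other state, so it has to be swapped in with a compare-and-exchange retry loop; once the tail owns the whole upper halfword, one unconditional 16-bit exchange does the same job.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian (e.g. x86) byte layout"
#endif

/* Hypothetical lock word for NR_CPUS < 2^14 (little-endian):
 *   bits  0-7 : locked byte
 *   bits  8-15: pending byte
 *   bits 16-17: tail index (which per-CPU queue node)
 *   bits 18-31: tail CPU + 1 (0 means "queue empty")
 */
struct qspinlock {
    union {
        uint32_t val;
        struct {
            uint8_t  locked;
            uint8_t  pending;
            uint16_t tail;
        };
    };
};

#define Q_TAIL_OFFSET  16
#define Q_TAIL_CPU_OFF 18
#define Q_TAIL_MASK    0xffff0000u

static uint32_t encode_tail(int cpu, int idx)
{
    assert(cpu >= 0 && cpu + 1 < (1 << 14) && idx >= 0 && idx < 4);
    return ((uint32_t)(cpu + 1) << Q_TAIL_CPU_OFF) |
           ((uint32_t)idx << Q_TAIL_OFFSET);
}

/* Generic form: retry until the tail bits are replaced without
 * clobbering concurrent changes to the locked/pending bits. */
static uint32_t xchg_tail_cmpxchg(struct qspinlock *lock, uint32_t tail)
{
    uint32_t old = __atomic_load_n(&lock->val, __ATOMIC_RELAXED);
    uint32_t newval;

    do {
        newval = (old & ~Q_TAIL_MASK) | tail;
        /* on failure, 'old' is refreshed with the current value */
    } while (!__atomic_compare_exchange_n(&lock->val, &old, newval, 0,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
    return old & Q_TAIL_MASK;
}

/* Small-NR_CPUS form: the tail is its own 16-bit field, so a single
 * unconditional exchange replaces the retry loop. */
static uint32_t xchg_tail_xchg16(struct qspinlock *lock, uint32_t tail)
{
    uint16_t old = __atomic_exchange_n(&lock->tail,
                                       (uint16_t)(tail >> Q_TAIL_OFFSET),
                                       __ATOMIC_SEQ_CST);
    return (uint32_t)old << Q_TAIL_OFFSET;
}
```

Both functions return the previous tail in lock-word position, so a caller can use either one interchangeably; the sequentially consistent ordering is a conservative choice for the sketch, not a claim about what the patch uses.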
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh
     opt            opt2
4:   50783.509653   51033.341441
8:   146295.875715  146320.656285
16:  332942.964709  332586.355194
And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg
loops with unconditional ops (xchg8 and xchg16).
And I'd think that 4 CPUs x 4 Nodes would be heavy contention.
I'll have another poke tomorrow, including verifying the asm; need
to go sleep now.
2014 Jun 15
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra <peterz at infradead.org>
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.
By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.
This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obv...
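Because locked and pending are each a full byte, the pending waiter's hand-off can use plain stores instead of atomic read-modify-write operations. A minimal sketch, assuming a hypothetical little-endian layout of a locked byte, a pending byte, and a 16-bit tail, where locked and pending can also be addressed together as one halfword; the helper names are illustrative. Taking the lock and clearing pending becomes a single 16-bit store that cannot disturb a concurrent tail update.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__BYTE_ORDER__) || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian (e.g. x86) byte layout"
#endif

/* Hypothetical layout: locked byte, pending byte, 16-bit tail. */
struct qspinlock {
    union {
        uint32_t val;
        struct { uint8_t locked; uint8_t pending; };
        struct { uint16_t locked_pending; uint16_t tail; };
    };
};

#define Q_LOCKED_VAL 1u

/* locked := 1, pending := 0 in one halfword store; the tail halfword
 * is not written, so concurrent queueing is unaffected. */
static void clear_pending_set_locked(struct qspinlock *lock)
{
    __atomic_store_n(&lock->locked_pending, (uint16_t)Q_LOCKED_VAL,
                     __ATOMIC_RELAXED);
}

/* Unlock is a release store of zero to the locked byte alone. */
static void spin_unlock_byte(struct qspinlock *lock)
{
    __atomic_store_n(&lock->locked, (uint8_t)0, __ATOMIC_RELEASE);
}

/* Pending waiter: once the owner's locked byte drops to 0 it cannot
 * be retaken by anyone else, so acquire unconditionally. */
static void pending_wait_acquire(struct qspinlock *lock)
{
    while (__atomic_load_n(&lock->locked, __ATOMIC_ACQUIRE))
        ; /* a cpu_relax() would go here in real code */
    clear_pending_set_locked(lock);
}
```

The acquire ordering comes from the spin-wait load; the store that follows only publishes the new owner state, which is why it can stay a plain store in this sketch.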
2014 Apr 17
0
[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.
By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated cmpxchg() operations.
This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obv...
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi,
Here are some numbers for my version -- also attached is the test code.
I found that booting big machines is tediously slow, so I lifted the
whole lot to userspace.
I measure the cycles spent in arch_spin_lock() + arch_spin_unlock().
The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node
(2 socket) Intel Westmere-EP.
AMD (ticket) AMD (qspinlock + pending + opt)
Local:
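The measurement described above (time spent in a lock + unlock pair, run in userspace) can be approximated with a small harness. This is a hedged sketch, not the attached test code: it times a simple test-and-set lock standing in for arch_spin_lock()/arch_spin_unlock(), uses clock_gettime(CLOCK_MONOTONIC) and reports nanoseconds per pair rather than cycles, and all names are hypothetical. Single-threaded, it only covers the uncontended case.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in lock: a plain test-and-set spinlock. */
static atomic_flag bench_lock = ATOMIC_FLAG_INIT;

static void bench_spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&bench_lock, memory_order_acquire))
        ; /* spin */
}

static void bench_spin_unlock(void)
{
    atomic_flag_clear_explicit(&bench_lock, memory_order_release);
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average nanoseconds for one uncontended lock + unlock pair. */
static double bench_lock_unlock(unsigned long iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iters; i++) {
        bench_spin_lock();
        bench_spin_unlock();
    }
    return (double)(now_ns() - start) / (double)iters;
}
```

A contended ("remote") variant would run the same loop from several pinned threads at once; swapping the stand-in lock for a ticket or queued lock is what makes the comparison meaningful.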
2014 Jun 15
28
[PATCH 00/11] qspinlock with paravirt support
Since Waiman seems incapable of doing simple things, here's my take on the
paravirt crap.
The first few patches are taken from Waiman's latest series, but the virt
support is completely new. Its primary aim is to not mess up the native code.
I've not stress-tested it, but the virt and paravirt (kvm) cases boot on simple
SMP guests. I've not done Xen, but the patch should be
2014 Oct 16
15
[PATCH v12 00/11] qspinlock: a 4-byte queue spinlock with PV support
v11->v12:
- Based on PeterZ's version of the qspinlock patch
(https://lkml.org/lkml/2014/6/15/63).
- Incorporated many of the review comments from Konrad Wilk and
Paolo Bonzini.
- The pvqspinlock code is largely from my previous version with
PeterZ's way of going from queue tail to head and his idea of
using callee-saved calls into the KVM and Xen code.
v10->v11:
- Use a
2015 Jan 20
13
[PATCH v14 00/11] qspinlock: a 4-byte queue spinlock with PV support
v13->v14:
- Patches 1 & 2: Add queue_spin_unlock_wait() to accommodate commit
78bff1c86 from Oleg Nesterov.
- Fix the system hang problem when using PV qspinlock in an
over-committed guest due to a race condition in the
pv_set_head_in_tail() function.
- Increase the MAYHALT_THRESHOLD from 10 to 1024.
- Change kick_cpu into a regular function pointer instead of a
2015 Mar 16
19
[PATCH 0/9] qspinlock stuff -v15
Hi Waiman,
As promised, here is the paravirt stuff I did during the trip to BOS last week.
All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).
The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more
convoluted and I've no real way to test that, but it should be straightforward to
make work.
2014 Oct 29
15
[PATCH v13 00/11] qspinlock: a 4-byte queue spinlock with PV support
v12->v13:
- Change patch 9 to generate separate versions of the
queue_spin_lock_slowpath functions for bare metal and PV guest. This
reduces the performance impact of the PV code on bare metal systems.
v11->v12:
- Based on PeterZ's version of the qspinlock patch
(https://lkml.org/lkml/2014/6/15/63).
- Incorporated many of the review comments from Konrad Wilk and
Paolo
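One way to get separate bare-metal and PV versions of a slowpath from a single source body, as the v13 note describes, is to expand the body twice with different hook macros so that the native copy compiles the hooks away entirely. A hedged sketch: the macro, hook, and function names are hypothetical, the loop body is a placeholder for real slowpath logic, and the patch itself may generate its variants differently.

```c
#include <assert.h>

/* PV hook: counts calls so the effect is observable in this sketch. */
static int pv_wait_calls;
static void pv_wait_node(void) { pv_wait_calls++; }

/* Bare metal gets an empty statement; the PV copy gets a real call. */
#define NATIVE_WAIT_HOOK() do { } while (0)
#define PV_WAIT_HOOK()     pv_wait_node()

/* One slowpath body, instantiated once per hook set. */
#define DEFINE_SLOWPATH(name, WAIT_HOOK)                 \
static int name(int rounds)                              \
{                                                        \
    int done = 0;                                        \
    assert(rounds >= 0);                                 \
    while (done < rounds) {                              \
        WAIT_HOOK(); /* compiles away in the native copy */ \
        done++;                                          \
    }                                                    \
    return done;                                         \
}

DEFINE_SLOWPATH(native_spin_lock_slowpath, NATIVE_WAIT_HOOK)
DEFINE_SLOWPATH(pv_spin_lock_slowpath, PV_WAIT_HOOK)
```

The design point is that the bare-metal function contains no PV checks at all, so the cost of PV support on native systems is limited to choosing which function to call.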
2015 Apr 07
18
[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support
v14->v15:
- Incorporate PeterZ's v15 qspinlock patch and improve upon the PV
qspinlock code by dynamically allocating the hash table as well
as some other performance optimizations.
- Simplify the Xen PV qspinlock code as suggested by David Vrabel
<david.vrabel at citrix.com>.
- Add benchmarking data for 3.19 kernel to compare the performance
of a spinlock heavy test
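The v15 note mentions dynamically allocating the PV hash table, which lets an unlocker find the halted vCPU waiting on a given lock. A hedged sketch of such a table: its size is chosen at init time from the CPU count and rounded up to a power of two, and entries use open addressing with linear probing. The sizing rule (4 slots per CPU), the hash mixing, and all names are assumptions for illustration, not the patch's code, and the sketch is single-threaded (a concurrent version would claim slots atomically).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct pv_hash_entry {
    const void *lock; /* NULL means the slot is free */
    int cpu;          /* waiter to kick on unlock */
};

static struct pv_hash_entry *pv_table;
static size_t pv_table_size; /* always a power of two */

/* Hypothetical sizing: 4 slots per possible CPU, rounded up. */
static int pv_hash_init(unsigned ncpus)
{
    size_t want = (size_t)ncpus * 4, size = 1;

    assert(ncpus > 0);
    while (size < want)
        size <<= 1;
    pv_table = calloc(size, sizeof(*pv_table));
    if (!pv_table)
        return -1;
    pv_table_size = size;
    return 0;
}

static size_t pv_hash_slot(const void *lock)
{
    uintptr_t h = (uintptr_t)lock;
    h = (h >> 4) ^ (h >> 12); /* cheap illustrative mixing */
    return (size_t)h & (pv_table_size - 1);
}

/* Record that 'cpu' is halted waiting on 'lock'. */
static int pv_hash_add(const void *lock, int cpu)
{
    size_t i = pv_hash_slot(lock);
    for (size_t n = 0; n < pv_table_size; n++, i = (i + 1) & (pv_table_size - 1)) {
        if (pv_table[i].lock == NULL) {
            pv_table[i].lock = lock;
            pv_table[i].cpu = cpu;
            return 0;
        }
    }
    return -1; /* full: not reachable while each CPU waits on at most one lock */
}

/* Find and remove the waiter for 'lock'; returns its CPU or -1.
 * Lookup scans past free slots, so clearing a slot here is safe. */
static int pv_hash_remove(const void *lock)
{
    size_t i = pv_hash_slot(lock);
    for (size_t n = 0; n < pv_table_size; n++, i = (i + 1) & (pv_table_size - 1)) {
        if (pv_table[i].lock == lock) {
            int cpu = pv_table[i].cpu;
            pv_table[i].lock = NULL;
            return cpu;
        }
    }
    return -1;
}
```

Sizing from the CPU count rather than a fixed compile-time maximum keeps the table small on small guests while guaranteeing a free slot for every possible waiter.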