2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...ocks# LOCK=./qspinlock-pending-opt ./test-4.sh ; LOCK=./qspinlock-pending-opt2 ./test-4.sh
4: 50783.509653
8: 146295.875715
16: 332942.964709
4: 51033.341441
8: 146320.656285
16: 332586.355194
And the difference between opt and opt2 is that opt2 replaces 2 cmpxchg
loops with unconditional ops (xchg8 and xchg16).
And I'd think that 4 CPUs x 4 Nodes would be heavy contention.
I'll have another poke tomorrow, including verifying the asm; need
to go to sleep now.
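
To make the opt/opt2 distinction concrete, here is a minimal userspace sketch of the two ways to set a one-byte field: a compare-and-swap retry loop versus a single unconditional exchange. It uses C11 atomics rather than the kernel's primitives, and the struct layout, field names, and function names are hypothetical and not taken from the patch. On x86 a byte-wide atomic exchange typically compiles to an 8-bit xchg, but that depends on the compiler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock-word layout, for illustration only (not the patch's). */
struct demo_lock {
    _Atomic uint8_t locked;
    _Atomic uint8_t pending;
};

/* cmpxchg style: retry until our update lands; returns the previous value. */
static uint8_t set_pending_cmpxchg(struct demo_lock *l)
{
    uint8_t old = atomic_load_explicit(&l->pending, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(&l->pending, &old, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        ; /* 'old' is refreshed on failure, so just retry */
    return old;
}

/* xchg style: one unconditional atomic swap, no retry loop. */
static uint8_t set_pending_xchg(struct demo_lock *l)
{
    return atomic_exchange_explicit(&l->pending, 1, memory_order_acquire);
}
```

Both helpers return the previous value of the byte, so a caller can tell whether it was the one that set it. The unconditional form only works when the field can be overwritten without first inspecting its neighbours, which is why it needs the field to sit in its own byte or halfword.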
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi,
Here are some numbers for my version -- also attached is the test code.
I found that booting big machines is tediously slow so I lifted the
whole lot to userspace.
I measure the cycles spent in arch_spin_lock() + arch_spin_unlock().
The machines used are a 4-node (2-socket) AMD Interlagos and a 2-node
(2-socket) Intel Westmere-EP.
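
As a rough illustration of that kind of userspace measurement, the sketch below times a lock/unlock pair in a loop and reports the average cost. It is not the attached test code: the test-and-set lock is a stand-in for the ticket lock and qspinlock being compared, all names are hypothetical, and the timestamp falls back to nanoseconds on non-x86 builds.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Timestamp source: TSC ticks on x86, wall-clock nanoseconds elsewhere. */
static inline uint64_t now_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Stand-in test-and-set lock; NOT the ticket lock or qspinlock under test. */
static atomic_flag demo_flag = ATOMIC_FLAG_INIT;

static inline void demo_spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_flag, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static inline void demo_spin_unlock(void)
{
    atomic_flag_clear_explicit(&demo_flag, memory_order_release);
}

/* Average ticks per lock+unlock pair over 'iters' uncontended iterations. */
static double avg_lock_unlock_ticks(unsigned long iters)
{
    uint64_t start = now_ticks();

    for (unsigned long i = 0; i < iters; i++) {
        demo_spin_lock();
        demo_spin_unlock();
    }
    return (double)(now_ticks() - start) / (double)iters;
}
```

Calling avg_lock_unlock_ticks() from a single thread gives only the uncontended cost. A contended measurement would run the same loop from several threads pinned to chosen CPUs or nodes.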
AMD (ticket) AMD (qspinlock + pending + opt)
Local: