Displaying 15 results from an estimated 580 matches for "contenders".
2014 Mar 12
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention levels (2-4 contending tasks), where it is slower than
the corresponding ticket spinlock code. The following table shows the
execution time (in ms) of a micro-benchmark where 5M iterations of
the lock/unlock cycles were run on a 10-core Westmere-EX x86-64 CPU
with two different types of loads - standalone (lock and
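A minimal userspace analogue of the micro-benchmark described - N contending
tasks each running 5M lock/unlock cycles, timed end to end - using pthread
spinlocks as a stand-in for the in-kernel locks; the tool, names, and build
line here are illustrative assumptions, not the posted code:

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERATIONS 5000000L   /* 5M lock/unlock cycles, as in the post */
    #define NTASKS     2          /* the low-contention case under test */

    static pthread_spinlock_t lock;   /* stand-in for the kernel spinlock */
    static volatile long shared;      /* touched for the "protected data" load */

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERATIONS; i++) {
            pthread_spin_lock(&lock);
            shared++;                 /* minimal critical section */
            pthread_spin_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTASKS];
        struct timespec t0, t1;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NTASKS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTASKS; i++)
            pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%ld ms\n", (long)((t1.tv_sec - t0.tv_sec) * 1000
                                + (t1.tv_nsec - t0.tv_nsec) / 1000000));
        return 0;
    }

Build with "gcc -O2 -pthread bench.c"; vary NTASKS to sweep contention levels.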
2014 Feb 26
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention levels (2-4 contending tasks), where it is slower than
the corresponding ticket spinlock code path. The following table shows
the execution time (in ms) of a micro-benchmark where 5M iterations
of the lock/unlock cycles were run on a 10-core Westmere-EX CPU with
two different types of loads - standalone (lock and protected
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention levels (2-4 contending tasks), where it is slower than
the corresponding ticket spinlock code path. The following table shows
the execution time (in ms) of a micro-benchmark where 5M iterations
of the lock/unlock cycles were run on a 10-core Westmere-EX CPU with
two different types of loads - standalone (lock and protected
2014 Feb 28
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/28/2014 04:29 AM, Peter Zijlstra wrote:
> On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
>>>> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
>>>> +
>>>> + if (old == 0) {
>>>> + /*
>>>> + * Got the lock, can clear the waiting bit now
>>>> + */
>>>> +
2014 Feb 28
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
> >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> >>+
> >>+ if (old == 0) {
> >>+ /*
> >>+ * Got the lock, can clear the waiting bit now
> >>+ */
> >>+ smp_u8_store_release(&qlock->wait, 0);
> >
> >So we just did an
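The pattern under review: a single xchg() speculatively claims both the lock
byte and the waiting byte, and the returned old value says which of the two
the caller actually got. A userspace C11 sketch of that idea - the kernel
uses its own primitives (xchg(), smp_u8_store_release()), and the union-based
byte access assumes a little-endian machine such as x86 where mixed-size
atomic access to the same word is sound:

    #include <stdatomic.h>

    #define QL_LOCKED  0x0001u    /* byte 0: lock held */
    #define QL_WAITING 0x0100u    /* byte 1: one waiter pending */

    struct qlock2 {
        union {
            _Atomic unsigned short lock_wait;  /* both bytes as a halfword */
            struct {
                _Atomic unsigned char lock;    /* byte 0 (little-endian) */
                _Atomic unsigned char wait;    /* byte 1 */
            };
        };
    };

    /* Two-contender fast path: one xchg claims lock + waiting together. */
    static void qlock2_acquire(struct qlock2 *q)
    {
        unsigned short old = atomic_exchange_explicit(&q->lock_wait,
                                    QL_WAITING | QL_LOCKED,
                                    memory_order_acquire);
        if (old == 0) {
            /* Got the lock; drop the stale waiting bit with a plain
             * byte-wide release store (the smp_u8_store_release() step
             * being discussed in this thread). */
            atomic_store_explicit(&q->wait, 0, memory_order_release);
            return;
        }
        /* Lock was held: we now own the waiting bit; spin until the
         * holder releases the lock byte. */
        while (atomic_load_explicit(&q->lock, memory_order_acquire))
            ;
        /* With at most two contenders on this path, nobody else can
         * touch the word now: set LOCKED and clear WAITING in one store. */
        atomic_store_explicit(&q->lock_wait, QL_LOCKED,
                              memory_order_relaxed);
    }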
2014 Jun 16
4
[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock
On Sun, Jun 15, 2014 at 02:46:58PM +0200, Peter Zijlstra wrote:
> From: Waiman Long <Waiman.Long at hp.com>
>
> This patch introduces a new generic queue spinlock implementation that
> can serve as an alternative to the default ticket spinlock. Compared
> with the ticket spinlock, this queue spinlock should be almost as fair
> as the ticket spinlock. It has about the same
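For orientation, the queueing idea behind qspinlock is MCS-style: each waiter
spins on a flag in its own node, and the lock word tracks only the queue tail
(the actual patch compresses this state into a single 4-byte word). A
self-contained C11 sketch of the classic MCS scheme - illustrative background
only, not the patch's code:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;               /* true while this waiter spins */
    };

    struct mcs_lock {
        _Atomic(struct mcs_node *) tail;  /* last waiter, or NULL if free */
    };

    static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
    {
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);

        /* Swing the tail to ourselves; the old tail is our predecessor. */
        struct mcs_node *prev = atomic_exchange_explicit(&l->tail, me,
                                                         memory_order_acq_rel);
        if (prev == NULL)
            return;                       /* queue was empty: lock acquired */

        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;                             /* spin on our own cache line */
    }

    static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
    {
        struct mcs_node *next =
            atomic_load_explicit(&me->next, memory_order_acquire);

        if (next == NULL) {
            /* No visible successor: try to mark the lock free. */
            struct mcs_node *expect = me;
            if (atomic_compare_exchange_strong_explicit(&l->tail, &expect,
                        NULL, memory_order_acq_rel, memory_order_acquire))
                return;
            /* A successor is mid-enqueue; wait for it to link itself. */
            while ((next = atomic_load_explicit(&me->next,
                                        memory_order_acquire)) == NULL)
                ;
        }
        atomic_store_explicit(&next->locked, false, memory_order_release);
    }

Fairness follows from the FIFO queue, and each waiter spins only on its own
node, which is what keeps cache-line traffic flat as contention grows.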
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26/2014 11:20 AM, Peter Zijlstra wrote:
> You don't happen to have a proper state diagram for this thing do you?
>
> I suppose I'm going to have to make one; this is all getting a bit
> unwieldy, and those xchg() + fixup things are hard to read.
I don't have a state diagram on hand, but I will add more comments to
describe the 4 possible cases and how to handle
2014 Feb 28
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Feb 28, 2014 1:30 AM, "Peter Zijlstra" <peterz at infradead.org> wrote:
>
> At low contention the cmpxchg won't have to be retried (much) so using
> it won't be a problem and you get to have arbitrary atomic ops.
Peter, the difference between an atomic op and *no* atomic op is huge.
And Waiman posted numbers for the optimization. Why do you argue with
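The two shapes being argued over, side by side. A hedged C11 illustration
(names hypothetical): the cmpxchg form supports an arbitrary read-modify-write
but may retry when the word changes underneath it, while the xchg form is one
unconditional atomic that works only when the new value is known up front.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define LOCKED 1u

    /* cmpxchg shape: cheap at low contention, where the compare
     * rarely fails, but a retry loop in principle. */
    static bool trylock_cmpxchg(_Atomic unsigned *word)
    {
        unsigned old = atomic_load_explicit(word, memory_order_relaxed);
        if (old & LOCKED)
            return false;
        return atomic_compare_exchange_strong_explicit(word, &old,
                    old | LOCKED, memory_order_acquire, memory_order_relaxed);
    }

    /* xchg shape: one unconditional atomic, never retried. */
    static bool trylock_xchg(_Atomic unsigned char *byte)
    {
        return atomic_exchange_explicit(byte, LOCKED,
                    memory_order_acquire) == 0;
    }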
2014 Mar 02
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26, Waiman Long wrote:
>
> @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
> int qlcode = atomic_read(lock->qlcode);
>
> if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode,
> - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode))
> + qlcode, code|_QSPINLOCK_LOCKED) == qlcode))
Hmm.
2014 Mar 04
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Updated version; this includes numbers for my SNB desktop and Waiman's
variant.
Curiously, Waiman's version seems consistently slower on 2 cross-node
CPUs, whereas my version seems to have a problem on SNB with 2 CPUs.
There's something weird with the ticket lock numbers; when I compile
the code with:
gcc (Debian 4.7.2-5) 4.7.2
I get the first set; when I compile with:
gcc
2014 Mar 04
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Peter,
I was trying to implement the generic queue code's exchange code using
cmpxchg, as you suggested. However, when I gathered the performance
data, the code performed worse than I expected at higher contention
levels. Below are the execution times of the benchmark tool that I sent
you:
                              [xchg]        [cmpxchg]
  # of tasks    Ticket lock   Queue lock    Queue lock
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote:
> Peter,
>
> I was trying to implement the generic queue code's exchange code using
> cmpxchg, as you suggested. However, when I gathered the performance
> data, the code performed worse than I expected at higher contention
> levels. Below are the execution times of the benchmark tool that I sent
> you:
>
>
2014 Mar 13
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote:
> On 03/12/2014 02:54 PM, Waiman Long wrote:
> >+ /*
> >+ * Set the lock bit & clear the waiting bit simultaneously
> >+ * It is assumed that there is no lock stealing with this
> >+ * quick path active.
> >+ *
> >+ * A direct memory store of _QSPINLOCK_LOCKED into the
> >+ *
2007 Feb 27
2
Creating a contended section of bandwidth with HTB and IMQ
Hi All,
I'm trying to create a contended section of bandwidth using IMQ. I have the
imq0 device up and running, with traffic passing through it.
Firstly, I need to throttle the entire device imq0 to 2mbit/s.
I would then like to add throttle rules for individual IP addresses, allowing
them to pass up to 512kbit/s each, as long as imq0 has not reached its
2mbit/s.
The configuration
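The request maps onto a standard HTB hierarchy: a root class capped at
2mbit/s, with one leaf class per IP that gets 512kbit/s guaranteed and may
borrow up to the parent's ceiling. A sketch of the tc side (handles, class
ids, and the example IP are placeholders; the iptables IMQ hook is assumed
to already be in place):

    # Root qdisc on the IMQ device; unclassified traffic falls into 1:30
    tc qdisc add dev imq0 root handle 1: htb default 30

    # Parent class: the whole contended section, hard-capped at 2mbit/s
    tc class add dev imq0 parent 1: classid 1:1 htb rate 2mbit ceil 2mbit

    # Per-IP leaf: 512kbit/s guaranteed, borrowing up to the 2mbit ceiling
    tc class add dev imq0 parent 1:1 classid 1:10 htb rate 512kbit ceil 2mbit
    tc filter add dev imq0 parent 1: protocol ip u32 \
        match ip dst 192.168.0.10/32 flowid 1:10

    # Default leaf for traffic that matches no per-IP filter
    tc class add dev imq0 parent 1:1 classid 1:30 htb rate 512kbit ceil 2mbit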
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Hi,
Here are some numbers for my version -- also attached is the test code.
I found that booting big machines is tediously slow, so I lifted the
whole lot to userspace.
I measure the cycles spent in arch_spin_lock() + arch_spin_unlock().
The machines used are a 4 node (2 socket) AMD Interlagos, and a 2 node
(2 socket) Intel Westmere-EP.
AMD (ticket) AMD (qspinlock + pending + opt)
Local: