Displaying 20 results from an estimated 580 matches for "contended".
2014 Mar 12
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention level (2-4 contending tasks) where it is slower than
the corresponding ticket spinlock code. The following table shows the
execution time (in ms) of a micro-benchmark where 5M iterations of
the lock/unlock cycles were run on a 10-core Westere-EX x86-64 CPU
with 2 different types loads - standalone (lock and
2014 Feb 26
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention level (2-4 contending tasks) where it is slower than
the corresponding ticket spinlock code path. The following table shows
the execution time (in ms) of a micro-benchmark where 5M iterations
of the lock/unlock cycles were run on a 10-core Westere-EX CPU with
2 different types loads - standalone (lock and protected
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
A major problem with the queue spinlock patch is its performance at
low contention level (2-4 contending tasks) where it is slower than
the corresponding ticket spinlock code path. The following table shows
the execution time (in ms) of a micro-benchmark where 5M iterations
of the lock/unlock cycles were run on a 10-core Westere-EX CPU with
2 different types loads - standalone (lock and protected
2014 Feb 28
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/28/2014 04:29 AM, Peter Zijlstra wrote:
> On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
>>>> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
>>>> +
>>>> + if (old == 0) {
>>>> + /*
>>>> + * Got the lock, can clear the waiting bit now
>>>> + */
>>>> +
2014 Feb 28
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
> >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> >>+
> >>+ if (old == 0) {
> >>+ /*
> >>+ * Got the lock, can clear the waiting bit now
> >>+ */
> >>+ smp_u8_store_release(&qlock->wait, 0);
> >
> >So we just did an
2014 Feb 28
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote:
> >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> >>+
> >>+ if (old == 0) {
> >>+ /*
> >>+ * Got the lock, can clear the waiting bit now
> >>+ */
> >>+ smp_u8_store_release(&qlock->wait, 0);
> >
> >So we just did an
2014 Jun 16
4
[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock
...nel/locking/mcs_spinlock.h
> +++ linux-2.6/kernel/locking/mcs_spinlock.h
> @@ -17,6 +17,7 @@
> struct mcs_spinlock {
> struct mcs_spinlock *next;
> int locked; /* 1 if lock acquired */
> + int count;
This could use a comment.
> };
>
> #ifndef arch_mcs_spin_lock_contended
> Index: linux-2.6/kernel/locking/qspinlock.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/locking/qspinlock.c
> @@ -0,0 +1,197 @@
> +/*
> + * Queue spinlock
> + *
> + * This program is free software; you can...
2014 Jun 16
4
[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock
...nel/locking/mcs_spinlock.h
> +++ linux-2.6/kernel/locking/mcs_spinlock.h
> @@ -17,6 +17,7 @@
> struct mcs_spinlock {
> struct mcs_spinlock *next;
> int locked; /* 1 if lock acquired */
> + int count;
This could use a comment.
> };
>
> #ifndef arch_mcs_spin_lock_contended
> Index: linux-2.6/kernel/locking/qspinlock.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/locking/qspinlock.c
> @@ -0,0 +1,197 @@
> +/*
> + * Queue spinlock
> + *
> + * This program is free software; you can...
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26/2014 11:20 AM, Peter Zijlstra wrote:
> You don't happen to have a proper state diagram for this thing do you?
>
> I suppose I'm going to have to make one; this is all getting a bit
> unwieldy, and those xchg() + fixup things are hard to read.
I don't have a state diagram on hand, but I will add more comments to
describe the 4 possible cases and how to handle
2014 Feb 28
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Feb 28, 2014 1:30 AM, "Peter Zijlstra" <peterz at infradead.org> wrote:
>
> At low contention the cmpxchg won't have to be retried (much) so using
> it won't be a problem and you get to have arbitrary atomic ops.
Peter, the difference between an atomic op and *no* atomic op is huge.
And Waiman posted numbers for the optimization. Why do you argue with
2014 Mar 02
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26, Waiman Long wrote:
>
> @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
> int qlcode = atomic_read(lock->qlcode);
>
> if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode,
> - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode))
> + qlcode, code|_QSPINLOCK_LOCKED) == qlcode))
Hmm.
2014 Mar 04
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Updated version, this includes numbers for my SNB desktop and Waiman's
variant.
Curiously Waiman's version seems consistently slower on 2 cross node
CPUs. Whereas my version seems to have a problem on SNB with 2 CPUs.
There's something weird with the ticket lock numbers; when I compile
the code with:
gcc (Debian 4.7.2-5) 4.7.2
I get the first set; when I compile with:
gcc
2014 Mar 04
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Peter,
I was trying to implement the generic queue code exchange code using
cmpxchg as suggested by you. However, when I gathered the performance
data, the code performed worse than I expected at a higher contention
level. Below were the execution time of the benchmark tool that I sent
you:
[xchg] [cmpxchg]
# of tasks Ticket lock Queue lock Queue Lock
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote:
> Peter,
>
> I was trying to implement the generic queue code exchange code using
> cmpxchg as suggested by you. However, when I gathered the performance
> data, the code performed worse than I expected at a higher contention
> level. Below were the execution time of the benchmark tool that I sent
> you:
>
>
2014 Mar 13
0
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote:
> On 03/12/2014 02:54 PM, Waiman Long wrote:
> >+ /*
> >+ * Set the lock bit& clear the waiting bit simultaneously
> >+ * It is assumed that there is no lock stealing with this
> >+ * quick path active.
> >+ *
> >+ * A direct memory store of _QSPINLOCK_LOCKED into the
> >+ *
2014 Mar 02
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On 02/26, Waiman Long wrote:
>
> @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct qspinlock *lock)
> int qlcode = atomic_read(lock->qlcode);
>
> if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode,
> - qlcode, qlcode|_QSPINLOCK_LOCKED) == qlcode))
> + qlcode, code|_QSPINLOCK_LOCKED) == qlcode))
Hmm.
2014 Mar 04
1
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
On Tue, Mar 04, 2014 at 12:48:26PM -0500, Waiman Long wrote:
> Peter,
>
> I was trying to implement the generic queue code exchange code using
> cmpxchg as suggested by you. However, when I gathered the performance
> data, the code performed worse than I expected at a higher contention
> level. Below were the execution time of the benchmark tool that I sent
> you:
>
>
2007 Feb 27
2
Creating a contended section of bandwidth with HTB and IMQ
Hi All,
I''m trying to create a contended section of bandwidth using IMQ. I have the
imq0 device up and running, with traffic passing through it.
Firstly, I need to throttle the entire device imq0 to 2mbit/s.
I would then like to add throttle rules for individual IP addresses, allowing
them to pass up to 512kbit/s each, as long as imq0...
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...360.083940
2 - nodes: 2 - nodes:
2: 1509.193824 2: 1209.090219
4: 48154.495998 4: 48547.242379
8: 137946.787244 8: 141381.498125
---
There a few curious facts I found (assuming my test code is sane).
- Intel seems to be an order of magnitude faster on uncontended LOCKed
ops compared to AMD
- On Intel the uncontended qspinlock fast path (cmpxchg) seems slower
than the uncontended ticket xadd -- although both are plenty fast
when compared to AMD.
- In general, replacing cmpxchg loops with unconditional atomic ops
doesn't seem to matter a w...
2014 Mar 03
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...360.083940
2 - nodes: 2 - nodes:
2: 1509.193824 2: 1209.090219
4: 48154.495998 4: 48547.242379
8: 137946.787244 8: 141381.498125
---
There a few curious facts I found (assuming my test code is sane).
- Intel seems to be an order of magnitude faster on uncontended LOCKed
ops compared to AMD
- On Intel the uncontended qspinlock fast path (cmpxchg) seems slower
than the uncontended ticket xadd -- although both are plenty fast
when compared to AMD.
- In general, replacing cmpxchg loops with unconditional atomic ops
doesn't seem to matter a w...