thr3ads.net - search: "uncontended"

[PATCH v10 03/19] qspinlock: Add pending bit

2014 May 07

0

[PATCH v10 03/19] qspinlock: Add pending bit

...structure * @val: Current value of the queue spinlock 32-bit word * - * (queue tail, lock bit) + * (queue tail, pending bit, lock bit) + * + * fast : slow : unlock + * : : + * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) + * : | ^--------.------. / : + * : v \ \ | : + * pending : (0,1,1) +--> (0,1,0) \ | : + *...
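The (queue tail, pending bit, lock bit) triple in the snippet above can be illustrated with a minimal C11 sketch of the uncontended path, (0,0,0) -> (0,0,1) on lock and (*,*,1) -> (*,*,0) on unlock. The bit layout and every `sk_`/`SK_` name here are assumptions made for illustration; they are not taken from the patch, whose actual layout is not visible in the truncated excerpt.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only:
 * bit 0 = lock, bit 1 = pending, bits 2 and up = queue tail. */
#define SK_LOCKED      (1u << 0)
#define SK_PENDING     (1u << 1)
#define SK_TAIL_SHIFT  2

/* Build a lock word from a (tail, pending, lock) triple. */
static uint32_t sk_pack(uint32_t tail, bool pending, bool locked)
{
    return (tail << SK_TAIL_SHIFT)
         | (pending ? SK_PENDING : 0u)
         | (locked ? SK_LOCKED : 0u);
}

/* Uncontended fast path: (0,0,0) -> (0,0,1).
 * Fails if any of tail, pending, or lock is already set. */
static bool sk_fast_lock(_Atomic uint32_t *word)
{
    uint32_t expected = sk_pack(0, false, false);
    return atomic_compare_exchange_strong_explicit(
        word, &expected, sk_pack(0, false, true),
        memory_order_acquire, memory_order_relaxed);
}

/* Unlock: (*,*,1) -> (*,*,0). Only the lock bit is cleared;
 * tail and pending are left for the waiters to handle. */
static void sk_unlock(_Atomic uint32_t *word)
{
    atomic_fetch_and_explicit(word, ~SK_LOCKED, memory_order_release);
}
```

A caller would fall back to some slow path (pending bit or queue) whenever `sk_fast_lock` returns false; that slow path is not sketched here.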

[PATCH 03/11] qspinlock: Add pending bit

2014 Jun 15

0

[PATCH 03/11] qspinlock: Add pending bit

...k * @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word * - * (queue tail, lock bit) - * - * fast : slow : unlock - * : : - * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) - * : | ^--------. / : - * : v \ | : - * uncontended : (n,x) --+--> (n,0) | : - * que...

[PATCH v9 03/19] qspinlock: Add pending bit

2014 Apr 17

0

[PATCH v9 03/19] qspinlock: Add pending bit

...tructure * @val: Current value of the queue spinlock 32-bit word * - * (queue tail, lock bit) + * (queue tail, pending bit, lock bit) * - * fast : slow : unlock - * : : - * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) - * : | ^--------. / : - * : v \ | : - * uncontended : (n,x) --+--> (n,0) | : - * que...

[PATCH 3/9] qspinlock: Add pending bit

2015 Mar 16

0

[PATCH 3/9] qspinlock: Add pending bit

...* @lock: Pointer to queue spinlock structure * @val: Current value of the queue spinlock 32-bit word * - * (queue tail, lock value) - * - * fast : slow : unlock - * : : - * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) - * : | ^--------. / : - * : v \ | : - * uncontended : (n,x) --+--> (n,0) | : - * que...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Mar 03

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...86360.083940 2 - nodes: 2 - nodes: 2: 1509.193824 2: 1209.090219 4: 48154.495998 4: 48547.242379 8: 137946.787244 8: 141381.498125 --- There are a few curious facts I found (assuming my test code is sane). - Intel seems to be an order of magnitude faster on uncontended LOCKed ops compared to AMD - On Intel the uncontended qspinlock fast path (cmpxchg) seems slower than the uncontended ticket xadd -- although both are plenty fast when compared to AMD. - In general, replacing cmpxchg loops with unconditional atomic ops doesn't seem to matter a w...
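To make the comparison in the snippet above concrete, here is a rough C11 sketch of the two uncontended acquire styles it mentions: a compare-and-exchange fast path (what x86 compiles to `lock cmpxchg`) and a ticket lock whose acquire is an unconditional fetch-and-add (`lock xadd`). The types and function names are hypothetical and heavily simplified; this is not the kernel's code, nor the test code the message refers to.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical word-sized lock: 0 = unlocked, 1 = locked (no queue tail). */
typedef struct { atomic_uint val; } sketch_qlock;

/* cmpxchg-style fast path: succeeds only if the whole word is 0. */
static bool sketch_qlock_trylock(sketch_qlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->val, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

static void sketch_qlock_unlock(sketch_qlock *l)
{
    atomic_store_explicit(&l->val, 0u, memory_order_release);
}

/* Hypothetical ticket lock: `next` hands out tickets,
 * `owner` is the ticket currently allowed to hold the lock. */
typedef struct { atomic_uint next; atomic_uint owner; } sketch_ticket;

/* xadd-style acquire: the atomic add always succeeds; when the lock is
 * uncontended the ticket already equals `owner` and the loop exits at once. */
static void sketch_ticket_lock(sketch_ticket *t)
{
    unsigned me = atomic_fetch_add_explicit(&t->next, 1u,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&t->owner, memory_order_acquire) != me)
        ; /* spin until our ticket is served */
}

static void sketch_ticket_unlock(sketch_ticket *t)
{
    atomic_fetch_add_explicit(&t->owner, 1u, memory_order_release);
}
```

The structural difference is that the compare-and-exchange can fail and needs a fallback, whereas the fetch-and-add is unconditional; which one is faster on a given CPU is an empirical question of the kind the message reports on.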

[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock

2014 Jun 16

4

[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock

...ept it is not a lock bit. It is a lock uint8_t. Is the queue tail at this point the composite of 'cpu|idx'? > + * > + * fast : slow : unlock > + * : : > + * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) > + * : | ^--------. / : > + * : v \ | : > + * uncontended : (n,x) --+--> (n,0)...

[RFC 08/07] qspinlock: integrate pending bit into queue

2014 May 21

0

[RFC 08/07] qspinlock: integrate pending bit into queue

...nlock * @lock: Pointer to queue spinlock structure @@ -324,21 +381,21 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval) * fast : slow : unlock * : : * uncontended (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0) - * : | ^--------.------. / : - * : v \ \ | : - * pending : (0,1,1) +--> (0,1,0) \ | : - *...

[PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014 Jun 11

3

[PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: > Enabling this configuration feature causes a slight decrease the > performance of an uncontended lock-unlock operation by about 1-2% > mainly due to the use of a static key. However, uncontended lock-unlock > operation are really just a tiny percentage of a real workload. So > there should no noticeable change in application performance. No, entirely unacceptable. > +#ifdef CONFI...

[PATCH 03/11] qspinlock: Add pending bit

2014 Jun 17

5

[PATCH 03/11] qspinlock: Add pending bit

...ock structure > * @val: Current value of the queue spinlock 32-bit word > * > - * (queue tail, lock bit) > - * > - * fast : slow : unlock > - * : : > - * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) > - * : | ^--------. / : > - * : v \ | : > - * uncontended : (n,x) --+--> (n,0)...

[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock

2014 Jun 23

0

[PATCH 01/11] qspinlock: A simple generic 4-byte queue spinlock

...ue tail at this point the composite of 'cpu|idx'? Yes, as per {en,de}code_tail() above. > > + * > > + * fast : slow : unlock > > + * : : > > + * uncontended (0,0) --:--> (0,1) --------------------------------:--> (*,0) > > + * : | ^--------. / : > > + * : v \ | : > > + * uncontended : (n,x) --+--> (n,0)...

[PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014 Jun 12

2

[PATCH v11 09/16] qspinlock, x86: Allow unfair spinlock in a virtual guest

On Wed, Jun 11, 2014 at 09:37:55PM -0400, Long, Wai Man wrote: > > On 6/11/2014 6:54 AM, Peter Zijlstra wrote: > >On Fri, May 30, 2014 at 11:43:55AM -0400, Waiman Long wrote: > >>Enabling this configuration feature causes a slight decrease the > >>performance of an uncontended lock-unlock operation by about 1-2% > >>mainly due to the use of a static key. However, uncontended lock-unlock > >>operation are really just a tiny percentage of a real workload. So > >>there should no noticeable change in application performance. > >No, entirely u...

[PATCH 04/11] qspinlock: Extract out the exchange of tail code word

2014 Jun 18

3

[PATCH 04/11] qspinlock: Extract out the exchange of tail code word

...ion with a > single cmpxchg: > > - * 0,0,0 -> 0,0,1 ; trylock > - * p,y,x -> n,y,x ; prev = xchg(lock, node) > > to first doing the trylock, then the xchg. If the trylock passes and the > xchg returns prev=0,0,0, the next step of the algorithm goes to the > locked/uncontended state > > + /* > + * claim the lock: > + * > + * n,0 -> 0,1 : lock, uncontended > > Similar to your suggestion of patch 3, it's expected that the xchg will > *not* return prev=0,0,0 after a failed trylock. I do like your explanation. I hope that Peter will put i...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 28

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

On Thu, Feb 27, 2014 at 03:42:19PM -0500, Waiman Long wrote: > >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > >>+ > >>+ if (old == 0) { > >>+ /* > >>+ * Got the lock, can clear the waiting bit now > >>+ */ > >>+ smp_u8_store_release(&qlock->wait, 0); > > > >So we just did an
