thr3ads.net - search: "smp_u8_store

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

2

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

..._QSPINLOCK_LOCKED) && > + (qsval != _QSPINLOCK_WAITING))) > + return 0; > + > + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > + > + if (old == 0) { > + /* > + * Got the lock, can clear the waiting bit now > + */ > + smp_u8_store_release(&qlock->wait, 0); So we just did an atomic op, and now you're trying to optimize this write. Why do you need a whole byte for that? Surely a cmpxchg loop with the right atomic op can't be _that_ much slower? Its far more readable and likely avoids that steal fail below as well. &...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

2

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

..._QSPINLOCK_LOCKED) && > + (qsval != _QSPINLOCK_WAITING))) > + return 0; > + > + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > + > + if (old == 0) { > + /* > + * Got the lock, can clear the waiting bit now > + */ > + smp_u8_store_release(&qlock->wait, 0); So we just did an atomic op, and now you're trying to optimize this write. Why do you need a whole byte for that? Surely a cmpxchg loop with the right atomic op can't be _that_ much slower? Its far more readable and likely avoids that steal fail below as well. &...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 26

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...215 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 44cefee..98db42e 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -7,12 +7,30 @@ #define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS +#define smp_u8_store_release(p, v) \ +do { \ + barrier(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +/* + * As the qcode will be accessed as a 16-bit word, no offset is needed + */ +#define _QCODE_VAL_OFFSET 0 + /* * x86-64 specific queue spinlock union structure + * Besides the slock and lock fields, the other fie...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 27

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...215 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index 44cefee..98db42e 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -7,12 +7,30 @@ #define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS +#define smp_u8_store_release(p, v) \ +do { \ + barrier(); \ + ACCESS_ONCE(*p) = (v); \ +} while (0) + +/* + * As the qcode will be accessed as a 16-bit word, no offset is needed + */ +#define _QCODE_VAL_OFFSET 0 + /* * x86-64 specific queue spinlock union structure + * Besides the slock and lock fields, the other fie...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 27

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...+ (qsval != _QSPINLOCK_WAITING))) >> + return 0; >> + >> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); >> + >> + if (old == 0) { >> + /* >> + * Got the lock, can clear the waiting bit now >> + */ >> + smp_u8_store_release(&qlock->wait, 0); > > So we just did an atomic op, and now you're trying to optimize this > write. Why do you need a whole byte for that? > > Surely a cmpxchg loop with the right atomic op can't be _that_ much > slower? Its far more readable and likely avoids that s...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 28

0

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...iman Long wrote: >>>> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); >>>> + >>>> + if (old == 0) { >>>> + /* >>>> + * Got the lock, can clear the waiting bit now >>>> + */ >>>> + smp_u8_store_release(&qlock->wait, 0); >>> So we just did an atomic op, and now you're trying to optimize this >>> write. Why do you need a whole byte for that? >>> >>> Surely a cmpxchg loop with the right atomic op can't be _that_ much >>> slower? Its far mor...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 28

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...2014 at 03:42:19PM -0500, Waiman Long wrote: > >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > >>+ > >>+ if (old == 0) { > >>+ /* > >>+ * Got the lock, can clear the waiting bit now > >>+ */ > >>+ smp_u8_store_release(&qlock->wait, 0); > > > >So we just did an atomic op, and now you're trying to optimize this > >write. Why do you need a whole byte for that? > > > >Surely a cmpxchg loop with the right atomic op can't be _that_ much > >slower? Its far more readabl...

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014 Feb 28

5

[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

...2014 at 03:42:19PM -0500, Waiman Long wrote: > >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED); > >>+ > >>+ if (old == 0) { > >>+ /* > >>+ * Got the lock, can clear the waiting bit now > >>+ */ > >>+ smp_u8_store_release(&qlock->wait, 0); > > > >So we just did an atomic op, and now you're trying to optimize this > >write. Why do you need a whole byte for that? > > > >Surely a cmpxchg loop with the right atomic op can't be _that_ much > >slower? Its far more readabl...

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 27

14

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 27

14

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 26

22

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

2014 Feb 26

22

[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support

v4->v5: - Move the optimized 2-task contending code to the generic file to enable more architectures to use it without code duplication. - Address some of the style-related comments by PeterZ. - Allow the use of unfair queue spinlock in a real para-virtualized execution environment. - Add para-virtualization support to the qspinlock code by ensuring that the lock holder and queue

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Feb 26

0

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...* + * 2) Byte and short data exchange and compare-exchange instructions * + * * + * For those architectures, their asm/qspinlock.h header file should * + * define the followings in order to use the optimized codes. * + * 1) The _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS macro * + * 2) A smp_u8_store_release() macro for byte size store operation * + * 3) A "union arch_qspinlock" structure that include the individual * + * fields of the qspinlock structure, including: * + * o slock - the qspinlock structure * + * o lock - the lock byte * + * * + **************...

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014 Feb 27

0

[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

...* + * 2) Byte and short data exchange and compare-exchange instructions * + * * + * For those architectures, their asm/qspinlock.h header file should * + * define the followings in order to use the optimized codes. * + * 1) The _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS macro * + * 2) A smp_u8_store_release() macro for byte size store operation * + * 3) A "union arch_qspinlock" structure that include the individual * + * fields of the qspinlock structure, including: * + * o slock - the qspinlock structure * + * o lock - the lock byte * + * * + **************...

search for: smp_u8_store_release