Displaying 14 results from an estimated 14 matches for "smp_u8_store_release".
2014 Feb 26
2
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
..._QSPINLOCK_LOCKED) &&
> + (qsval != _QSPINLOCK_WAITING)))
> + return 0;
> +
> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> +
> + if (old == 0) {
> + /*
> + * Got the lock, can clear the waiting bit now
> + */
> + smp_u8_store_release(&qlock->wait, 0);
So we just did an atomic op, and now you're trying to optimize this
write. Why do you need a whole byte for that?
Surely a cmpxchg loop with the right atomic op can't be _that_ much
slower? It's far more readable and likely avoids that steal failure below as
well.
&...
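As one concrete reading of that suggestion, here is a minimal sketch of the two-task path done entirely with exact cmpxchg() transitions, so a successful acquisition never leaves a waiting bit that must be cleared by a separate byte store afterwards. The constant and field names (_QSPINLOCK_LOCKED, _QSPINLOCK_WAITING, lock_wait) follow the quoted patch; the control flow itself is only an illustration of the review comment, not the code that was eventually merged.

/*
 * Illustrative sketch only: every state change is an exact
 * compare-and-swap, so taking the lock never leaves a stale
 * waiting bit behind. Names follow the quoted patch.
 */
static inline void queue_spin_lock_2task_sketch(union arch_qspinlock *qlock)
{
	u16 old;

	for (;;) {
		old = ACCESS_ONCE(qlock->lock_wait);

		switch (old) {
		case 0:				/* free: take it */
		case _QSPINLOCK_WAITING:	/* released: take it, clearing the wait bit in the same step */
			if (cmpxchg(&qlock->lock_wait, old,
				    _QSPINLOCK_LOCKED) == old)
				return;
			break;
		case _QSPINLOCK_LOCKED:		/* held, no waiter yet: announce ourselves */
			cmpxchg(&qlock->lock_wait, old,
				_QSPINLOCK_LOCKED | _QSPINLOCK_WAITING);
			break;
		default:			/* locked with a waiter: spin (real code would queue) */
			break;
		}
		cpu_relax();
	}
}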
2014 Feb 26
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...215 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 44cefee..98db42e 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -7,12 +7,30 @@
#define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS
+#define smp_u8_store_release(p, v) \
+do { \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+/*
+ * As the qcode will be accessed as a 16-bit word, no offset is needed
+ */
+#define _QCODE_VAL_OFFSET 0
+
/*
* x86-64 specific queue spinlock union structure
+ * Besides the slock and lock fields, the other fie...
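The macro in this hunk leans on x86's TSO memory model: a CPU's stores are not reordered with its earlier stores, so a compiler barrier() plus a volatile store already gives release semantics without any LOCK-prefixed instruction. A minimal usage sketch with a hypothetical flag-publishing structure (not from the patch):

/* Hypothetical example, not from the patch: publish data, then a flag. */
struct publish_demo {
	int payload;
	u8  ready;
};

static void publish(struct publish_demo *d, int val)
{
	d->payload = val;	/* write the payload first */
	/*
	 * barrier() keeps the compiler from sinking the payload store
	 * below the flag store; x86 TSO keeps the CPU's store order,
	 * so a reader that observes ready == 1 also sees the payload.
	 */
	smp_u8_store_release(&d->ready, 1);
}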
2014 Feb 27
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...+ (qsval != _QSPINLOCK_WAITING)))
>> + return 0;
>> +
>> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
>> +
>> + if (old == 0) {
>> + /*
>> + * Got the lock, can clear the waiting bit now
>> + */
>> + smp_u8_store_release(&qlock->wait, 0);
>
> So we just did an atomic op, and now you're trying to optimize this
> write. Why do you need a whole byte for that?
>
> Surely a cmpxchg loop with the right atomic op can't be _that_ much
> slower? It's far more readable and likely avoids that s...
2014 Feb 28
0
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...iman Long wrote:
>>>> + old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
>>>> +
>>>> + if (old == 0) {
>>>> + /*
>>>> + * Got the lock, can clear the waiting bit now
>>>> + */
>>>> + smp_u8_store_release(&qlock->wait, 0);
>>> So we just did an atomic op, and now you're trying to optimize this
>>> write. Why do you need a whole byte for that?
>>>
>>> Surely a cmpxchg loop with the right atomic op can't be _that_ much
>>> slower? It's far mor...
2014 Feb 28
5
[PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
...2014 at 03:42:19PM -0500, Waiman Long wrote:
> >>+ old = xchg(&qlock->lock_wait, _QSPINLOCK_WAITING|_QSPINLOCK_LOCKED);
> >>+
> >>+ if (old == 0) {
> >>+ /*
> >>+ * Got the lock, can clear the waiting bit now
> >>+ */
> >>+ smp_u8_store_release(&qlock->wait, 0);
> >
> >So we just did an atomic op, and now you're trying to optimize this
> >write. Why do you need a whole byte for that?
> >
> >Surely a cmpxchg loop with the right atomic op can't be _that_ much
> >slower? It's far more readabl...
2014 Feb 27
14
[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support
v4->v5:
- Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
- Address some of the style-related comments by PeterZ.
- Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
- Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue
2014 Feb 26
22
[PATCH v5 0/8] qspinlock: a 4-byte queue spinlock with PV support
v4->v5:
- Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
- Address some of the style-related comments by PeterZ.
- Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
- Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue
2014 Feb 26
0
[PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation
...*
+ * 2) Byte and short data exchange and compare-exchange instructions *
+ * *
+ * For those architectures, their asm/qspinlock.h header file should *
+ * define the following in order to use the optimized code. *
+ * 1) The _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS macro *
+ * 2) A smp_u8_store_release() macro for byte size store operation *
+ * 3) A "union arch_qspinlock" structure that include the individual *
+ * fields of the qspinlock structure, including: *
+ * o slock - the qspinlock structure *
+ * o lock - the lock byte *
+ * *
+ **************...
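Putting those three requirements together, an opting-in architecture header might look roughly like this. The macro and the slock/lock fields follow the x86 hunk quoted earlier; the wait, qcode and lock_wait members are assumptions filled in from context, since the excerpt elides the full layout:

/* Sketch of an opt-in arch header as described above (x86-like, little-endian). */
#define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS

#define smp_u8_store_release(p, v)	\
do {					\
	barrier();			\
	ACCESS_ONCE(*p) = (v);		\
} while (0)

union arch_qspinlock {
	struct qspinlock slock;		/* the generic 4-byte qspinlock */
	struct {
		u8  lock;		/* the lock byte */
		u8  wait;		/* assumed: the waiting byte */
		u16 qcode;		/* assumed: the queue code halfword */
	};
	u16 lock_wait;			/* assumed: lock + wait as one 16-bit word */
};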